public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] New attribute to create target clones
@ 2015-08-27 11:35 Evgeny Stupachenko
  2015-09-08 11:47 ` Evgeny Stupachenko
  2015-09-21 13:43 ` Bernd Schmidt
  0 siblings, 2 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-08-27 11:35 UTC (permalink / raw)
  To: GCC Patches, Jan Hubicka, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

Hi All,

Based on RFC:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html

The patch implement an extension to Function Multiversioning that
allows to clone a function for multiple targets.
__attribute__((target_clones("avx","arch=slm","default")))
int foo ()
...

Will create 3 clones of foo(). One optimized with -mavx, one optimized
with -march=slm, and one with default optimizations.
And will create ifunc resolver that calls appropriate clone (same as
in Function Multiversioning).

Bootstrap and make check for x86 passed.

Is it ok for trunk?

2015-08-27  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * Makefile.in (OBJS): Add multiple_target.o.
        * multiple_target.c (make_attribute): New.
        (create_dispatcher_calls): Ditto.
        (expand_target_clones): Ditto.
        (ipa_target_clone): Ditto.
        * passes.def (pass_target_clone): New ipa pass.
        * tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
        * c-common.c (handle_target_clones_attribute): New.
        * (c_common_attribute_table): Add handle_target_clones_attribute.
        * (handle_always_inline_attribute): Add check on target_clones
        attribute.
        * (handle_target_attribute): Ditto.

gcc/testsuite
        * gcc.dg/mvc1.c: New test for multiple targets cloning.
        * gcc.dg/mvc2.c: Ditto.
        * gcc.dg/mvc3.c: Ditto.
        * gcc.dg/mvc4.c: Ditto.
        * gcc.dg/mvc5.c: Ditto.
        * gcc.dg/mvc6.c: Ditto.
        * gcc.dg/mvc7.c: Ditto.
        * g++.dg/ext/mvc1.C: Ditto.
        * g++.dg/ext/mvc2.C: Ditto.
        * g++.dg/ext/mvc3.C: Ditto.

gcc/doc
        * doc/extend.texi (target_clones): New attribute description.

[-- Attachment #2: target_clones.patch --]
[-- Type: application/octet-stream, Size: 20246 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e298ecc..fc12ab9 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1350,6 +1350,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiple_target.o \
 	omp-low.o \
 	optabs.o \
 	options-save.o \
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7691035..31ba31f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -381,6 +381,7 @@ static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
@@ -788,6 +789,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_error_attribute, false },
   { "target",                 1, -1, true, false, false,
 			      handle_target_attribute, false },
+  { "target_clones",	      1, -1, true, false, false,
+			      handle_target_clones_attribute, false },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute, false },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -7363,6 +7366,13 @@ handle_always_inline_attribute (tree *node, tree name,
 		   "with %qs attribute", name, "noinline");
 	  *no_add_attrs = true;
 	}
+      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute ignored as it conflict with target_clones",
+		   name);
+	  *no_add_attrs = true;
+	}
       else
 	/* Set the attribute and mark it for disregarding inline
 	   limits.  */
@@ -9774,6 +9784,12 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes,
+	       "%qE attribute ignored as it conflict with target_clones", name);
+      *no_add_attrs = true;
+    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -9781,6 +9797,39 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_clones" attribute.  */
+
+static tree
+handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute ignored as it assumes noinline", name);
+	  *no_add_attrs = true;
+	}
+      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute ignored as it conflict with target", name);
+	  *no_add_attrs = true;
+	}
+      else
+      /* Do not inline functions with multiple clone targets.  */
+	DECL_UNINLINABLE (*node) = 1;
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Arguments being collected for optimization.  */
 typedef const char *const_char_p;		/* For DEF_VEC_P.  */
 static GTY(()) vec<const_char_p, va_gc> *optimize_args;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index dba8b43..c0b5875 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
 flags @option{fstack-protector} or @option{fstack-protector-strong}
 or @option{fstack-protector-explicit} are set.
 
+@item target_clones (@var{options})
+@cindex @code{target_clones} function attribute
+The @code{target_clones} attribute is used to specify that a function is to
+be cloned into multiple versions compiled with different target options
+than specified on the command line.  The supported options and restrictions
+are the same as for @code{target}.
+
+For instance on an x86, you could compile a function with
+@code{target_clones("sse4.1,avx")}.  It will create 2 function clones,
+one compiled with @option{-msse4.1} and another with @option{-mavx}.
+At the function call it will create resolver @code{ifunc}, that will
+dynamically call a clone suitable for current architecture.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 Multiple target back ends implement the @code{target} attribute
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
new file mode 100644
index 0000000..1946dba
--- /dev/null
+++ b/gcc/multiple_target.c
@@ -0,0 +1,331 @@
+/* Pass for parsing functions with multiple target attributes.
+
+   Contributed by Evgeny Stupachenko <evstupac@gmail.com>
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "gimple.h"
+#include "diagnostic-core.h"
+#include "gimple-ssa.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "pretty-print.h"
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* If the call in NODE has multiple target attribute with multiple fields,
+   replace it with dispatcher call and create dispatcher (once).  */
+
+static void
+create_dispatcher_calls (struct cgraph_node *node)
+{
+  cgraph_edge *e;
+  cgraph_edge *e_next;
+  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
+    {
+      tree resolver_decl;
+      tree idecl;
+      tree decl;
+      gimple call = e->call_stmt;
+      struct cgraph_node *inode;
+
+      /* Checking if call of function is call of versioned function.
+	 Versioned function are not inlined, so there is no need to
+	 check for inline.  */
+      if (!call
+	  || !(decl = gimple_call_fndecl (call))
+	  || !DECL_FUNCTION_VERSIONED (decl))
+	continue;
+
+      e_next = e->next_caller;
+      idecl = targetm.get_function_versions_dispatcher (decl);
+      if (!idecl)
+	{
+	  error_at (gimple_location (call),
+		    "default target_clones attribute was not set");
+	}
+      inode = cgraph_node::get (idecl);
+      gcc_assert (inode);
+      resolver_decl = targetm.generate_version_dispatcher_body (inode);
+
+      /* Update aliases.  */
+      inode->alias = true;
+      inode->alias_target = resolver_decl;
+      if (!inode->analyzed)
+	inode->resolve_alias (cgraph_node::get (resolver_decl));
+      e->redirect_callee (inode);
+      gimple_call_set_fndecl (call, idecl);
+      /*  Force to move to next edge caller.  */
+      e = NULL;
+    }
+}
+
+/* If the function in NODE has multiple target attribute with multiple fields,
+   create the appropriate clone for each field.  */
+
+static bool
+expand_target_clones (struct cgraph_node *node)
+{
+  tree id;
+  /* Parsing target attributes separated by comma.  */
+  tree attr_target = lookup_attribute ("target_clones",
+				       DECL_ATTRIBUTES (node->decl));
+  /* No targets specified.  */
+  if (!attr_target)
+    return false;
+
+  tree arg;
+  tree arglist = TREE_VALUE (attr_target);
+  size_t str_len_sum = 0;
+  char **args = NULL;
+  char *attr_str;
+  char *attr = NULL;
+  int argnum = 1;
+  int i;
+
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      unsigned int i;
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      str_len_sum += len + 1;
+      if (arg != arglist)
+	argnum++;
+      for (i = 0; i < strlen (str); i++)
+	if (str[i] == ',')
+	  argnum++;
+    }
+
+  /* No need to clone for 1 target attribute.  */
+  if (argnum == 1)
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl),
+		  0,
+		  "single target_clones attribute is ignored");
+      return false;
+    }
+
+  attr_str = XNEWVEC (char, str_len_sum);
+  str_len_sum = 0;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      memcpy (attr_str + str_len_sum, str, len);
+      attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+      str_len_sum += len + 1;
+    }
+
+  args = XNEWVEC (char *, argnum);
+  bool has_default = false;
+  attr = strtok (attr_str, ",");
+  i = 0;
+  while (attr != NULL)
+    {
+      if (strcmp (attr, "default") == 0)
+	{
+	  has_default = true;
+	  attr = strtok (NULL, ",");
+	  continue;
+	}
+      args[i] = attr;
+      attr = strtok (NULL, ",");
+      i++;
+    }
+
+  int args_num = i;
+
+  const char *old_asm_name;
+  old_asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+
+  cgraph_function_version_info *decl1_v = NULL;
+  cgraph_function_version_info *decl2_v = NULL;
+  cgraph_function_version_info *before = NULL;
+  cgraph_function_version_info *after = NULL;
+  tree attributes = NULL;
+  if (!has_default)
+    error_at (DECL_SOURCE_LOCATION (node->decl),
+	      "default target was not set");
+  attributes = make_attribute ("target", "default", NULL);
+  DECL_ATTRIBUTES (node->decl) = attributes;
+  targetm.target_option.valid_attribute_p (node->decl, NULL,
+					   TREE_VALUE (attributes), 0);
+  decl1_v = node->function_version ();
+  if (decl1_v == NULL)
+    decl1_v = node->insert_new_function_version ();
+  before = decl1_v;
+  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+
+  /* Setting new attribute to initial function.  */
+
+  for (i = 0; i < args_num; i++)
+    {
+      pretty_printer pp;
+      attr = args[i];
+      cgraph_node *new_node;
+      unsigned int j;
+
+      if (node->definition)
+	{
+	  if (!node->has_gimple_body_p ())
+	    return false;
+	  node->get_body ();
+
+	  /* Replacing initial function with clone only for 1st target.  */
+	  new_node = node->create_version_clone_with_body (vNULL, NULL,
+							   NULL, false,
+							   NULL, NULL,
+							   "target_clone");
+	  new_node->externally_visible = node->externally_visible;
+	  new_node->address_taken = node->address_taken;
+	  new_node->thunk = node->thunk;
+	  new_node->alias = node->alias;
+	  new_node->weakref = node->weakref;
+	  new_node->cpp_implicit_alias = node->cpp_implicit_alias;
+	  new_node->local.local = node->local.local;
+	  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (node->decl);
+	}
+      else
+	{
+	  tree new_decl = copy_node (node->decl);
+	  new_node = cgraph_node::get_create (new_decl);
+	}
+
+	const char *new_asm_name;
+	char *new_suffix;
+	new_suffix = XNEWVEC (char, strlen (attr) + 1);
+	memcpy (new_suffix, attr, strlen (attr) + 1);
+
+	/* Replace all '=' and '-' in target option with '_'.  */
+	for (j = 0; j < strlen (new_suffix); j++)
+	  if (new_suffix[j] == '=' || new_suffix[j] == '-')
+	    new_suffix[j] = '_';
+
+	pp_string (&pp, old_asm_name);
+	pp_string (&pp, ".");
+	pp_string (&pp, new_suffix);
+	new_asm_name = pp_formatted_text (&pp);
+	id = get_identifier (new_asm_name);
+	symtab->change_decl_assembler_name (new_node->decl, id);
+	XDELETEVEC (new_suffix);
+
+	/* Setting new attribute to cloned function.  */
+	tree attributes = make_attribute ("target", attr, NULL_TREE);
+	DECL_ATTRIBUTES (new_node->decl) = attributes;
+	targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+						 TREE_VALUE (attributes), 0);
+	decl2_v = new_node->function_version ();
+	if (decl2_v != NULL)
+	  continue;
+	decl2_v = new_node->insert_new_function_version ();
+
+	/* Chain decl2_v and decl1_v.  All semantically identical versions
+	   will be chained together.  */
+
+	after = decl2_v;
+	while (before->next != NULL)
+	  before = before->next;
+	while (after->prev != NULL)
+	  after = after->prev;
+
+	before->next = after;
+	after->prev = before;
+	DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+    }
+  XDELETEVEC (args);
+  XDELETEVEC (attr_str);
+  return true;
+}
+
+static unsigned int
+ipa_target_clone (void)
+{
+  struct cgraph_node *node;
+  bool res = false;
+  FOR_EACH_FUNCTION (node)
+    res |= expand_target_clones (node);
+  if (res)
+    FOR_EACH_FUNCTION (node)
+      create_dispatcher_calls (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_target_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "targetclone",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0,				/* todo_flags_finish */
+};
+
+class pass_target_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_target_clone (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_target_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_target_clone (); }
+};
+
+bool
+pass_target_clone::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_target_clone (gcc::context *ctxt)
+{
+  return new pass_target_clone (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 64fc4d9..ff6c550 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_late_ipa_passes)
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_omp_simd_clone);
+  NEXT_PASS (pass_target_clone);
   TERMINATE_PASS_LIST ()
 
   /* These passes are run after IPA passes on every function that is being
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
new file mode 100644
index 0000000..ab1e0f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc2.c b/gcc/testsuite/gcc.dg/mvc2.c
new file mode 100644
index 0000000..165e632
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo ();
diff --git a/gcc/testsuite/gcc.dg/mvc3.c b/gcc/testsuite/gcc.dg/mvc3.c
new file mode 100644
index 0000000..074808a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "" } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo (); /* { dg-error "default target was not set" } */
+
+int
+bar ()
+{
+  return foo();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
new file mode 100644
index 0000000..6147365
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+__attribute__((target_clones("default","avx","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
new file mode 100644
index 0000000..dfdb762
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fno-inline" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+foo ()
+{
+  return 10;
+}
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+bar ()
+{
+  return -foo ();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc6.c b/gcc/testsuite/gcc.dg/mvc6.c
new file mode 100644
index 0000000..1621985
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler "vpshufb" } } */
+/* { dg-final { scan-assembler "punpcklbw" } } */
+
+__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
+void
+foo(char *in, char *out, int size)
+{
+  int i;
+  for(i = 0; i < size; i++)
+    {
+	out[2 * i] = in[i];
+	out[2 * i + 1] = in[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
new file mode 100644
index 0000000..782f8dc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+
+__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
+int foo ();
+
+int main()
+{
+  return foo();
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 7b66a1c..ec723b8 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -483,6 +483,7 @@ extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
diff --git a/target_clones.ChangeLog b/target_clones.ChangeLog
new file mode 100644
index 0000000..b06e702
--- /dev/null
+++ b/target_clones.ChangeLog
@@ -0,0 +1,32 @@
+2015-08-26  Evgeny Stupachenko  <evstupac@gmail.com>
+
+gcc/
+	* Makefile.in (OBJS): Add multiple_target.o.
+	* multiple_target.c (make_attribute): New.
+	(create_dispatcher_calls): Ditto.
+	(expand_target_clones): Ditto.
+	(ipa_target_clone): Ditto.
+	* passes.def (pass_target_clone): New ipa pass.
+	* tree-pass.h (make_pass_target_clone): Ditto.
+
+gcc/c-family
+	* c-common.c (handle_target_clones_attribute): New.
+	* (c_common_attribute_table): Add handle_target_clones_attribute.
+	* (handle_always_inline_attribute): Add check on target_clones
+	attribute.
+	* (handle_target_attribute): Ditto.
+
+gcc/testsuite
+	* gcc.dg/mvc1.c: New test for multiple targets cloning.
+	* gcc.dg/mvc2.c: Ditto.
+	* gcc.dg/mvc3.c: Ditto.
+	* gcc.dg/mvc4.c: Ditto.
+	* gcc.dg/mvc5.c: Ditto.
+	* gcc.dg/mvc6.c: Ditto.
+	* gcc.dg/mvc7.c: Ditto.
+	* g++.dg/ext/mvc1.C: Ditto.
+	* g++.dg/ext/mvc2.C: Ditto.
+	* g++.dg/ext/mvc3.C: Ditto.
+
+gcc/doc
+	* doc/extend.texi (target_clones): New attribute description.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-08-27 11:35 [PATCH] New attribute to create target clones Evgeny Stupachenko
@ 2015-09-08 11:47 ` Evgeny Stupachenko
  2015-09-16 11:42   ` Evgeny Stupachenko
  2015-09-21 13:43 ` Bernd Schmidt
  1 sibling, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-09-08 11:47 UTC (permalink / raw)
  To: GCC Patches, Jan Hubicka, Jeff Law

Ping.

On Thu, Aug 27, 2015 at 2:18 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Hi All,
>
> Based on RFC:
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html
>
> The patch implement an extension to Function Multiversioning that
> allows to clone a function for multiple targets.
> __attribute__((target_clones("avx","arch=slm","default")))
> int foo ()
> ...
>
> Will create 3 clones of foo(). One optimized with -mavx, one optimized
> with -march=slm, and one with default optimizations.
> And will create ifunc resolver that calls appropriate clone (same as
> in Function Multiversioning).
>
> Bootstrap and make check for x86 passed.
>
> Is it ok for trunk?
>
> 2015-08-27  Evgeny Stupachenko  <evstupac@gmail.com>
>
> gcc/
>         * Makefile.in (OBJS): Add multiple_target.o.
>         * multiple_target.c (make_attribute): New.
>         (create_dispatcher_calls): Ditto.
>         (expand_target_clones): Ditto.
>         (ipa_target_clone): Ditto.
>         * passes.def (pass_target_clone): New ipa pass.
>         * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
>         * c-common.c (handle_target_clones_attribute): New.
>         * (c_common_attribute_table): Add handle_target_clones_attribute.
>         * (handle_always_inline_attribute): Add check on target_clones
>         attribute.
>         * (handle_target_attribute): Ditto.
>
> gcc/testsuite
>         * gcc.dg/mvc1.c: New test for multiple targets cloning.
>         * gcc.dg/mvc2.c: Ditto.
>         * gcc.dg/mvc3.c: Ditto.
>         * gcc.dg/mvc4.c: Ditto.
>         * gcc.dg/mvc5.c: Ditto.
>         * gcc.dg/mvc6.c: Ditto.
>         * gcc.dg/mvc7.c: Ditto.
>         * g++.dg/ext/mvc1.C: Ditto.
>         * g++.dg/ext/mvc2.C: Ditto.
>         * g++.dg/ext/mvc3.C: Ditto.
>
> gcc/doc
>         * doc/extend.texi (target_clones): New attribute description.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-08 11:47 ` Evgeny Stupachenko
@ 2015-09-16 11:42   ` Evgeny Stupachenko
  0 siblings, 0 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-09-16 11:42 UTC (permalink / raw)
  To: GCC Patches, Jan Hubicka, Jeff Law

PING 2.

On Tue, Sep 8, 2015 at 2:27 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Ping.
>
> On Thu, Aug 27, 2015 at 2:18 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Hi All,
>>
>> Based on RFC:
>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html
>>
>> The patch implement an extension to Function Multiversioning that
>> allows to clone a function for multiple targets.
>> __attribute__((target_clones("avx","arch=slm","default")))
>> int foo ()
>> ...
>>
>> Will create 3 clones of foo(). One optimized with -mavx, one optimized
>> with -march=slm, and one with default optimizations.
>> And will create ifunc resolver that calls appropriate clone (same as
>> in Function Multiversioning).
>>
>> Bootstrap and make check for x86 passed.
>>
>> Is it ok for trunk?
>>
>> 2015-08-27  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>> gcc/
>>         * Makefile.in (OBJS): Add multiple_target.o.
>>         * multiple_target.c (make_attribute): New.
>>         (create_dispatcher_calls): Ditto.
>>         (expand_target_clones): Ditto.
>>         (ipa_target_clone): Ditto.
>>         * passes.def (pass_target_clone): New ipa pass.
>>         * tree-pass.h (make_pass_target_clone): Ditto.
>>
>> gcc/c-family
>>         * c-common.c (handle_target_clones_attribute): New.
>>         * (c_common_attribute_table): Add handle_target_clones_attribute.
>>         * (handle_always_inline_attribute): Add check on target_clones
>>         attribute.
>>         * (handle_target_attribute): Ditto.
>>
>> gcc/testsuite
>>         * gcc.dg/mvc1.c: New test for multiple targets cloning.
>>         * gcc.dg/mvc2.c: Ditto.
>>         * gcc.dg/mvc3.c: Ditto.
>>         * gcc.dg/mvc4.c: Ditto.
>>         * gcc.dg/mvc5.c: Ditto.
>>         * gcc.dg/mvc6.c: Ditto.
>>         * gcc.dg/mvc7.c: Ditto.
>>         * g++.dg/ext/mvc1.C: Ditto.
>>         * g++.dg/ext/mvc2.C: Ditto.
>>         * g++.dg/ext/mvc3.C: Ditto.
>>
>> gcc/doc
>>         * doc/extend.texi (target_clones): New attribute description.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-08-27 11:35 [PATCH] New attribute to create target clones Evgeny Stupachenko
  2015-09-08 11:47 ` Evgeny Stupachenko
@ 2015-09-21 13:43 ` Bernd Schmidt
  2015-09-22 20:24   ` Jeff Law
  1 sibling, 1 reply; 36+ messages in thread
From: Bernd Schmidt @ 2015-09-21 13:43 UTC (permalink / raw)
  To: Evgeny Stupachenko, GCC Patches, Jan Hubicka, Jeff Law

On 08/27/2015 01:18 PM, Evgeny Stupachenko wrote:
> Based on RFC:
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html
>
> The patch implement an extension to Function Multiversioning that
> allows to clone a function for multiple targets.
> __attribute__((target_clones("avx","arch=slm","default")))
> int foo ()
> ...
>
> Will create 3 clones of foo(). One optimized with -mavx, one optimized
> with -march=slm, and one with default optimizations.
> And will create ifunc resolver that calls appropriate clone (same as
> in Function Multiversioning).

The general question is - do we want this, given that it seems to 
introduce no functionality that can't be had with the existing 
multiversioning? You could always compile the same source file to 
multiple objects with different defines for the target optimization, or 
include a file containing the multiversioned function multiple times 
with changing #defines.

Some comments on the patch itself:

 > diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
 > new file mode 100644
 > index 0000000..ab1e0f5
 > --- /dev/null
 > +++ b/gcc/testsuite/gcc.dg/mvc1.c
 > @@ -0,0 +1,27 @@
 > +/* { dg-do run { target i?86-*-* x86_64-*-* } } */
 > +/* { dg-options "-O2" } */

Target-specific tests should go into gcc.target/i386.

> +      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
> +	{
> +	  warning (OPT_Wattributes,
> +		   "%qE attribute ignored as it conflict with target_clones",
> +		   name);
> +	  *no_add_attrs = true;
> +	}

Bad grammar ("conflicts"), and apparently you're supposed to use %qs 
even for known attribute names. Look at the code immediately before this 
and copy it. Similar issues throughout.

> +  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)

It looks like this could be simplified if you immediately assign
   e_next = e->next_caller
and then always use e_next.

Trying to test it, I get (compiling with -O2 on x86_64-linux):

mvc1.c: In function ‘main’:
mvc1.c:18:1: error: virtual definition of statement not up-to-date
  main ()
  ^
_2 = foo.ifunc ();
mvc1.c:18:1: internal compiler error: verify_ssa failed


Bernd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-21 13:43 ` Bernd Schmidt
@ 2015-09-22 20:24   ` Jeff Law
  2015-09-22 21:09     ` Bernd Schmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-09-22 20:24 UTC (permalink / raw)
  To: Bernd Schmidt, Evgeny Stupachenko, GCC Patches, Jan Hubicka

On 09/21/2015 07:25 AM, Bernd Schmidt wrote:
> On 08/27/2015 01:18 PM, Evgeny Stupachenko wrote:
>> Based on RFC:
>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html
>>
>> The patch implement an extension to Function Multiversioning that
>> allows to clone a function for multiple targets.
>> __attribute__((target_clones("avx","arch=slm","default")))
>> int foo ()
>> ...
>>
>> Will create 3 clones of foo(). One optimized with -mavx, one optimized
>> with -march=slm, and one with default optimizations.
>> And will create ifunc resolver that calls appropriate clone (same as
>> in Function Multiversioning).
>
> The general question is - do we want this, given that it seems to
> introduce no functionality that can't be had with the existing
> multiversioning? You could always compile the same source file to
> multiple objects with different defines for the target optimization, or
> include a file containing the multiversioned function multiple times
> with changing #defines.
Essentially it allows us to more easily support 
per-microarchitecture-optimized versions of functions.   You list just 
have to list the microarchitectures and the compiler handles the rest. 
Very simple, very easy.  I'd think it'd be particularly helpful for 
vectorization.

You could emulate this with compiling the same source multiple times 
with different flags/defines and wire up on ifunc by hand.  But Evgeny's 
approach is vastly simpler.

We are still waiting on Jan to chime in as to whether or not any of this 
breaks the rules WRT nodes in the symbol table.  Given I mucked up the 
MPX code by not knowing those rules, I'm hesitant to approve this patch 
without input from Jan.

Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-22 20:24   ` Jeff Law
@ 2015-09-22 21:09     ` Bernd Schmidt
  2015-09-22 23:52       ` Evgeny Stupachenko
  2015-10-08 16:39       ` Jeff Law
  0 siblings, 2 replies; 36+ messages in thread
From: Bernd Schmidt @ 2015-09-22 21:09 UTC (permalink / raw)
  To: Jeff Law, Bernd Schmidt, Evgeny Stupachenko, GCC Patches, Jan Hubicka

On 09/22/2015 09:41 PM, Jeff Law wrote:
> Essentially it allows us to more easily support
> per-microarchitecture-optimized versions of functions.   You list just
> have to list the microarchitectures and the compiler handles the rest.
> Very simple, very easy.  I'd think it'd be particularly helpful for
> vectorization.
>
> You could emulate this with compiling the same source multiple times
> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
> approach is vastly simpler.

As far as I can tell the ifunc is generated automatically (and the 
functionality is documented as such), so the new target_clone doesn't 
buy much. But one thing I didn't was that the existing support is only 
available in C++, while Evgeny's patch works for C. That is probably an 
argument that could be made for its inclusion.

Or at least, it's supposed to work. As I said, I get verify_ssa failures 
on the included testcases, and for a simpler one I just tried I get the 
clones of the function, but not the resolver that ought to be generated.


Bernd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-22 21:09     ` Bernd Schmidt
@ 2015-09-22 23:52       ` Evgeny Stupachenko
  2015-09-25  0:02         ` Evgeny Stupachenko
  2015-10-08 16:39       ` Jeff Law
  1 sibling, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-09-22 23:52 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, Bernd Schmidt, GCC Patches, Jan Hubicka

Thank you for the review.
The patch still works with gcc 5, but the fail reproduced on trunk
(looks like it appeared while patch was at review). I'll debug it and
fix.
As a workaround to test the feature...
Removing
"gimple_call_set_fndecl (call, idecl);" from multiple_target.c
should resolve the ICE

I'll fix the patch for trunk and send an update.

Thanks,
Evgeny


On Wed, Sep 23, 2015 at 12:09 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 09/22/2015 09:41 PM, Jeff Law wrote:
>>
>> Essentially it allows us to more easily support
>> per-microarchitecture-optimized versions of functions.   You list just
>> have to list the microarchitectures and the compiler handles the rest.
>> Very simple, very easy.  I'd think it'd be particularly helpful for
>> vectorization.
>>
>> You could emulate this with compiling the same source multiple times
>> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
>> approach is vastly simpler.
>
>
> As far as I can tell the ifunc is generated automatically (and the
> functionality is documented as such), so the new target_clone doesn't buy
> much. But one thing I didn't was that the existing support is only available
> in C++, while Evgeny's patch works for C. That is probably an argument that
> could be made for its inclusion.
>
> Or at least, it's supposed to work. As I said, I get verify_ssa failures on
> the included testcases, and for a simpler one I just tried I get the clones
> of the function, but not the resolver that ought to be generated.
>
>
> Bernd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-22 23:52       ` Evgeny Stupachenko
@ 2015-09-25  0:02         ` Evgeny Stupachenko
  2015-10-02 13:18           ` Evgeny Stupachenko
  2015-10-08 19:00           ` Jeff Law
  0 siblings, 2 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-09-25  0:02 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, Bernd Schmidt, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 2905 bytes --]

I've fixed ICE and review issues.
x86 make check and bootstrap passed.

Thanks,
Evgeny

ChangeLog

2015-09-25  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * Makefile.in (OBJS): Add multiple_target.o.
        * multiple_target.c (make_attribute): New.
        (create_dispatcher_calls): Ditto.
        (expand_target_clones): Ditto.
        (ipa_target_clone): Ditto.
        * passes.def (pass_target_clone): New ipa pass.
        * tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
        * c-common.c (handle_target_clones_attribute): New.
        * (c_common_attribute_table): Add handle_target_clones_attribute.
        * (handle_always_inline_attribute): Add check on target_clones
        attribute.
        * (handle_target_attribute): Ditto.

gcc/testsuite
        * gcc.dg/mvc1.c: New test for multiple targets cloning.
        * gcc.dg/mvc2.c: Ditto.
        * gcc.dg/mvc3.c: Ditto.
        * gcc.dg/mvc4.c: Ditto.
        * gcc.dg/mvc5.c: Ditto.
        * gcc.dg/mvc6.c: Ditto.
        * gcc.dg/mvc7.c: Ditto.
        * g++.dg/ext/mvc1.C: Ditto.
        * g++.dg/ext/mvc2.C: Ditto.
        * g++.dg/ext/mvc3.C: Ditto.

gcc/doc
        * doc/extend.texi (target_clones): New attribute description.


On Wed, Sep 23, 2015 at 1:49 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Thank you for the review.
> The patch still works with gcc 5, but the fail reproduced on trunk
> (looks like it appeared while patch was at review). I'll debug it and
> fix.
> As a workaround to test the feature...
> Removing
> "gimple_call_set_fndecl (call, idecl);" from multiple_target.c
> should resolve the ICE
>
> I'll fix the patch for trunk and send an update.
>
> Thanks,
> Evgeny
>
>
> On Wed, Sep 23, 2015 at 12:09 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>> On 09/22/2015 09:41 PM, Jeff Law wrote:
>>>
>>> Essentially it allows us to more easily support
>>> per-microarchitecture-optimized versions of functions.   You list just
>>> have to list the microarchitectures and the compiler handles the rest.
>>> Very simple, very easy.  I'd think it'd be particularly helpful for
>>> vectorization.
>>>
>>> You could emulate this with compiling the same source multiple times
>>> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
>>> approach is vastly simpler.
>>
>>
>> As far as I can tell the ifunc is generated automatically (and the
>> functionality is documented as such), so the new target_clone doesn't buy
>> much. But one thing I didn't was that the existing support is only available
>> in C++, while Evgeny's patch works for C. That is probably an argument that
>> could be made for its inclusion.
>>
>> Or at least, it's supposed to work. As I said, I get verify_ssa failures on
>> the included testcases, and for a simpler one I just tried I get the clones
>> of the function, but not the resolver that ought to be generated.
>>
>>
>> Bernd

[-- Attachment #2: target_clones.patch --]
[-- Type: application/octet-stream, Size: 20956 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 009c745..5d1bd4e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1348,6 +1348,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiple_target.o \
 	omp-low.o \
 	optabs.o \
 	optabs-libfuncs.o \
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 879f4db..b369e91 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
@@ -791,6 +792,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_error_attribute, false },
   { "target",                 1, -1, true, false, false,
 			      handle_target_attribute, false },
+  { "target_clones",          1, -1, true, false, false,
+			      handle_target_clones_attribute, false },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute, false },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -7366,6 +7369,12 @@ handle_always_inline_attribute (tree *node, tree name,
 		   "with %qs attribute", name, "noinline");
 	  *no_add_attrs = true;
 	}
+      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+	  *no_add_attrs = true;
+	}
       else
 	/* Set the attribute and mark it for disregarding inline
 	   limits.  */
@@ -9777,6 +9786,12 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+      *no_add_attrs = true;
+    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -9784,6 +9799,39 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_clones" attribute.  */
+
+static tree
+handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "always_inline");
+	  *no_add_attrs = true;
+	}
+      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target");
+	  *no_add_attrs = true;
+	}
+      else
+      /* Do not inline functions with multiple clone targets.  */
+	DECL_UNINLINABLE (*node) = 1;
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Arguments being collected for optimization.  */
 typedef const char *const_char_p;		/* For DEF_VEC_P.  */
 static GTY(()) vec<const_char_p, va_gc> *optimize_args;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 23e6a76..badfcda 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
 flags @option{fstack-protector} or @option{fstack-protector-strong}
 or @option{fstack-protector-explicit} are set.
 
+@item target_clones (@var{options})
+@cindex @code{target_clones} function attribute
+The @code{target_clones} attribute is used to specify that a function is to
+be cloned into multiple versions compiled with different target options
+than specified on the command line.  The supported options and restrictions 
+are the same as for @code{target}.
+
+For instance on an x86, you could compile a function with
+@code{target_clones("sse4.1,avx")}. It will create 2 function clones,
+one compiled with @option{-msse4.1} and another with @option{-mavx}.
+At the function call it will create resolver @code{ifunc}, that will
+dynamically call a clone suitable for current architecture.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 Multiple target back ends implement the @code{target} attribute
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
new file mode 100644
index 0000000..6490570
--- /dev/null
+++ b/gcc/multiple_target.c
@@ -0,0 +1,331 @@
+/* Pass for parsing functions with multiple target attributes.
+
+   Contributed by Evgeny Stupachenko <evstupac@gmail.com>
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "gimple.h"
+#include "diagnostic-core.h"
+#include "gimple-ssa.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "pretty-print.h"
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* If the call in NODE has multiple target attribute with multiple fields,
+   replace it with dispatcher call and create dispatcher (once).  */
+
+static void
+create_dispatcher_calls (struct cgraph_node *node)
+{
+  cgraph_edge *e;
+  cgraph_edge *e_next;
+  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
+    {
+      tree resolver_decl;
+      tree idecl;
+      tree decl;
+      gimple *call = e->call_stmt;
+      struct cgraph_node *inode;
+
+      /* Checking if call of function is call of versioned function.
+	 Versioned function are not inlined, so there is no need to
+	 check for inline.  */
+      if (!call
+	  || !(decl = gimple_call_fndecl (call))
+	  || !DECL_FUNCTION_VERSIONED (decl))
+	continue;
+
+      e_next = e->next_caller;
+      idecl = targetm.get_function_versions_dispatcher (decl);
+      if (!idecl)
+	{
+	  error_at (gimple_location (call),
+		    "default target_clones attribute was not set");
+	}
+      inode = cgraph_node::get (idecl);
+      gcc_assert (inode);
+      resolver_decl = targetm.generate_version_dispatcher_body (inode);
+
+      /* Update aliases.  */
+      inode->alias = true;
+      inode->alias_target = resolver_decl;
+      if (!inode->analyzed)
+	inode->resolve_alias (cgraph_node::get (resolver_decl));
+
+      e->redirect_callee (inode);
+      /*  Force to move to next edge caller.  */
+      e = NULL;
+    }
+}
+
+/* If the function in NODE has multiple target attribute with multiple fields,
+   create the appropriate clone for each field.  */
+
+static bool
+expand_target_clones (struct cgraph_node *node)
+{
+  tree id;
+  /* Parsing target attributes separated by comma.  */
+  tree attr_target = lookup_attribute ("target_clones",
+				       DECL_ATTRIBUTES (node->decl));
+  /* No targets specified.  */
+  if (!attr_target)
+    return false;
+
+  tree arg;
+  tree arglist = TREE_VALUE (attr_target);
+  size_t str_len_sum = 0;
+  char **args = NULL;
+  char *attr_str;
+  char *attr = NULL;
+  int argnum = 1;
+  int i;
+
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      unsigned int i;
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      str_len_sum += len + 1;
+      if (arg != arglist)
+	argnum++;
+      for (i = 0; i < strlen (str); i++)
+	if (str[i] == ',')
+	  argnum++;
+    }
+
+  /* No need to clone for 1 target attribute.  */
+  if (argnum == 1)
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl),
+		  0,
+		  "single target_clones attribute is ignored");
+      return false;
+    }
+
+  attr_str = XNEWVEC (char, str_len_sum);
+  str_len_sum = 0;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      memcpy (attr_str + str_len_sum, str, len);
+      attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+      str_len_sum += len + 1;
+    }
+
+  args = XNEWVEC (char *, argnum);
+  bool has_default = false;
+  attr = strtok (attr_str, ",");
+  i = 0;
+  while (attr != NULL)
+    {
+      if (strcmp (attr, "default") == 0)
+	{
+	  has_default = true;
+	  attr = strtok (NULL, ",");
+	  continue;
+	}
+      args[i] = attr;
+      attr = strtok (NULL, ",");
+      i++;
+    }
+
+  int args_num = i;
+
+  const char *old_asm_name;
+  old_asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+
+  cgraph_function_version_info *decl1_v = NULL;
+  cgraph_function_version_info *decl2_v = NULL;
+  cgraph_function_version_info *before = NULL;
+  cgraph_function_version_info *after = NULL;
+  tree attributes = NULL;
+  if (!has_default)
+    error_at (DECL_SOURCE_LOCATION (node->decl),
+	      "default target was not set");
+  decl1_v = node->function_version ();
+  if (decl1_v == NULL)
+    decl1_v = node->insert_new_function_version ();
+  before = decl1_v;
+  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+
+  /* Setting new attribute to initial function.  */
+
+  for (i = 0; i < args_num; i++)
+    {
+      pretty_printer pp;
+      attr = args[i];
+      cgraph_node *new_node;
+      unsigned int j;
+
+      if (node->definition)
+	{
+	  if (!node->has_gimple_body_p ())
+	    return false;
+
+	  /* Replacing initial function with clone only for 1st target.  */
+	  new_node = node->create_version_clone_with_body (vNULL, NULL,
+							   NULL, false,
+							   NULL, NULL,
+							   "target_clone");
+	  new_node->externally_visible = node->externally_visible;
+	  new_node->address_taken = node->address_taken;
+	  new_node->thunk = node->thunk;
+	  new_node->alias = node->alias;
+	  new_node->weakref = node->weakref;
+	  new_node->cpp_implicit_alias = node->cpp_implicit_alias;
+	  new_node->local.local = node->local.local;
+	  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (node->decl);
+	}
+      else
+	{
+	  tree new_decl = copy_node (node->decl);
+	  new_node = cgraph_node::get_create (new_decl);
+	}
+
+	const char *new_asm_name;
+	char *new_suffix;
+	new_suffix = XNEWVEC (char, strlen (attr) + 1);
+	memcpy (new_suffix, attr, strlen (attr) + 1);
+
+	/* Replace all '=' and '-' in target option with '_'.  */
+	for (j = 0; j < strlen (new_suffix); j++)
+	  if (new_suffix[j] == '=' || new_suffix[j] == '-')
+	    new_suffix[j] = '_';
+
+	pp_string (&pp, old_asm_name);
+	pp_string (&pp, ".");
+	pp_string (&pp, new_suffix);
+	new_asm_name = pp_formatted_text (&pp);
+	id = get_identifier (new_asm_name);
+	symtab->change_decl_assembler_name (new_node->decl, id);
+	XDELETEVEC (new_suffix);
+
+	/* Setting new attribute to cloned function.  */
+	tree attributes = make_attribute ("target", attr,
+					  DECL_ATTRIBUTES (new_node->decl));
+	DECL_ATTRIBUTES (new_node->decl) = attributes;
+	targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+						 TREE_VALUE (attributes), 0);
+	decl2_v = new_node->function_version ();
+	if (decl2_v != NULL)
+	  continue;
+	decl2_v = new_node->insert_new_function_version ();
+
+	/* Chain decl2_v and decl1_v.  All semantically identical versions
+	   will be chained together.  */
+
+	after = decl2_v;
+	while (before->next != NULL)
+	  before = before->next;
+	while (after->prev != NULL)
+	  after = after->prev;
+
+	before->next = after;
+	after->prev = before;
+	DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+    }
+  attributes = make_attribute ("target", "default",
+			       DECL_ATTRIBUTES (node->decl));
+  DECL_ATTRIBUTES (node->decl) = attributes;
+  targetm.target_option.valid_attribute_p (node->decl, NULL,
+					   TREE_VALUE (attributes), 0);
+  XDELETEVEC (args);
+  XDELETEVEC (attr_str);
+  return true;
+}
+
+static unsigned int
+ipa_target_clone (void)
+{
+  struct cgraph_node *node;
+  bool res = false;
+  FOR_EACH_FUNCTION (node)
+    res |= expand_target_clones (node);
+  if (res)
+    FOR_EACH_FUNCTION (node)
+      create_dispatcher_calls (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_target_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "targetclone",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_target_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_target_clone (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_target_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_target_clone (); }
+};
+
+bool
+pass_target_clone::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_target_clone (gcc::context *ctxt)
+{
+  return new pass_target_clone (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 64fc4d9..ff6c550 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_late_ipa_passes)
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_omp_simd_clone);
+  NEXT_PASS (pass_target_clone);
   TERMINATE_PASS_LIST ()
 
   /* These passes are run after IPA passes on every function that is being
diff --git a/gcc/testsuite/g++.dg/ext/mvc1.C b/gcc/testsuite/g++.dg/ext/mvc1.C
new file mode 100644
index 0000000..fbf9011
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc1.C
@@ -0,0 +1,34 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+__attribute__((target("avx","arch=core-avx2")))
+int
+bar ()
+{
+  return 2;
+}
+
+__attribute__((target("default")))
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/g++.dg/ext/mvc2.C b/gcc/testsuite/g++.dg/ext/mvc2.C
new file mode 100644
index 0000000..e7abab8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc2.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","default")))
+__attribute__((target("avx")))
+int foo (); /* { dg-warning "'target' attribute ignored due to conflict with 'target_clones' attribute" } */
+
+__attribute__((target_clones("avx","arch=slm","default"),always_inline))
+int bar (); /* { dg-warning "'always_inline' attribute ignored due to conflict with 'target_clones' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc3.C b/gcc/testsuite/g++.dg/ext/mvc3.C
new file mode 100644
index 0000000..05bebf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc3.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target("avx")))
+__attribute__((target_clones("avx","arch=slm","default")))
+int foo (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'target' attribute" } */
+
+__attribute__((always_inline,target_clones("avx","arch=slm","default")))
+int bar (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'always_inline' attribute" } */
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
new file mode 100644
index 0000000..8e02721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc2.c b/gcc/testsuite/gcc.dg/mvc2.c
new file mode 100644
index 0000000..af0c6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo ();
diff --git a/gcc/testsuite/gcc.dg/mvc3.c b/gcc/testsuite/gcc.dg/mvc3.c
new file mode 100644
index 0000000..3af3e35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo (); /* { dg-error "default target was not set" } */
+
+int
+bar ()
+{
+  return foo();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
new file mode 100644
index 0000000..48ec9a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("default","avx","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
new file mode 100644
index 0000000..89001e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fno-inline" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+foo ()
+{
+  return 10;
+}
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+bar ()
+{
+  return -foo ();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc6.c b/gcc/testsuite/gcc.dg/mvc6.c
new file mode 100644
index 0000000..1621985
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler "vpshufb" } } */
+/* { dg-final { scan-assembler "punpcklbw" } } */
+
+__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
+void
+foo(char *in, char *out, int size)
+{
+  int i;
+  for(i = 0; i < size; i++)
+    {
+	out[2 * i] = in[i];
+	out[2 * i + 1] = in[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
new file mode 100644
index 0000000..d61d78e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+
+__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
+int foo ();
+
+int main()
+{
+  return foo();
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3c913ea..7103b08 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -483,6 +483,7 @@ extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-25  0:02         ` Evgeny Stupachenko
@ 2015-10-02 13:18           ` Evgeny Stupachenko
  2015-10-08 19:00           ` Jeff Law
  1 sibling, 0 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-02 13:18 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, Bernd Schmidt, GCC Patches, Jan Hubicka

PING.

On Fri, Sep 25, 2015 at 1:28 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> I've fixed ICE and review issues.
> x86 make check and bootstrap passed.
>
> Thanks,
> Evgeny
>
> ChangeLog
>
> 2015-09-25  Evgeny Stupachenko  <evstupac@gmail.com>
>
> gcc/
>         * Makefile.in (OBJS): Add multiple_target.o.
>         * multiple_target.c (make_attribute): New.
>         (create_dispatcher_calls): Ditto.
>         (expand_target_clones): Ditto.
>         (ipa_target_clone): Ditto.
>         * passes.def (pass_target_clone): New ipa pass.
>         * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
>         * c-common.c (handle_target_clones_attribute): New.
>         * (c_common_attribute_table): Add handle_target_clones_attribute.
>         * (handle_always_inline_attribute): Add check on target_clones
>         attribute.
>         * (handle_target_attribute): Ditto.
>
> gcc/testsuite
>         * gcc.dg/mvc1.c: New test for multiple targets cloning.
>         * gcc.dg/mvc2.c: Ditto.
>         * gcc.dg/mvc3.c: Ditto.
>         * gcc.dg/mvc4.c: Ditto.
>         * gcc.dg/mvc5.c: Ditto.
>         * gcc.dg/mvc6.c: Ditto.
>         * gcc.dg/mvc7.c: Ditto.
>         * g++.dg/ext/mvc1.C: Ditto.
>         * g++.dg/ext/mvc2.C: Ditto.
>         * g++.dg/ext/mvc3.C: Ditto.
>
> gcc/doc
>         * doc/extend.texi (target_clones): New attribute description.
>
>
> On Wed, Sep 23, 2015 at 1:49 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Thank you for the review.
>> The patch still works with gcc 5, but the fail reproduced on trunk
>> (looks like it appeared while patch was at review). I'll debug it and
>> fix.
>> As a workaround to test the feature...
>> Removing
>> "gimple_call_set_fndecl (call, idecl);" from multiple_target.c
>> should resolve the ICE
>>
>> I'll fix the patch for trunk and send an update.
>>
>> Thanks,
>> Evgeny
>>
>>
>> On Wed, Sep 23, 2015 at 12:09 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>>> On 09/22/2015 09:41 PM, Jeff Law wrote:
>>>>
>>>> Essentially it allows us to more easily support
>>>> per-microarchitecture-optimized versions of functions.   You list just
>>>> have to list the microarchitectures and the compiler handles the rest.
>>>> Very simple, very easy.  I'd think it'd be particularly helpful for
>>>> vectorization.
>>>>
>>>> You could emulate this with compiling the same source multiple times
>>>> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
>>>> approach is vastly simpler.
>>>
>>>
>>> As far as I can tell the ifunc is generated automatically (and the
>>> functionality is documented as such), so the new target_clone doesn't buy
>>> much. But one thing I didn't was that the existing support is only available
>>> in C++, while Evgeny's patch works for C. That is probably an argument that
>>> could be made for its inclusion.
>>>
>>> Or at least, it's supposed to work. As I said, I get verify_ssa failures on
>>> the included testcases, and for a simpler one I just tried I get the clones
>>> of the function, but not the resolver that ought to be generated.
>>>
>>>
>>> Bernd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-22 21:09     ` Bernd Schmidt
  2015-09-22 23:52       ` Evgeny Stupachenko
@ 2015-10-08 16:39       ` Jeff Law
  1 sibling, 0 replies; 36+ messages in thread
From: Jeff Law @ 2015-10-08 16:39 UTC (permalink / raw)
  To: Bernd Schmidt, Bernd Schmidt, Evgeny Stupachenko, GCC Patches,
	Jan Hubicka

On 09/22/2015 03:09 PM, Bernd Schmidt wrote:
> On 09/22/2015 09:41 PM, Jeff Law wrote:
>> Essentially it allows us to more easily support
>> per-microarchitecture-optimized versions of functions.   You list just
>> have to list the microarchitectures and the compiler handles the rest.
>> Very simple, very easy.  I'd think it'd be particularly helpful for
>> vectorization.
>>
>> You could emulate this with compiling the same source multiple times
>> with different flags/defines and wire up on ifunc by hand.  But Evgeny's
>> approach is vastly simpler.
>
> As far as I can tell the ifunc is generated automatically (and the
> functionality is documented as such), so the new target_clone doesn't
> buy much. But one thing I didn't was that the existing support is only
> available in C++, while Evgeny's patch works for C. That is probably an
> argument that could be made for its inclusion.
In multi-versioning, you have a distinct source implementation for each 
function.  ie

__attribute__ ((target ("default")))
int foo ()
{
   // The default version of foo.
   return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
   // foo version for SSE4.2
   return 1;
}

And so-on for each processor version you want to support.

In Evgeny's patch we'd have a single source implementation of foo which 
gets compiled multiple times, once for each target specified by the 
attribute.

I wasn't aware that multi-versioning was only implemented for C++, that 
seems fairly lame.  I hope I didn't approve that :-)

Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-09-25  0:02         ` Evgeny Stupachenko
  2015-10-02 13:18           ` Evgeny Stupachenko
@ 2015-10-08 19:00           ` Jeff Law
  2015-10-08 19:23             ` Jan Hubicka
  2015-10-08 20:01             ` Evgeny Stupachenko
  1 sibling, 2 replies; 36+ messages in thread
From: Jeff Law @ 2015-10-08 19:00 UTC (permalink / raw)
  To: Evgeny Stupachenko, Bernd Schmidt; +Cc: Bernd Schmidt, GCC Patches, Jan Hubicka

On 09/24/2015 04:28 PM, Evgeny Stupachenko wrote:
> I've fixed ICE and review issues.
> x86 make check and bootstrap passed.
>
> Thanks,
> Evgeny
>
> ChangeLog
>
> 2015-09-25  Evgeny Stupachenko<evstupac@gmail.com>
>
> gcc/
>          * Makefile.in (OBJS): Add multiple_target.o.
>          * multiple_target.c (make_attribute): New.
>          (create_dispatcher_calls): Ditto.
>          (expand_target_clones): Ditto.
>          (ipa_target_clone): Ditto.
>          * passes.def (pass_target_clone): New ipa pass.
>          * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
>          * c-common.c (handle_target_clones_attribute): New.
>          * (c_common_attribute_table): Add handle_target_clones_attribute.
>          * (handle_always_inline_attribute): Add check on target_clones
>          attribute.
>          * (handle_target_attribute): Ditto.
>
> gcc/testsuite
>          * gcc.dg/mvc1.c: New test for multiple targets cloning.
>          * gcc.dg/mvc2.c: Ditto.
>          * gcc.dg/mvc3.c: Ditto.
>          * gcc.dg/mvc4.c: Ditto.
>          * gcc.dg/mvc5.c: Ditto.
>          * gcc.dg/mvc6.c: Ditto.
>          * gcc.dg/mvc7.c: Ditto.
>          * g++.dg/ext/mvc1.C: Ditto.
>          * g++.dg/ext/mvc2.C: Ditto.
>          * g++.dg/ext/mvc3.C: Ditto.
>
> gcc/doc
>          * doc/extend.texi (target_clones): New attribute description.
>
>

>
> target_clones.patch
Sorry this has taken so long to come back to...  As I mentioned a couple 
months ago, I'd hoped Jan would chime in on the IPA/symtab requirements. 
  But that didn't happen.


SO I went back and reviewed the discussion between Jan, Ilya & myself 
WRT some of the rules around aliases, clones, etc.  I think the key 
question for this patch is whether or not the clones have the same 
assembler name or not.  From looking at expand_target_clones, I'm 
confident the answer is the clones have different assembler names.  In 
fact, the assembler names are munged with the options used for that 
specific clone of the original function.


>
> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
> +   it to CHAIN.  */
> +
> +static tree
> +make_attribute (const char *name, const char *arg_name, tree chain)
> +{
> +  tree attr_name;
> +  tree attr_arg_name;
> +  tree attr_args;
> +  tree attr;
> +
> +  attr_name = get_identifier (name);
> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
> +  attr = tree_cons (attr_name, attr_args, chain);
> +  return attr;
> +}
This seems generic enough that I'd prefer it in attribs.c.  I was rather 
surprised when I looked and didn't find an existing routine to do this.


> +
> +/* If the call in NODE has multiple target attribute with multiple fields,
> +   replace it with dispatcher call and create dispatcher (once).  */
> +
> +static void
> +create_dispatcher_calls (struct cgraph_node *node)
> +{
> +  cgraph_edge *e;
> +  cgraph_edge *e_next;
> +  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
That's a rather strange way to write the loop increment.  If I follow 
the loop logic correctly, it seems that we always end up using 
e->next_caller, it's just obscured.

For the test if we're calling a versioned function, we just "continue". 
  e will be non-null and thus we use e->next_caller to set e for the 
next iteration.

If the test for calling a versioned function falls, we set e_next to 
e->next_caller, then later set e to NULL.  That results in using e_next 
to set e for the next iteration.  But e_next was initialized to 
e->next_caller.

So why not just write the loop increment as e = e->next-caller?



> +    {
> +      tree resolver_decl;
> +      tree idecl;
> +      tree decl;
> +      gimple *call = e->call_stmt;
> +      struct cgraph_node *inode;
> +
> +      /* Checking if call of function is call of versioned function.
> +	 Versioned function are not inlined, so there is no need to
> +	 check for inline.  */
This comment doesn't parse well.  Perhaps:

/* Check if this is a call to a versioned function.  Verisoned
    fucntions are not inlined, so there is no need to check for that.  */


> +
> +/* If the function in NODE has multiple target attribute with multiple fields,
> +   create the appropriate clone for each field.  */
> +
> +static bool
> +expand_target_clones (struct cgraph_node *node)
So this is probably getting a little large.  Can we look to refactor it 
a little?  It's not a huge deal and there's certainly code in GCC that 
is far worse, but it just feels like there's enough going on in this 
code that there ought to be 2-3 helpers for the larger chunks of work 
going on inside this code.

The first is the attribute parsing.  The second is creation of the 
nodes, and the final would be munging the name and attributes on the clones.

I'm slightly concerned with using the pretty printer to build up the new 
name.  Is there precedent for this anywhere else in GCC?

When creating the munged name, don't you also have to make sure that 
other symbols that aren't supported for names don't sneak through?  I 
see that you replace = and -, but you'd need to replace any symbol that 
could possibly be used in an option, but which isn't allowed in a 
function name at the assembler level.  I'd be worried about anything 
that might possibly be seen as an operator by the assembler, '.', and 
possibly others.

Also, please double check indentation of that function.  In particular 
the code after the if-then-else (node->definition) looks like it's 
indented too far.

I think this is pretty close, but does need another iteration.  Now that 
we've moved past the interactions with IPA question and have fairly 
carefully looked at the rest of the patch, I don't think subsequent 
reviews will take much time at all.

jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 19:00           ` Jeff Law
@ 2015-10-08 19:23             ` Jan Hubicka
  2015-10-08 19:53               ` Jeff Law
  2015-10-08 20:01             ` Evgeny Stupachenko
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Hubicka @ 2015-10-08 19:23 UTC (permalink / raw)
  To: Jeff Law
  Cc: Evgeny Stupachenko, Bernd Schmidt, Bernd Schmidt, GCC Patches,
	Jan Hubicka

> Sorry this has taken so long to come back to...  As I mentioned a
> couple months ago, I'd hoped Jan would chime in on the IPA/symtab
> requirements.  But that didn't happen.

Sorry for that.  I had bit too many real life things this summer
and I am still trying to catch up.
> 
> 
> SO I went back and reviewed the discussion between Jan, Ilya &
> myself WRT some of the rules around aliases, clones, etc.  I think
> the key question for this patch is whether or not the clones have
> the same assembler name or not.  From looking at

Ilya's code seems different from what this patch does. Ilya simply needs
multiple declarations for one physical assembler name (this is not an alias).
This is not currently supported by symtab (support for that was removed long
time ago as part of the one decl project) and I have some perliminary patches
to push out, but since they add basic sanity checking that the different
declarations of the same thing looks compatible I get too many positives I need
to walk through.  Those seems real bugs in glibc (which uses duplicated decls
for checking) and the pointer bounds code.

> expand_target_clones, I'm confident the answer is the clones have
> different assembler names.  In fact, the assembler names are munged

Yes, here you have different names for different variants of the function
body. Basically this pass takes ctarget attribute and creates bunch of verisons
of the functions and assigns them the proper target attributes, right?

One thing I am confused about is why this does not happen early?
What happens to inlines of functions with specific taret requirements? I.e.
if I compile with AVX enabled have function body with AVX code but then, at
late compilation, force a clone with -mno-avx?

I would expect cloning to happen early, perhaps even before early optimizations...
Switching random target flags mid optimization queue seems dangerous.

As for the patch itself:
+      if (node->definition)
+	{
+	  if (!node->has_gimple_body_p ())
+	    return false;
+	  node->get_body ();
+
+	  /* Replacing initial function with clone only for 1st ctarget.  */
+	  new_node = node->create_version_clone_with_body (vNULL, NULL,
+							   NULL, false,
+							   NULL, NULL,
+							   "target_clone");
+	  new_node->externally_visible = node->externally_visible;
+	  new_node->address_taken = node->address_taken;
+	  new_node->thunk = node->thunk;
+	  new_node->alias = node->alias;
+	  new_node->weakref = node->weakref;
+	  new_node->cpp_implicit_alias = node->cpp_implicit_alias;
+	  new_node->local.local = node->local.local;
+	  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (node->decl);
+	}
Since you test it has gimple body, then you don't need to worry about
alias/thunk/weakrefs/implicit_alias properties. Those will be never set.
How does the dispatcher look like?  Can the function be considered local
in a sense that one can change calling conventions of one clone but not another?

On the other hand, node->create_version_clone_with_body creates a function
that is local, if we want to create globally visible clones, I think we should
add a parameter there and avoid the localization followed by unlocalization.
(I know we already have too many parameters here, perhaps we could add a flags
parameter that will also handle the existing skip_return flag.)
For exmaple, I think you want to have all functions with same WEAK attributes
or in the same COMDAT group.

Also when you are copying a function, you probably want to copy the associated
thunks and version them, too?

+      id = get_identifier (new_asm_name);
+      symtab->change_decl_assembler_name (new_node->decl, id);

here I think you can just pass new_asm_name as clone_name. I.e. replace
the current use of "target_clone" string.

+      targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+					       TREE_VALUE (attributes), 0);

looks like return value should not be ignored. If attribute is not valid,
we need to error and do something sane.

Honza

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 19:23             ` Jan Hubicka
@ 2015-10-08 19:53               ` Jeff Law
  2015-10-08 21:36                 ` Jan Hubicka
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-10-08 19:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Evgeny Stupachenko, Bernd Schmidt, Bernd Schmidt, GCC Patches

On 10/08/2015 01:23 PM, Jan Hubicka wrote:
>> Sorry this has taken so long to come back to...  As I mentioned a
>> couple months ago, I'd hoped Jan would chime in on the IPA/symtab
>> requirements.  But that didn't happen.
>
> Sorry for that.  I had bit too many real life things this summer
> and I am still trying to catch up.
It happens to us all :-)  No worries.

>
> Ilya's code seems different from what this patch does. Ilya simply needs
> multiple declarations for one physical assembler name (this is not an alias).
> This is not currently supported by symtab (support for that was removed long
> time ago as part of the one decl project) and I have some perliminary patches
> to push out, but since they add basic sanity checking that the different
> declarations of the same thing looks compatible I get too many positives I need
> to walk through.  Those seems real bugs in glibc (which uses duplicated decls
> for checking) and the pointer bounds code.
Right.  I was mostly concerned because I goof'd letting those bits from 
Ilya in and didn't want to repeat the mistake again.  It was pretty 
clear once I found the thread between the tree of us that Evgeny's 
patches were doing something rather different and didn't violate the 
assumptions of the symtab code.

>
> Yes, here you have different names for different variants of the function
> body. Basically this pass takes ctarget attribute and creates bunch of verisons
> of the functions and assigns them the proper target attributes, right?
Right.  Given a single function in the source tree with the new 
attribute, we'll create clones and compile each clone with a different 
set of options.  It's got a lot of similarities to the multi-versioning 
code.  The key difference is with multi-versioning, you actually have a 
different source level implementation for each target while Evgeny's 
stuff has a single source implementation.

>
> One thing I am confused about is why this does not happen early?
> What happens to inlines of functions with specific taret requirements? I.e.
> if I compile with AVX enabled have function body with AVX code but then, at
> late compilation, force a clone with -mno-avx?
>
> I would expect cloning to happen early, perhaps even before early optimizations...
> Switching random target flags mid optimization queue seems dangerous.
These shouldn't be inlined to the best of my knowledge.  We go back and 
direct all callers to a dispatcher.  Inlining them would be a mistake 
since the goal here is to specialize the clones around target 
capabilities.  Thus if something got inlined, then we lose that ability.


>
> As for the patch itself:
> +      if (node->definition)
> +	{
> +	  if (!node->has_gimple_body_p ())
> +	    return false;
> +	  node->get_body ();
> +
> +	  /* Replacing initial function with clone only for 1st ctarget.  */
> +	  new_node = node->create_version_clone_with_body (vNULL, NULL,
> +							   NULL, false,
> +							   NULL, NULL,
> +							   "target_clone");
> +	  new_node->externally_visible = node->externally_visible;
> +	  new_node->address_taken = node->address_taken;
> +	  new_node->thunk = node->thunk;
> +	  new_node->alias = node->alias;
> +	  new_node->weakref = node->weakref;
> +	  new_node->cpp_implicit_alias = node->cpp_implicit_alias;
> +	  new_node->local.local = node->local.local;
> +	  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (node->decl);
> +	}
> Since you test it has gimple body, then you don't need to worry about
> alias/thunk/weakrefs/implicit_alias properties. Those will be never set.
> How does the dispatcher look like?  Can the function be considered local
> in a sense that one can change calling conventions of one clone but not another?
Easiest to think of them as having the same abilities as multi-versioning.


Since we're going through a dispatcher I don't think we're allowed to 
change the ABI across the clones.


>
> Also when you are copying a function, you probably want to copy the associated
> thunks and version them, too?
Dunno on that.

Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 19:00           ` Jeff Law
  2015-10-08 19:23             ` Jan Hubicka
@ 2015-10-08 20:01             ` Evgeny Stupachenko
  2015-10-09  4:05               ` Jeff Law
  1 sibling, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-08 20:01 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, Bernd Schmidt, GCC Patches, Jan Hubicka

On Thu, Oct 8, 2015 at 10:00 PM, Jeff Law <law@redhat.com> wrote:
> On 09/24/2015 04:28 PM, Evgeny Stupachenko wrote:
>>
>> I've fixed ICE and review issues.
>> x86 make check and bootstrap passed.
>>
>> Thanks,
>> Evgeny
>>
>> ChangeLog
>>
>> 2015-09-25  Evgeny Stupachenko<evstupac@gmail.com>
>>
>> gcc/
>>          * Makefile.in (OBJS): Add multiple_target.o.
>>          * multiple_target.c (make_attribute): New.
>>          (create_dispatcher_calls): Ditto.
>>          (expand_target_clones): Ditto.
>>          (ipa_target_clone): Ditto.
>>          * passes.def (pass_target_clone): New ipa pass.
>>          * tree-pass.h (make_pass_target_clone): Ditto.
>>
>> gcc/c-family
>>          * c-common.c (handle_target_clones_attribute): New.
>>          * (c_common_attribute_table): Add handle_target_clones_attribute.
>>          * (handle_always_inline_attribute): Add check on target_clones
>>          attribute.
>>          * (handle_target_attribute): Ditto.
>>
>> gcc/testsuite
>>          * gcc.dg/mvc1.c: New test for multiple targets cloning.
>>          * gcc.dg/mvc2.c: Ditto.
>>          * gcc.dg/mvc3.c: Ditto.
>>          * gcc.dg/mvc4.c: Ditto.
>>          * gcc.dg/mvc5.c: Ditto.
>>          * gcc.dg/mvc6.c: Ditto.
>>          * gcc.dg/mvc7.c: Ditto.
>>          * g++.dg/ext/mvc1.C: Ditto.
>>          * g++.dg/ext/mvc2.C: Ditto.
>>          * g++.dg/ext/mvc3.C: Ditto.
>>
>> gcc/doc
>>          * doc/extend.texi (target_clones): New attribute description.
>>
>>
>
>>
>> target_clones.patch
>
> Sorry this has taken so long to come back to...  As I mentioned a couple
> months ago, I'd hoped Jan would chime in on the IPA/symtab requirements.
> But that didn't happen.
>
>
> SO I went back and reviewed the discussion between Jan, Ilya & myself WRT
> some of the rules around aliases, clones, etc.  I think the key question for
> this patch is whether or not the clones have the same assembler name or not.
> From looking at expand_target_clones, I'm confident the answer is the clones
> have different assembler names.  In fact, the assembler names are munged
> with the options used for that specific clone of the original function.
>
>
>>
>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>> +   it to CHAIN.  */
>> +
>> +static tree
>> +make_attribute (const char *name, const char *arg_name, tree chain)
>> +{
>> +  tree attr_name;
>> +  tree attr_arg_name;
>> +  tree attr_args;
>> +  tree attr;
>> +
>> +  attr_name = get_identifier (name);
>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>> +  attr = tree_cons (attr_name, attr_args, chain);
>> +  return attr;
>> +}
>
> This seems generic enough that I'd prefer it in attribs.c.  I was rather
> surprised when I looked and didn't find an existing routine to do this.
>
>
>> +
>> +/* If the call in NODE has multiple target attribute with multiple
>> fields,
>> +   replace it with dispatcher call and create dispatcher (once).  */
>> +
>> +static void
>> +create_dispatcher_calls (struct cgraph_node *node)
>> +{
>> +  cgraph_edge *e;
>> +  cgraph_edge *e_next;
>> +  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
>
> That's a rather strange way to write the loop increment.  If I follow the
> loop logic correctly, it seems that we always end up using e->next_caller,
> it's just obscured.
>
> For the test if we're calling a versioned function, we just "continue".  e
> will be non-null and thus we use e->next_caller to set e for the next
> iteration.
>
> If the test for calling a versioned function falls, we set e_next to
> e->next_caller, then later set e to NULL.  That results in using e_next to
> set e for the next iteration.  But e_next was initialized to e->next_caller.
>
> So why not just write the loop increment as e = e->next-caller?

Because of this:
      e->redirect_callee (inode);
It modifies next_caller field.
The other way is to remember all what we want to redirect and create
one more loop through the modifications to apply them.

>
>
>
>> +    {
>> +      tree resolver_decl;
>> +      tree idecl;
>> +      tree decl;
>> +      gimple *call = e->call_stmt;
>> +      struct cgraph_node *inode;
>> +
>> +      /* Checking if call of function is call of versioned function.
>> +        Versioned function are not inlined, so there is no need to
>> +        check for inline.  */
>
> This comment doesn't parse well.  Perhaps:
>
> /* Check if this is a call to a versioned function.  Verisoned
>    fucntions are not inlined, so there is no need to check for that.  */
>
>
>> +
>> +/* If the function in NODE has multiple target attribute with multiple
>> fields,
>> +   create the appropriate clone for each field.  */
>> +
>> +static bool
>> +expand_target_clones (struct cgraph_node *node)
>
> So this is probably getting a little large.  Can we look to refactor it a
> little?  It's not a huge deal and there's certainly code in GCC that is far
> worse, but it just feels like there's enough going on in this code that
> there ought to be 2-3 helpers for the larger chunks of work going on inside
> this code.
>
> The first is the attribute parsing.  The second is creation of the nodes,
> and the final would be munging the name and attributes on the clones.
>
> I'm slightly concerned with using the pretty printer to build up the new
> name.  Is there precedent for this anywhere else in GCC?
I don't remember where it exactly came from. However it's not a big
deal to simplify this to std functions.
>
> When creating the munged name, don't you also have to make sure that other
> symbols that aren't supported for names don't sneak through?  I see that you
> replace = and -, but you'd need to replace any symbol that could possibly be
> used in an option, but which isn't allowed in a function name at the
> assembler level.  I'd be worried about anything that might possibly be seen
> as an operator by the assembler, '.', and possibly others.
This restriction comes from "targetm.target_option.valid_attribute_p"
and it is the same for current implementation of function
multiversioning.
It exits with error: "attribute(target("...")) is unknown".
It looks reasonable to put the check before symtab changes.
>
> Also, please double check indentation of that function.  In particular the
> code after the if-then-else (node->definition) looks like it's indented too
> far.
>
> I think this is pretty close, but does need another iteration.  Now that
> we've moved past the interactions with IPA question and have fairly
> carefully looked at the rest of the patch, I don't think subsequent reviews
> will take much time at all.
>
> jeff

Thanks for the review,
Evgeny

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 19:53               ` Jeff Law
@ 2015-10-08 21:36                 ` Jan Hubicka
  2015-10-09 17:45                   ` Jeff Law
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Hubicka @ 2015-10-08 21:36 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jan Hubicka, Evgeny Stupachenko, Bernd Schmidt, Bernd Schmidt,
	GCC Patches

> >
> >Yes, here you have different names for different variants of the function
> >body. Basically this pass takes ctarget attribute and creates bunch of verisons
> >of the functions and assigns them the proper target attributes, right?
> Right.  Given a single function in the source tree with the new
> attribute, we'll create clones and compile each clone with a
> different set of options.  It's got a lot of similarities to the
> multi-versioning code.  The key difference is with multi-versioning,
> you actually have a different source level implementation for each
> target while Evgeny's stuff has a single source implementation.
> 
> >
> >One thing I am confused about is why this does not happen early?
> >What happens to inlines of functions with specific taret requirements? I.e.
> >if I compile with AVX enabled have function body with AVX code but then, at
> >late compilation, force a clone with -mno-avx?
> >
> >I would expect cloning to happen early, perhaps even before early optimizations...
> >Switching random target flags mid optimization queue seems dangerous.
> These shouldn't be inlined to the best of my knowledge.  We go back
> and direct all callers to a dispatcher.  Inlining them would be a
> mistake since the goal here is to specialize the clones around
> target capabilities.  Thus if something got inlined, then we lose
> that ability.

OK, I assume that every multiversioned function will end up in something
like this:

foo_ver1() target(...)
{
}
foo_ver2() target(...)
{
}
foo()
{
  dispatch either to foo_ver1() or foo_ver2()
}

I wonder why foo_ver1/foo_ver2 needs ever become public?  If there is only way
to call them via dispatcher, then the code setting TREE_PUBLIC seems wrong.  If
there is direct way to call them, then inlinng is possible.

Of course it also depends what you inline into function. You can have

bar() target(-mavx) {fancy avx code}
foobar() { ...... if (avx) bar();}
foo() ctarget(-mavx,-mno-avx) {....foobar();....}

Now if you compile with -mavx and because ctarget takes effect only after inlining,
at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
time.

> 
> 
> Since we're going through a dispatcher I don't think we're allowed
> to change the ABI across the clones.

If the dispatcher was something like

switch(value)
{
  case 1: foo_ver1(param); break;
  case 2: foo_ver2(param); break;
}
then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
If the dispatcher uses indirect calls, then we probably want to consider
foo_ver* as having address taken and thus not local.

Honza
> 
> 
> >
> >Also when you are copying a function, you probably want to copy the associated
> >thunks and version them, too?
> Dunno on that.
> 
> Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 20:01             ` Evgeny Stupachenko
@ 2015-10-09  4:05               ` Jeff Law
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Law @ 2015-10-09  4:05 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Bernd Schmidt, Bernd Schmidt, GCC Patches, Jan Hubicka

On 10/08/2015 02:01 PM, Evgeny Stupachenko wrote:
> On Thu, Oct 8, 2015 at 10:00 PM, Jeff Law <law@redhat.com> wrote:
>> On 09/24/2015 04:28 PM, Evgeny Stupachenko wrote:
>>>
>>> I've fixed ICE and review issues.
>>> x86 make check and bootstrap passed.
>>>
>>> Thanks,
>>> Evgeny
>>>
>>> ChangeLog
>>>
>>> 2015-09-25  Evgeny Stupachenko<evstupac@gmail.com>
>>>
>>> gcc/
>>>           * Makefile.in (OBJS): Add multiple_target.o.
>>>           * multiple_target.c (make_attribute): New.
>>>           (create_dispatcher_calls): Ditto.
>>>           (expand_target_clones): Ditto.
>>>           (ipa_target_clone): Ditto.
>>>           * passes.def (pass_target_clone): New ipa pass.
>>>           * tree-pass.h (make_pass_target_clone): Ditto.
>>>
>>> gcc/c-family
>>>           * c-common.c (handle_target_clones_attribute): New.
>>>           * (c_common_attribute_table): Add handle_target_clones_attribute.
>>>           * (handle_always_inline_attribute): Add check on target_clones
>>>           attribute.
>>>           * (handle_target_attribute): Ditto.
>>>
>>> gcc/testsuite
>>>           * gcc.dg/mvc1.c: New test for multiple targets cloning.
>>>           * gcc.dg/mvc2.c: Ditto.
>>>           * gcc.dg/mvc3.c: Ditto.
>>>           * gcc.dg/mvc4.c: Ditto.
>>>           * gcc.dg/mvc5.c: Ditto.
>>>           * gcc.dg/mvc6.c: Ditto.
>>>           * gcc.dg/mvc7.c: Ditto.
>>>           * g++.dg/ext/mvc1.C: Ditto.
>>>           * g++.dg/ext/mvc2.C: Ditto.
>>>           * g++.dg/ext/mvc3.C: Ditto.
>>>
>>> gcc/doc
>>>           * doc/extend.texi (target_clones): New attribute description.
>>>
>>>
>>
>>>
>>> target_clones.patch
>>
>> Sorry this has taken so long to come back to...  As I mentioned a couple
>> months ago, I'd hoped Jan would chime in on the IPA/symtab requirements.
>> But that didn't happen.
>>
>>
>> SO I went back and reviewed the discussion between Jan, Ilya & myself WRT
>> some of the rules around aliases, clones, etc.  I think the key question for
>> this patch is whether or not the clones have the same assembler name or not.
>>  From looking at expand_target_clones, I'm confident the answer is the clones
>> have different assembler names.  In fact, the assembler names are munged
>> with the options used for that specific clone of the original function.
>>
>>
>>>
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> +   it to CHAIN.  */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> +  tree attr_name;
>>> +  tree attr_arg_name;
>>> +  tree attr_args;
>>> +  tree attr;
>>> +
>>> +  attr_name = get_identifier (name);
>>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> +  attr = tree_cons (attr_name, attr_args, chain);
>>> +  return attr;
>>> +}
>>
>> This seems generic enough that I'd prefer it in attribs.c.  I was rather
>> surprised when I looked and didn't find an existing routine to do this.
>>
>>
>>> +
>>> +/* If the call in NODE has multiple target attribute with multiple
>>> fields,
>>> +   replace it with dispatcher call and create dispatcher (once).  */
>>> +
>>> +static void
>>> +create_dispatcher_calls (struct cgraph_node *node)
>>> +{
>>> +  cgraph_edge *e;
>>> +  cgraph_edge *e_next;
>>> +  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
>>
>> That's a rather strange way to write the loop increment.  If I follow the
>> loop logic correctly, it seems that we always end up using e->next_caller,
>> it's just obscured.
>>
>> For the test if we're calling a versioned function, we just "continue".  e
>> will be non-null and thus we use e->next_caller to set e for the next
>> iteration.
>>
>> If the test for calling a versioned function falls, we set e_next to
>> e->next_caller, then later set e to NULL.  That results in using e_next to
>> set e for the next iteration.  But e_next was initialized to e->next_caller.
>>
>> So why not just write the loop increment as e = e->next-caller?
>
> Because of this:
>        e->redirect_callee (inode);
> It modifies next_caller field.
> The other way is to remember all what we want to redirect and create
> one more loop through the modifications to apply them.
Seems like a comment would be wise.

>>>
>> I'm slightly concerned with using the pretty printer to build up the new
>> name.  Is there precedent for this anywhere else in GCC?
> I don't remember where it exactly came from. However it's not a big
> deal to simplify this to std functions.
Thanks.  Not sure why, but I'd appreciate that change.


>>
>> When creating the munged name, don't you also have to make sure that other
>> symbols that aren't supported for names don't sneak through?  I see that you
>> replace = and -, but you'd need to replace any symbol that could possibly be
>> used in an option, but which isn't allowed in a function name at the
>> assembler level.  I'd be worried about anything that might possibly be seen
>> as an operator by the assembler, '.', and possibly others.
> This restriction comes from "targetm.target_option.valid_attribute_p"
> and it is the same for current implementation of function
> multiversioning.
> It exits with error: "attribute(target("...")) is unknown".
> It looks reasonable to put the check before symtab changes.
Right, but there's nothing inherently that says that a option couldn't 
have other operators such as '+' in the option string.  So I'm concerned 
that if this code were used on a target that had such an option that 
we'd end up generating invalid assembly.

I think all you have to do is map a fuller set of invalid characters to 
a valid character.



Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-08 21:36                 ` Jan Hubicka
@ 2015-10-09 17:45                   ` Jeff Law
  2015-10-09 18:27                     ` Jan Hubicka
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-10-09 17:45 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Evgeny Stupachenko, Bernd Schmidt, Bernd Schmidt, GCC Patches

On 10/08/2015 03:36 PM, Jan Hubicka wrote:
>>>
>>> Yes, here you have different names for different variants of the function
>>> body. Basically this pass takes ctarget attribute and creates bunch of verisons
>>> of the functions and assigns them the proper target attributes, right?
>> Right.  Given a single function in the source tree with the new
>> attribute, we'll create clones and compile each clone with a
>> different set of options.  It's got a lot of similarities to the
>> multi-versioning code.  The key difference is with multi-versioning,
>> you actually have a different source level implementation for each
>> target while Evgeny's stuff has a single source implementation.
>>
>>>
>>> One thing I am confused about is why this does not happen early?
>>> What happens to inlines of functions with specific taret requirements? I.e.
>>> if I compile with AVX enabled have function body with AVX code but then, at
>>> late compilation, force a clone with -mno-avx?
>>>
>>> I would expect cloning to happen early, perhaps even before early optimizations...
>>> Switching random target flags mid optimization queue seems dangerous.
>> These shouldn't be inlined to the best of my knowledge.  We go back
>> and direct all callers to a dispatcher.  Inlining them would be a
>> mistake since the goal here is to specialize the clones around
>> target capabilities.  Thus if something got inlined, then we lose
>> that ability.
>
> OK, I assume that every multiversioned function will end up in something
> like this:
>
> foo_ver1() target(...)
> {
> }
> foo_ver2() target(...)
> {
> }
> foo()
> {
>    dispatch either to foo_ver1() or foo_ver2()
> }
Yes.

>
> I wonder why foo_ver1/foo_ver2 needs ever become public?  If there is only way
> to call them via dispatcher, then the code setting TREE_PUBLIC seems wrong.  If
> there is direct way to call them, then inlinng is possible.
I can't offhand think of a good reason why they should be public.  In 
fact, it would defeat the purpose of this feature to begin with if those 
symbols were directly reachable.  So, yes, setting TREE_PUBLIC seems wrong.


>
> Of course it also depends what you inline into function. You can have
>
> bar() target(-mavx) {fancy avx code}
> foobar() { ...... if (avx) bar();}
> foo() ctarget(-mavx,-mno-avx) {....foobar();....}
>
> Now if you compile with -mavx and because ctarget takes effect only after inlining,
> at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
> After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
> time.
But isn't that avoided by fixing up the call graph so that all calls to 
the affected function are going through the dispatcher?  Or is that 
happening too late?

>
>>
>>
>> Since we're going through a dispatcher I don't think we're allowed
>> to change the ABI across the clones.
>
> If the dispatcher was something like
>
> switch(value)
> {
>    case 1: foo_ver1(param); break;
>    case 2: foo_ver2(param); break;
> }
> then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
> If the dispatcher uses indirect calls, then we probably want to consider
> foo_ver* as having address taken and thus not local.
Yea, I guess that could work.  For some reason I kept thinking about 
indirect calls, but that shouldn't be the case here.

That argues further that setting TREE_PUBLIC on the target specific 
implementations is wrong.


Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-09 17:45                   ` Jeff Law
@ 2015-10-09 18:27                     ` Jan Hubicka
  2015-10-09 19:57                       ` Evgeny Stupachenko
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Hubicka @ 2015-10-09 18:27 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jan Hubicka, Evgeny Stupachenko, Bernd Schmidt, Bernd Schmidt,
	GCC Patches

> >Of course it also depends what you inline into function. You can have
> >
> >bar() target(-mavx) {fancy avx code}
> >foobar() { ...... if (avx) bar();}
> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
> >
> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
> >time.
> But isn't that avoided by fixing up the call graph so that all calls
> to the affected function are going through the dispatcher?  Or is
> that happening too late?

There is dispatcher only for foo that is the root of the callgarph tree.
When inlining we compare target attributes for match (in can_inline_edge_p).
We do not compare ctarget attributes.  Expanding ctarget to target early would
avoid need for ctarget handling.

> 
> >
> >>
> >>
> >>Since we're going through a dispatcher I don't think we're allowed
> >>to change the ABI across the clones.
> >
> >If the dispatcher was something like
> >
> >switch(value)
> >{
> >   case 1: foo_ver1(param); break;
> >   case 2: foo_ver2(param); break;
> >}
> >then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
> >If the dispatcher uses indirect calls, then we probably want to consider
> >foo_ver* as having address taken and thus not local.
> Yea, I guess that could work.  For some reason I kept thinking about
> indirect calls, but that shouldn't be the case here.
> 
> That argues further that setting TREE_PUBLIC on the target specific
> implementations is wrong.

Yep, if the function doesn't need to be public then the whole block of code
tampering with visibility and other flags can go.  We probably also don't need
to care about giving clone some fixed assembler name as clonning will invent
.target_clone.XYZ name itself.

Honza
> 
> 
> Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-09 18:27                     ` Jan Hubicka
@ 2015-10-09 19:57                       ` Evgeny Stupachenko
  2015-10-09 20:04                         ` Jan Hubicka
  0 siblings, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-09 19:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >Of course it also depends what you inline into function. You can have
>> >
>> >bar() target(-mavx) {fancy avx code}
>> >foobar() { ...... if (avx) bar();}
>> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}

"no-" targets are not supported

>> >
>> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>> >time.
>> But isn't that avoided by fixing up the call graph so that all calls
>> to the affected function are going through the dispatcher?  Or is
>> that happening too late?
>
> There is dispatcher only for foo that is the root of the callgarph tree.
> When inlining we compare target attributes for match (in can_inline_edge_p).
> We do not compare ctarget attributes.  Expanding ctarget to target early would
> avoid need for ctarget handling.
Currently inlining is disabled for functions with target_clone attribute:

if call from versioned function is call of multiversion function:

foo.avx2()
{
  ...
  bar();
  ...
}

bar dispatcher is better than inlining:

foo.avx2()
{
  ...
  bar.ifunc();
  ...
}

That is better from my point of view as:
1. multiversioning is going to be used mostly for target specific
optimizations like vectorization, scheduling... For these cases
inlininng not giving much. One of good examples is moving hot
vectorizable loop into the function and add target_clones attribute.
2. Inlining algorithm when we have not equal chains of architectures
in target_clones attribute are complex:
__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
__attribute__((target_clones("ssse3","arch=atom","arch=slm","default")))
3. If for one version inline occurs, but for another not we can get
unexpected (from user point) performance difference

Anyway I'll add the test you've mentioned to the patch.

The other big question is how all this deal with OpenMP clauses.
For now both target and target_clones attributes disable #pragma omp
effect inserting scalar ifunc call into loop.

Thanks,
Evgeny

>
>>
>> >
>> >>
>> >>
>> >>Since we're going through a dispatcher I don't think we're allowed
>> >>to change the ABI across the clones.
>> >
>> >If the dispatcher was something like
>> >
>> >switch(value)
>> >{
>> >   case 1: foo_ver1(param); break;
>> >   case 2: foo_ver2(param); break;
>> >}
>> >then it would be possible to change ABI if we can prove that param=0 in ipa-cp.
>> >If the dispatcher uses indirect calls, then we probably want to consider
>> >foo_ver* as having address taken and thus not local.
>> Yea, I guess that could work.  For some reason I kept thinking about
>> indirect calls, but that shouldn't be the case here.
>>
>> That argues further that setting TREE_PUBLIC on the target specific
>> implementations is wrong.
>
> Yep, if the function doesn't need to be public then the whole block of code
> tampering with visibility and other flags can go.  We probably also don't need
> to care about giving clone some fixed assembler name as clonning will invent
> .target_clone.XYZ name itself.
>
> Honza
>>
>>
>> Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-09 19:57                       ` Evgeny Stupachenko
@ 2015-10-09 20:04                         ` Jan Hubicka
  2015-10-09 21:44                           ` Evgeny Stupachenko
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Hubicka @ 2015-10-09 20:04 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Jan Hubicka, Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> >Of course it also depends what you inline into function. You can have
> >> >
> >> >bar() target(-mavx) {fancy avx code}
> >> >foobar() { ...... if (avx) bar();}
> >> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
> 
> "no-" targets are not supported

Why not? I suppose I can use -march=x86_64 in a file compiled with -march=core-avx2 or something like that, too.
> 
> >> >
> >> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
> >> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
> >> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
> >> >time.
> >> But isn't that avoided by fixing up the call graph so that all calls
> >> to the affected function are going through the dispatcher?  Or is
> >> that happening too late?
> >
> > There is dispatcher only for foo that is the root of the callgarph tree.
> > When inlining we compare target attributes for match (in can_inline_edge_p).
> > We do not compare ctarget attributes.  Expanding ctarget to target early would
> > avoid need for ctarget handling.
> Currently inlining is disabled for functions with target_clone attribute:

Do you also disable inlining into functions with target_clone?  
What I am concerned about is early inliner inlining (say) AVX code into ctarget
function because at early inlining time the target is not applied, yet.

Honza

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-09 20:04                         ` Jan Hubicka
@ 2015-10-09 21:44                           ` Evgeny Stupachenko
  2015-10-12 23:35                             ` Evgeny Stupachenko
  0 siblings, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-09 21:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> >Of course it also depends what you inline into function. You can have
>> >> >
>> >> >bar() target(-mavx) {fancy avx code}
>> >> >foobar() { ...... if (avx) bar();}
>> >> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
>>
>> "no-" targets are not supported
>
> Why not? I suppose I can use -march=x86_64 in a file compiled with -march=core-avx2 or something like that, too.
Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) returns:

aaa.cpp: In function '<built-in>':
aaa.cpp:7:5: error: No dispatcher found for no-avx
 int bar()
     ^

>>
>> >> >
>> >> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>> >> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>> >> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>> >> >time.
>> >> But isn't that avoided by fixing up the call graph so that all calls
>> >> to the affected function are going through the dispatcher?  Or is
>> >> that happening too late?
>> >
>> > There is dispatcher only for foo that is the root of the callgarph tree.
>> > When inlining we compare target attributes for match (in can_inline_edge_p).
>> > We do not compare ctarget attributes.  Expanding ctarget to target early would
>> > avoid need for ctarget handling.
>> Currently inlining is disabled for functions with target_clone attribute:
>
> Do you also disable inlining into functions with target_clone?
> What I am concerned about is early inliner inlining (say) AVX code into ctarget
> function because at early inlining time the target is not applied, yet.
Right. Now I've got your point and ICE on the test.
Yes the solution is to disable inline into target_clones function.
Or to move the pass creating clones before inline (as you suggested)
and leave dispatcher creator after inline.

I like you suggestion. It fixes the ICE.
I'll fix the patch and retest.

Thank you for the review,
Evgeny.


>
> Honza

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-09 21:44                           ` Evgeny Stupachenko
@ 2015-10-12 23:35                             ` Evgeny Stupachenko
  2015-10-14 21:32                               ` Evgeny Stupachenko
  2015-10-26 15:59                               ` Jeff Law
  0 siblings, 2 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-12 23:35 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4221 bytes --]

Hi All,

Here is a new version of patch (attached).
Bootstrap and make check are in progress (all new tests passed).

New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
than "-mavx" are passed.
However it has the same behavior if "target_clones" attribute is
replaced by 2 corresponding "target" attributes.
I've filed PR67946 on this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946

Thanks,
Evgeny

ChangeLog:

2015-10-13  Evgeny Stupachenko  <evstupac@gmail.com>
gcc/
        * Makefile.in (OBJS): Add multiple_target.o.
        * attrib.c (make_attribute): Moved from config/i386/i386.c
        * config/i386/i386.c (make_attribute): Deleted.
        * multiple_target.c (make_attribute): New.
        (create_dispatcher_calls): Ditto.
        (get_attr_len): Ditto.
        (get_attr_str): Ditto.
        (is_valid_asm_symbol): Ditto.
        (create_new_asm_name): Ditto.
        (create_target_clone): Ditto.
        (expand_target_clones): Ditto.
        (ipa_target_clone): Ditto.
        (ipa_dispatcher_calls): Ditto.
        * passes.def (pass_target_clone): Two new ipa passes.
        * tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
        * c-common.c (handle_target_clones_attribute): New.
        * (c_common_attribute_table): Add handle_target_clones_attribute.
        * (handle_always_inline_attribute): Add check on target_clones
        attribute.
        * (handle_target_attribute): Ditto.

gcc/testsuite
        * gcc.dg/mvc1.c: New test for multiple targets cloning.
        * gcc.dg/mvc2.c: Ditto.
        * gcc.dg/mvc3.c: Ditto.
        * gcc.dg/mvc4.c: Ditto.
        * gcc.dg/mvc5.c: Ditto.
        * gcc.dg/mvc6.c: Ditto.
        * gcc.dg/mvc7.c: Ditto.
        * g++.dg/ext/mvc1.C: Ditto.
        * g++.dg/ext/mvc2.C: Ditto.
        * g++.dg/ext/mvc3.C: Ditto.
        * g++.dg/ext/mvc4.C: Ditto.

gcc/doc
        * doc/extend.texi (target_clones): New attribute description.

On Sat, Oct 10, 2015 at 12:44 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >> >Of course it also depends what you inline into function. You can have
>>> >> >
>>> >> >bar() target(-mavx) {fancy avx code}
>>> >> >foobar() { ...... if (avx) bar();}
>>> >> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
>>>
>>> "no-" targets are not supported
>>
>> Why not? I suppose I can use -march=x86_64 in a file compiled with -march=core-avx2 or something like that, too.
> Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) returns:
>
> aaa.cpp: In function '<built-in>':
> aaa.cpp:7:5: error: No dispatcher found for no-avx
>  int bar()
>      ^
>
>>>
>>> >> >
>>> >> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>>> >> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>>> >> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>>> >> >time.
>>> >> But isn't that avoided by fixing up the call graph so that all calls
>>> >> to the affected function are going through the dispatcher?  Or is
>>> >> that happening too late?
>>> >
>>> > There is dispatcher only for foo that is the root of the callgarph tree.
>>> > When inlining we compare target attributes for match (in can_inline_edge_p).
>>> > We do not compare ctarget attributes.  Expanding ctarget to target early would
>>> > avoid need for ctarget handling.
>>> Currently inlining is disabled for functions with target_clone attribute:
>>
>> Do you also disable inlining into functions with target_clone?
>> What I am concerned about is early inliner inlining (say) AVX code into ctarget
>> function because at early inlining time the target is not applied, yet.
> Right. Now I've got your point and ICE on the test.
> Yes the solution is to disable inline into target_clones function.
> Or to move the pass creating clones before inline (as you suggested)
> and leave dispatcher creator after inline.
>
> I like you suggestion. It fixes the ICE.
> I'll fix the patch and retest.
>
> Thank you for the review,
> Evgeny.
>
>
>>
>> Honza

[-- Attachment #2: target_clones.patch --]
[-- Type: application/octet-stream, Size: 27179 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 009c745..5d1bd4e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1348,6 +1348,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiple_target.o \
 	omp-low.o \
 	optabs.o \
 	optabs-libfuncs.o \
diff --git a/gcc/attribs.c b/gcc/attribs.c
index 6cbe011..9e70f04 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -681,3 +681,21 @@ apply_tm_attr (tree fndecl, tree attr)
 {
   decl_attributes (&TREE_TYPE (fndecl), tree_cons (attr, NULL, NULL), 0);
 }
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
diff --git a/gcc/attribs.h b/gcc/attribs.h
index ac855d5..3c0be4f 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -36,5 +36,6 @@ extern tree decl_attributes (tree *, tree, int);
 extern bool cxx11_attribute_p (const_tree);
 extern tree get_attribute_name (const_tree);
 extern void apply_tm_attr (tree, tree);
+extern tree make_attribute (const char *, const char *, tree);
 
 #endif // GCC_ATTRIBS_H
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 879f4db..b369e91 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
@@ -791,6 +792,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_error_attribute, false },
   { "target",                 1, -1, true, false, false,
 			      handle_target_attribute, false },
+  { "target_clones",          1, -1, true, false, false,
+			      handle_target_clones_attribute, false },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute, false },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -7366,6 +7369,12 @@ handle_always_inline_attribute (tree *node, tree name,
 		   "with %qs attribute", name, "noinline");
 	  *no_add_attrs = true;
 	}
+      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+	  *no_add_attrs = true;
+	}
       else
 	/* Set the attribute and mark it for disregarding inline
 	   limits.  */
@@ -9777,6 +9786,12 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+      *no_add_attrs = true;
+    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -9784,6 +9799,39 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_clones" attribute.  */
+
+static tree
+handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "always_inline");
+	  *no_add_attrs = true;
+	}
+      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target");
+	  *no_add_attrs = true;
+	}
+      else
+      /* Do not inline functions with multiple clone targets.  */
+	DECL_UNINLINABLE (*node) = 1;
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Arguments being collected for optimization.  */
 typedef const char *const_char_p;		/* For DEF_VEC_P.  */
 static GTY(()) vec<const_char_p, va_gc> *optimize_args;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a26f68..ca85af1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35372,24 +35372,6 @@ ix86_get_function_versions_dispatcher (void *decl)
   return dispatch_decl;
 }
 
-/* Makes a function attribute of the form NAME(ARG_NAME) and chains
-   it to CHAIN.  */
-
-static tree
-make_attribute (const char *name, const char *arg_name, tree chain)
-{
-  tree attr_name;
-  tree attr_arg_name;
-  tree attr_args;
-  tree attr;
-
-  attr_name = get_identifier (name);
-  attr_arg_name = build_string (strlen (arg_name), arg_name);
-  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
-  attr = tree_cons (attr_name, attr_args, chain);
-  return attr;
-}
-
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function,  DEFAULT_DECL.  Create an
    empty basic block in the resolver and store the pointer in
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 23e6a76..f9d28d1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
 flags @option{fstack-protector} or @option{fstack-protector-strong}
 or @option{fstack-protector-explicit} are set.
 
+@item target_clones (@var{options})
+@cindex @code{target_clones} function attribute
+The @code{target_clones} attribute is used to specify that a function is to
+be cloned into multiple versions compiled with different target options
+than specified on the command line.  The supported options and restrictions
+are the same as for @code{target}.
+
+For instance on an x86, you could compile a function with
+@code{target_clones("sse4.1,avx")}. It will create 2 function clones,
+one compiled with @option{-msse4.1} and another with @option{-mavx}.
+At the function call it will create resolver @code{ifunc}, that will
+dynamically call a clone suitable for current architecture.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 Multiple target back ends implement the @code{target} attribute
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
new file mode 100644
index 0000000..18dd648
--- /dev/null
+++ b/gcc/multiple_target.c
@@ -0,0 +1,448 @@
+/* Pass for parsing functions with multiple target attributes.
+
+   Contributed by Evgeny Stupachenko <evstupac@gmail.com>
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "gimple.h"
+#include "diagnostic-core.h"
+#include "gimple-ssa.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "attribs.h"
+#include "pretty-print.h"
+
+/* If the call in NODE has multiple target attribute with multiple fields,
+   replace it with dispatcher call and create dispatcher (once).  */
+
+static void
+create_dispatcher_calls (struct cgraph_node *node)
+{
+  cgraph_edge *e;
+  cgraph_edge *e_next;
+  /* We need to remember NEXT_CALLER as it could be modified in the loop.  */
+  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
+    {
+      tree resolver_decl;
+      tree idecl;
+      tree decl;
+      gimple *call = e->call_stmt;
+      struct cgraph_node *inode;
+
+      /* Checking if call of function is call of versioned function.
+	 Versioned function are not inlined, so there is no need to
+	 check for inline.  */
+      if (!call
+	  || !(decl = gimple_call_fndecl (call))
+	  || !DECL_FUNCTION_VERSIONED (decl))
+	continue;
+
+      e_next = e->next_caller;
+      idecl = targetm.get_function_versions_dispatcher (decl);
+      if (!idecl)
+	{
+	  error_at (gimple_location (call),
+		    "default target_clones attribute was not set");
+	}
+      inode = cgraph_node::get (idecl);
+      gcc_assert (inode);
+      resolver_decl = targetm.generate_version_dispatcher_body (inode);
+
+      /* Update aliases.  */
+      inode->alias = true;
+      inode->alias_target = resolver_decl;
+      if (!inode->analyzed)
+	inode->resolve_alias (cgraph_node::get (resolver_decl));
+
+      e->redirect_callee (inode);
+      /*  Since REDIRECT_CALLEE modifies NEXT_CALLER field we move to
+	  previously set NEXT_CALLER.  */
+      e = NULL;
+    }
+}
+
+/* Return length of attribute names string,
+   if arglist chain >1, -1 otherwise.  */
+
+static int
+get_attr_len (tree arglist)
+{
+  tree arg;
+  int str_len_sum = 0;
+  int argnum = 1;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      unsigned int i;
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      int len = strlen (str);
+      str_len_sum += len + 1;
+      if (arg != arglist)
+	argnum++;
+      for (i = 0; i < strlen (str); i++)
+	if (str[i] == ',')
+	  argnum++;
+    }
+  if (argnum == 1)
+    return -1;
+  return str_len_sum;
+}
+
+/* Create sting with attributes separated by comma.
+   Return number of attributes.  */
+
+static int
+get_attr_str (tree arglist, char *attr_str)
+{
+  tree arg;
+  size_t str_len_sum = 0;
+  int argnum = 0;
+  str_len_sum = 0;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      memcpy (attr_str + str_len_sum, str, len);
+      attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+      str_len_sum += len + 1;
+      argnum++;
+    }
+  return argnum;
+}
+
+/* Return number of attributes separated by comma and put them into ARGS.
+   If there is no DEFAULT attribute return -1.  */
+
+static int
+separate_attrs (char *attr_str, char **attrs)
+{
+  int i = 0;
+  bool has_default = false;
+  char *attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      if (strcmp (attr, "default") == 0)
+	{
+	  has_default = true;
+	  attr = strtok (NULL, ",");
+	  continue;
+	}
+      attrs[i] = attr;
+      attr = strtok (NULL, ",");
+      i++;
+    }
+  if (!has_default)
+    return -1;
+  return i;
+}
+
+/*  Return true if symbol is valid in assembler name.  */
+
+static bool
+is_valid_asm_symbol (char c)
+{
+  if ('a' <= c && c <= 'z')
+    return true;
+  if ('A' <= c && c <= 'Z')
+    return true;
+  if ('0' <= c && c <= '9')
+    return true;
+  if (c == '_')
+    return true;
+  return false;
+}
+
+/* Create new assembler name for given old name and target name.
+   Replace all not valid assembler symbols with '_'.  */
+
+static void
+create_new_asm_name (const char *old_asm_name, char *attr, char *new_asm_name)
+{
+  int i;
+  int old_name_len = strlen (old_asm_name);
+  int attr_len = strlen (attr);
+  memcpy (new_asm_name, old_asm_name, old_name_len);
+
+  char *new_suffix = &new_asm_name[old_name_len + 1];
+  memcpy (new_suffix, attr, attr_len + 1);
+
+  /* Replace all not valid assembler symbols with '_'.  */
+  for (i = 0; i < attr_len; i++)
+    if (!is_valid_asm_symbol (new_suffix[i]))
+      new_suffix[i] = '_';
+
+  new_asm_name[old_name_len] = '.';
+}
+
+/*  Creates target clone of NODE.  */
+
+static cgraph_node *
+create_target_clone (cgraph_node *node, bool definition)
+{
+  cgraph_node *new_node;
+  if (definition)
+    {
+      new_node = node->create_version_clone_with_body (vNULL, NULL,
+    						       NULL, false,
+						       NULL, NULL,
+						       "target_clone");
+      new_node->externally_visible = node->externally_visible;
+      new_node->address_taken = node->address_taken;
+      new_node->thunk = node->thunk;
+      new_node->alias = node->alias;
+      new_node->weakref = node->weakref;
+      new_node->cpp_implicit_alias = node->cpp_implicit_alias;
+      new_node->local.local = node->local.local;
+    }
+  else
+    {
+      tree new_decl = copy_node (node->decl);
+      new_node = cgraph_node::get_create (new_decl);
+    }
+  return new_node;
+}
+
+/* If the function in NODE has multiple target attributes
+   create the appropriate clone for each valid target attribute.  */
+
+static bool
+expand_target_clones (struct cgraph_node *node, bool defenition)
+{
+  int i;
+  tree id;
+  /* Parsing target attributes separated by comma.  */
+  tree attr_target = lookup_attribute ("target_clones",
+				       DECL_ATTRIBUTES (node->decl));
+  /* No targets specified.  */
+  if (!attr_target)
+    return false;
+
+  tree arglist = TREE_VALUE (attr_target);
+  int attr_len = get_attr_len (arglist);
+
+  /* No need to clone for 1 target attribute.  */
+  if (attr_len == -1)
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl),
+		  0,
+		  "single target_clones attribute is ignored");
+      return false;
+    }
+
+  char *attr_str = XNEWVEC (char, attr_len);
+  int attrnum = get_attr_str (arglist, attr_str);
+
+  char **attrs = XNEWVEC (char *, attrnum);
+  attrnum = separate_attrs (attr_str, attrs);
+
+  if (attrnum == -1)
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"default target was not set");
+      return false;
+    }
+
+  const char *old_asm_name;
+  old_asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+
+  cgraph_function_version_info *decl1_v = NULL;
+  cgraph_function_version_info *decl2_v = NULL;
+  cgraph_function_version_info *before = NULL;
+  cgraph_function_version_info *after = NULL;
+  decl1_v = node->function_version ();
+  if (decl1_v == NULL)
+    decl1_v = node->insert_new_function_version ();
+  before = decl1_v;
+  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+
+
+  for (i = 0; i < attrnum; i++)
+    {
+      char *attr = attrs[i];
+      cgraph_node *new_node = create_target_clone (node, defenition);
+      char *new_asm_name =
+	  XNEWVEC (char, strlen (old_asm_name) + strlen (attr) + 2);
+      create_new_asm_name (old_asm_name, attr, new_asm_name);
+
+      /* Setting new attribute to cloned function.  */
+      tree attributes = make_attribute ("target", attr,
+					DECL_ATTRIBUTES (new_node->decl));
+      DECL_ATTRIBUTES (new_node->decl) = attributes;
+      if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+						    TREE_VALUE (attributes), 0))
+	{
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "attribute(target_clones(\"%s\")) is not "
+		      "valid for current target", attr);
+	  continue;
+	}
+
+      id = get_identifier (new_asm_name);
+      symtab->change_decl_assembler_name (new_node->decl, id);
+      XDELETEVEC (new_asm_name);
+
+      decl2_v = new_node->function_version ();
+      if (decl2_v != NULL)
+        continue;
+      decl2_v = new_node->insert_new_function_version ();
+
+      /* Chain decl2_v and decl1_v.  All semantically identical versions
+	 will be chained together.  */
+
+      after = decl2_v;
+      while (before->next != NULL)
+	before = before->next;
+      while (after->prev != NULL)
+	after = after->prev;
+
+      before->next = after;
+      after->prev = before;
+      DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+    }
+  /* Setting new attribute to initial function.  */
+  tree attributes =  make_attribute ("target", "default",
+				     DECL_ATTRIBUTES (node->decl));
+  DECL_ATTRIBUTES (node->decl) = attributes;
+  if (!targetm.target_option.valid_attribute_p (node->decl, NULL,
+						TREE_VALUE (attributes), 0))
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"attribute(target_clones(\"default\")) is not "
+		"valid for current target");
+      return false;
+    }
+
+  XDELETEVEC (attrs);
+  XDELETEVEC (attr_str);
+  return true;
+}
+
+static bool target_clone_pass;
+
+static unsigned int
+ipa_target_clone (void)
+{
+  struct cgraph_node *node;
+  target_clone_pass = false;
+  FOR_EACH_FUNCTION (node)
+    if (node->definition)
+      target_clone_pass |= expand_target_clones (node, true);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_target_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "targetclone",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_target_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_target_clone (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_target_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_target_clone (); }
+};
+
+bool
+pass_target_clone::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_target_clone (gcc::context *ctxt)
+{
+  return new pass_target_clone (ctxt);
+}
+
+static unsigned int
+ipa_dispatcher_calls (void)
+{
+  struct cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    if (!node->definition)
+      target_clone_pass |= expand_target_clones (node, false);
+  if (target_clone_pass)
+    FOR_EACH_FUNCTION (node)
+      create_dispatcher_calls (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_dispatcher_calls =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "dispachercalls",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_dispatcher_calls : public simple_ipa_opt_pass
+{
+public:
+  pass_dispatcher_calls (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_dispatcher_calls, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_dispatcher_calls (); }
+};
+
+bool
+pass_dispatcher_calls::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_dispatcher_calls (gcc::context *ctxt)
+{
+  return new pass_dispatcher_calls (ctxt);
+}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cb00758..14fbf82 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13783,6 +13783,7 @@ ipa_omp_simd_clone (void)
   struct cgraph_node *node;
   FOR_EACH_FUNCTION (node)
     expand_simd_clones (node);
+
   return 0;
 }
 
diff --git a/gcc/passes.def b/gcc/passes.def
index 64fc4d9..540177c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_devirt);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
+  NEXT_PASS (pass_target_clone);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -140,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
   NEXT_PASS (pass_ipa_pta);
+  NEXT_PASS (pass_dispatcher_calls);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST ()
 
diff --git a/gcc/testsuite/g++.dg/ext/mvc1.C b/gcc/testsuite/g++.dg/ext/mvc1.C
new file mode 100644
index 0000000..fbf9011
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc1.C
@@ -0,0 +1,34 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+__attribute__((target("avx","arch=core-avx2")))
+int
+bar ()
+{
+  return 2;
+}
+
+__attribute__((target("default")))
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/g++.dg/ext/mvc2.C b/gcc/testsuite/g++.dg/ext/mvc2.C
new file mode 100644
index 0000000..e7abab8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc2.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","default")))
+__attribute__((target("avx")))
+int foo (); /* { dg-warning "'target' attribute ignored due to conflict with 'target_clones' attribute" } */
+
+__attribute__((target_clones("avx","arch=slm","default"),always_inline))
+int bar (); /* { dg-warning "'always_inline' attribute ignored due to conflict with 'target_clones' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc3.C b/gcc/testsuite/g++.dg/ext/mvc3.C
new file mode 100644
index 0000000..05bebf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc3.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target("avx")))
+__attribute__((target_clones("avx","arch=slm","default")))
+int foo (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'target' attribute" } */
+
+__attribute__((always_inline,target_clones("avx","arch=slm","default")))
+int bar (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'always_inline' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc4.C b/gcc/testsuite/g++.dg/ext/mvc4.C
new file mode 100644
index 0000000..98e3502
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc4.C
@@ -0,0 +1,34 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-mavx" } */
+
+#include <immintrin.h>
+
+__m256 x, y, z;
+
+__attribute__((target("avx")))
+int bar()
+{
+  x = _mm256_add_ps (y, z);
+  return 1;
+}
+
+__attribute__((target("default")))
+int bar()
+{
+  return 2;
+}
+
+int
+foobar()
+{
+  if (__builtin_cpu_supports ("avx"))
+    return bar();
+  else
+    return 0;
+}
+
+__attribute__((target_clones("default","sse3")))
+int foo()
+{
+  return foobar();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
new file mode 100644
index 0000000..8e02721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc2.c b/gcc/testsuite/gcc.dg/mvc2.c
new file mode 100644
index 0000000..af0c6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo ();
diff --git a/gcc/testsuite/gcc.dg/mvc3.c b/gcc/testsuite/gcc.dg/mvc3.c
new file mode 100644
index 0000000..3af3e35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo (); /* { dg-error "default target was not set" } */
+
+int
+bar ()
+{
+  return foo();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
new file mode 100644
index 0000000..48ec9a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("default","avx","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
new file mode 100644
index 0000000..89001e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fno-inline" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+foo ()
+{
+  return 10;
+}
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+bar ()
+{
+  return -foo ();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc6.c b/gcc/testsuite/gcc.dg/mvc6.c
new file mode 100644
index 0000000..1621985
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler "vpshufb" } } */
+/* { dg-final { scan-assembler "punpcklbw" } } */
+
+__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
+void
+foo(char *in, char *out, int size)
+{
+  int i;
+  for(i = 0; i < size; i++)
+    {
+	out[2 * i] = in[i];
+	out[2 * i + 1] = in[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
new file mode 100644
index 0000000..d61d78e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+
+__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
+int foo ();
+
+int main()
+{
+  return foo();
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3c913ea..9747f39 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -482,6 +482,8 @@ extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_dispatcher_calls (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-12 23:35                             ` Evgeny Stupachenko
@ 2015-10-14 21:32                               ` Evgeny Stupachenko
  2015-10-22 18:07                                 ` Evgeny Stupachenko
  2015-10-26 15:59                               ` Jeff Law
  1 sibling, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-14 21:32 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

Bootstrap and make check for x86 passed. No new fails.
Please ignore an empty line added to omp-low.c in the patch, the
misprint will be removed prior to a commit.

Thanks,
Evgeny

On Tue, Oct 13, 2015 at 2:35 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Hi All,
>
> Here is a new version of patch (attached).
> Bootstrap and make check are in progress (all new tests passed).
>
> New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
> than "-mavx" are passed.
> However it has the same behavior if "target_clones" attribute is
> replaced by 2 corresponding "target" attributes.
> I've filed PR67946 on this:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946
>
> Thanks,
> Evgeny
>
> ChangeLog:
>
> 2015-10-13  Evgeny Stupachenko  <evstupac@gmail.com>
> gcc/
>         * Makefile.in (OBJS): Add multiple_target.o.
>         * attrib.c (make_attribute): Moved from config/i386/i386.c
>         * config/i386/i386.c (make_attribute): Deleted.
>         * multiple_target.c (make_attribute): New.
>         (create_dispatcher_calls): Ditto.
>         (get_attr_len): Ditto.
>         (get_attr_str): Ditto.
>         (is_valid_asm_symbol): Ditto.
>         (create_new_asm_name): Ditto.
>         (create_target_clone): Ditto.
>         (expand_target_clones): Ditto.
>         (ipa_target_clone): Ditto.
>         (ipa_dispatcher_calls): Ditto.
>         * passes.def (pass_target_clone): Two new ipa passes.
>         * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
>         * c-common.c (handle_target_clones_attribute): New.
>         * (c_common_attribute_table): Add handle_target_clones_attribute.
>         * (handle_always_inline_attribute): Add check on target_clones
>         attribute.
>         * (handle_target_attribute): Ditto.
>
> gcc/testsuite
>         * gcc.dg/mvc1.c: New test for multiple targets cloning.
>         * gcc.dg/mvc2.c: Ditto.
>         * gcc.dg/mvc3.c: Ditto.
>         * gcc.dg/mvc4.c: Ditto.
>         * gcc.dg/mvc5.c: Ditto.
>         * gcc.dg/mvc6.c: Ditto.
>         * gcc.dg/mvc7.c: Ditto.
>         * g++.dg/ext/mvc1.C: Ditto.
>         * g++.dg/ext/mvc2.C: Ditto.
>         * g++.dg/ext/mvc3.C: Ditto.
>         * g++.dg/ext/mvc4.C: Ditto.
>
> gcc/doc
>         * doc/extend.texi (target_clones): New attribute description.
>
> On Sat, Oct 10, 2015 at 12:44 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> >> >Of course it also depends what you inline into function. You can have
>>>> >> >
>>>> >> >bar() target(-mavx) {fancy avx code}
>>>> >> >foobar() { ...... if (avx) bar();}
>>>> >> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
>>>>
>>>> "no-" targets are not supported
>>>
>>> Why not? I suppose I can use -march=x86_64 in a file compiled with -march=core-avx2 or something like that, too.
>> Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) returns:
>>
>> aaa.cpp: In function '<built-in>':
>> aaa.cpp:7:5: error: No dispatcher found for no-avx
>>  int bar()
>>      ^
>>
>>>>
>>>> >> >
>>>> >> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>>>> >> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>>>> >> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>>>> >> >time.
>>>> >> But isn't that avoided by fixing up the call graph so that all calls
>>>> >> to the affected function are going through the dispatcher?  Or is
>>>> >> that happening too late?
>>>> >
>>>> > There is dispatcher only for foo that is the root of the callgarph tree.
>>>> > When inlining we compare target attributes for match (in can_inline_edge_p).
>>>> > We do not compare ctarget attributes.  Expanding ctarget to target early would
>>>> > avoid need for ctarget handling.
>>>> Currently inlining is disabled for functions with target_clone attribute:
>>>
>>> Do you also disable inlining into functions with target_clone?
>>> What I am concerned about is early inliner inlining (say) AVX code into ctarget
>>> function because at early inlining time the target is not applied, yet.
>> Right. Now I've got your point and ICE on the test.
>> Yes the solution is to disable inline into target_clones function.
>> Or to move the pass creating clones before inline (as you suggested)
>> and leave dispatcher creator after inline.
>>
>> I like you suggestion. It fixes the ICE.
>> I'll fix the patch and retest.
>>
>> Thank you for the review,
>> Evgeny.
>>
>>
>>>
>>> Honza

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-14 21:32                               ` Evgeny Stupachenko
@ 2015-10-22 18:07                                 ` Evgeny Stupachenko
  0 siblings, 0 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-22 18:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

PING.

On Thu, Oct 15, 2015 at 12:32 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Bootstrap and make check for x86 passed. No new fails.
> Please ignore an empty line added to omp-low.c in the patch, the
> misprint will be removed prior to a commit.
>
> Thanks,
> Evgeny
>
> On Tue, Oct 13, 2015 at 2:35 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Hi All,
>>
>> Here is a new version of patch (attached).
>> Bootstrap and make check are in progress (all new tests passed).
>>
>> New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
>> than "-mavx" are passed.
>> However it has the same behavior if "target_clones" attribute is
>> replaced by 2 corresponding "target" attributes.
>> I've filed PR67946 on this:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946
>>
>> Thanks,
>> Evgeny
>>
>> ChangeLog:
>>
>> 2015-10-13  Evgeny Stupachenko  <evstupac@gmail.com>
>> gcc/
>>         * Makefile.in (OBJS): Add multiple_target.o.
>>         * attrib.c (make_attribute): Moved from config/i386/i386.c
>>         * config/i386/i386.c (make_attribute): Deleted.
>>         * multiple_target.c (make_attribute): New.
>>         (create_dispatcher_calls): Ditto.
>>         (get_attr_len): Ditto.
>>         (get_attr_str): Ditto.
>>         (is_valid_asm_symbol): Ditto.
>>         (create_new_asm_name): Ditto.
>>         (create_target_clone): Ditto.
>>         (expand_target_clones): Ditto.
>>         (ipa_target_clone): Ditto.
>>         (ipa_dispatcher_calls): Ditto.
>>         * passes.def (pass_target_clone): Two new ipa passes.
>>         * tree-pass.h (make_pass_target_clone): Ditto.
>>
>> gcc/c-family
>>         * c-common.c (handle_target_clones_attribute): New.
>>         * (c_common_attribute_table): Add handle_target_clones_attribute.
>>         * (handle_always_inline_attribute): Add check on target_clones
>>         attribute.
>>         * (handle_target_attribute): Ditto.
>>
>> gcc/testsuite
>>         * gcc.dg/mvc1.c: New test for multiple targets cloning.
>>         * gcc.dg/mvc2.c: Ditto.
>>         * gcc.dg/mvc3.c: Ditto.
>>         * gcc.dg/mvc4.c: Ditto.
>>         * gcc.dg/mvc5.c: Ditto.
>>         * gcc.dg/mvc6.c: Ditto.
>>         * gcc.dg/mvc7.c: Ditto.
>>         * g++.dg/ext/mvc1.C: Ditto.
>>         * g++.dg/ext/mvc2.C: Ditto.
>>         * g++.dg/ext/mvc3.C: Ditto.
>>         * g++.dg/ext/mvc4.C: Ditto.
>>
>> gcc/doc
>>         * doc/extend.texi (target_clones): New attribute description.
>>
>> On Sat, Oct 10, 2015 at 12:44 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> >> >Of course it also depends what you inline into function. You can have
>>>>> >> >
>>>>> >> >bar() target(-mavx) {fancy avx code}
>>>>> >> >foobar() { ...... if (avx) bar();}
>>>>> >> >foo() ctarget(-mavx,-mno-avx) {....foobar();....}
>>>>>
>>>>> "no-" targets are not supported
>>>>
>>>> Why not? I suppose I can use -march=x86_64 in a file compiled with -march=core-avx2 or something like that, too.
>>> Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) returns:
>>>
>>> aaa.cpp: In function '<built-in>':
>>> aaa.cpp:7:5: error: No dispatcher found for no-avx
>>>  int bar()
>>>      ^
>>>
>>>>>
>>>>> >> >
>>>>> >> >Now if you compile with -mavx and because ctarget takes effect only after inlining,
>>>>> >> >at inlining time the target attributes will match and we can edn up inline bar->foobar->foo.
>>>>> >> >After that we multiversion foo and drop AVX flag we will likely get ICE at expansion
>>>>> >> >time.
>>>>> >> But isn't that avoided by fixing up the call graph so that all calls
>>>>> >> to the affected function are going through the dispatcher?  Or is
>>>>> >> that happening too late?
>>>>> >
>>>>> > There is dispatcher only for foo that is the root of the callgarph tree.
>>>>> > When inlining we compare target attributes for match (in can_inline_edge_p).
>>>>> > We do not compare ctarget attributes.  Expanding ctarget to target early would
>>>>> > avoid need for ctarget handling.
>>>>> Currently inlining is disabled for functions with target_clone attribute:
>>>>
>>>> Do you also disable inlining into functions with target_clone?
>>>> What I am concerned about is early inliner inlining (say) AVX code into ctarget
>>>> function because at early inlining time the target is not applied, yet.
>>> Right. Now I've got your point and ICE on the test.
>>> Yes the solution is to disable inline into target_clones function.
>>> Or to move the pass creating clones before inline (as you suggested)
>>> and leave dispatcher creator after inline.
>>>
>>> I like you suggestion. It fixes the ICE.
>>> I'll fix the patch and retest.
>>>
>>> Thank you for the review,
>>> Evgeny.
>>>
>>>
>>>>
>>>> Honza

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-12 23:35                             ` Evgeny Stupachenko
  2015-10-14 21:32                               ` Evgeny Stupachenko
@ 2015-10-26 15:59                               ` Jeff Law
  2015-10-29 13:42                                 ` Evgeny Stupachenko
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-10-26 15:59 UTC (permalink / raw)
  To: Evgeny Stupachenko, Jan Hubicka; +Cc: Bernd Schmidt, Bernd Schmidt, GCC Patches

On 10/12/2015 05:35 PM, Evgeny Stupachenko wrote:
> Hi All,
>
> Here is a new version of patch (attached).
> Bootstrap and make check are in progress (all new tests passed).
>
> New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
> than "-mavx" are passed.
> However it has the same behavior if "target_clones" attribute is
> replaced by 2 corresponding "target" attributes.
> I've filed PR67946 on this:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946
>
> Thanks,
> Evgeny
>
> ChangeLog:
>
> 2015-10-13  Evgeny Stupachenko<evstupac@gmail.com>
> gcc/
>          * Makefile.in (OBJS): Add multiple_target.o.
>          * attrib.c (make_attribute): Moved from config/i386/i386.c
>          * config/i386/i386.c (make_attribute): Deleted.
>          * multiple_target.c (make_attribute): New.
>          (create_dispatcher_calls): Ditto.
>          (get_attr_len): Ditto.
>          (get_attr_str): Ditto.
>          (is_valid_asm_symbol): Ditto.
>          (create_new_asm_name): Ditto.
>          (create_target_clone): Ditto.
>          (expand_target_clones): Ditto.
>          (ipa_target_clone): Ditto.
>          (ipa_dispatcher_calls): Ditto.
>          * passes.def (pass_target_clone): Two new ipa passes.
>          * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
>          * c-common.c (handle_target_clones_attribute): New.
>          * (c_common_attribute_table): Add handle_target_clones_attribute.
>          * (handle_always_inline_attribute): Add check on target_clones
>          attribute.
>          * (handle_target_attribute): Ditto.
>
> gcc/testsuite
>          * gcc.dg/mvc1.c: New test for multiple targets cloning.
>          * gcc.dg/mvc2.c: Ditto.
>          * gcc.dg/mvc3.c: Ditto.
>          * gcc.dg/mvc4.c: Ditto.
>          * gcc.dg/mvc5.c: Ditto.
>          * gcc.dg/mvc6.c: Ditto.
>          * gcc.dg/mvc7.c: Ditto.
>          * g++.dg/ext/mvc1.C: Ditto.
>          * g++.dg/ext/mvc2.C: Ditto.
>          * g++.dg/ext/mvc3.C: Ditto.
>          * g++.dg/ext/mvc4.C: Ditto.
>
> gcc/doc
>          * doc/extend.texi (target_clones): New attribute description.
>

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 23e6a76..f9d28d1 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
>   flags @option{fstack-protector} or @option{fstack-protector-strong}
>   or @option{fstack-protector-explicit} are set.
>
> +@item target_clones (@var{options})
> +@cindex @code{target_clones} function attribute
> +The @code{target_clones} attribute is used to specify that a function is to
> +be cloned into multiple versions compiled with different target options
> +than specified on the command line.  The supported options and restrictions
> +are the same as for @code{target}.
"as for @code{target}" -> "as for the @code{target} attribute."

I think that makes it a tiny bit clearer.




> +
> +/*  Creates target clone of NODE.  */
> +
> +static cgraph_node *
> +create_target_clone (cgraph_node *node, bool definition)
> +{
> +  cgraph_node *new_node;
> +  if (definition)
> +    {
> +      new_node = node->create_version_clone_with_body (vNULL, NULL,
> +    						       NULL, false,
> +						       NULL, NULL,
> +						       "target_clone");
> +      new_node->externally_visible = node->externally_visible;
> +      new_node->address_taken = node->address_taken;
> +      new_node->thunk = node->thunk;
> +      new_node->alias = node->alias;
> +      new_node->weakref = node->weakref;
> +      new_node->cpp_implicit_alias = node->cpp_implicit_alias;
> +      new_node->local.local = node->local.local;
So do we need to explicitly clear TREE_PUBLIC here?  It also seems like 
copying externally_visible, address_taken and possibly some of those 
other fields is wrong.  The clone is going to be local to the CU, it 
doesn't inherit those properties from the original -- only the 
dispatcher needs to inherit those properties, right?

> +
> +
> +  for (i = 0; i < attrnum; i++)
> +    {
> +      char *attr = attrs[i];
> +      cgraph_node *new_node = create_target_clone (node, defenition);
> +      char *new_asm_name =
> +	  XNEWVEC (char, strlen (old_asm_name) + strlen (attr) + 2);
> +      create_new_asm_name (old_asm_name, attr, new_asm_name);
I thought after discussions with Jan that this wasn't going to be 
necessary as cloning should create a suitable name for us?


Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-26 15:59                               ` Jeff Law
@ 2015-10-29 13:42                                 ` Evgeny Stupachenko
  2015-10-29 17:07                                   ` Jan Hubicka
  0 siblings, 1 reply; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-29 13:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jan Hubicka, Bernd Schmidt, Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5537 bytes --]

On Mon, Oct 26, 2015 at 6:50 PM, Jeff Law <law@redhat.com> wrote:
> On 10/12/2015 05:35 PM, Evgeny Stupachenko wrote:
>>
>> Hi All,
>>
>> Here is a new version of patch (attached).
>> Bootstrap and make check are in progress (all new tests passed).
>>
>> New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
>> than "-mavx" are passed.
>> However it has the same behavior if "target_clones" attribute is
>> replaced by 2 corresponding "target" attributes.
>> I've filed PR67946 on this:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946
>>
>> Thanks,
>> Evgeny
>>
>> ChangeLog:
>>
>> 2015-10-13  Evgeny Stupachenko<evstupac@gmail.com>
>> gcc/
>>          * Makefile.in (OBJS): Add multiple_target.o.
>>          * attrib.c (make_attribute): Moved from config/i386/i386.c
>>          * config/i386/i386.c (make_attribute): Deleted.
>>          * multiple_target.c (make_attribute): New.
>>          (create_dispatcher_calls): Ditto.
>>          (get_attr_len): Ditto.
>>          (get_attr_str): Ditto.
>>          (is_valid_asm_symbol): Ditto.
>>          (create_new_asm_name): Ditto.
>>          (create_target_clone): Ditto.
>>          (expand_target_clones): Ditto.
>>          (ipa_target_clone): Ditto.
>>          (ipa_dispatcher_calls): Ditto.
>>          * passes.def (pass_target_clone): Two new ipa passes.
>>          * tree-pass.h (make_pass_target_clone): Ditto.
>>
>> gcc/c-family
>>          * c-common.c (handle_target_clones_attribute): New.
>>          * (c_common_attribute_table): Add handle_target_clones_attribute.
>>          * (handle_always_inline_attribute): Add check on target_clones
>>          attribute.
>>          * (handle_target_attribute): Ditto.
>>
>> gcc/testsuite
>>          * gcc.dg/mvc1.c: New test for multiple targets cloning.
>>          * gcc.dg/mvc2.c: Ditto.
>>          * gcc.dg/mvc3.c: Ditto.
>>          * gcc.dg/mvc4.c: Ditto.
>>          * gcc.dg/mvc5.c: Ditto.
>>          * gcc.dg/mvc6.c: Ditto.
>>          * gcc.dg/mvc7.c: Ditto.
>>          * g++.dg/ext/mvc1.C: Ditto.
>>          * g++.dg/ext/mvc2.C: Ditto.
>>          * g++.dg/ext/mvc3.C: Ditto.
>>          * g++.dg/ext/mvc4.C: Ditto.
>>
>> gcc/doc
>>          * doc/extend.texi (target_clones): New attribute description.
>>
>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 23e6a76..f9d28d1 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3066,6 +3066,19 @@ This function attribute make a stack protection of
>> the function if
>>   flags @option{fstack-protector} or @option{fstack-protector-strong}
>>   or @option{fstack-protector-explicit} are set.
>>
>> +@item target_clones (@var{options})
>> +@cindex @code{target_clones} function attribute
>> +The @code{target_clones} attribute is used to specify that a function is
>> to
>> +be cloned into multiple versions compiled with different target options
>> +than specified on the command line.  The supported options and
>> restrictions
>> +are the same as for @code{target}.
>
> "as for @code{target}" -> "as for the @code{target} attribute."
>
> I think that makes it a tiny bit clearer.
>
>
>
>
>> +
>> +/*  Creates target clone of NODE.  */
>> +
>> +static cgraph_node *
>> +create_target_clone (cgraph_node *node, bool definition)
>> +{
>> +  cgraph_node *new_node;
>> +  if (definition)
>> +    {
>> +      new_node = node->create_version_clone_with_body (vNULL, NULL,
>> +                                                      NULL, false,
>> +                                                      NULL, NULL,
>> +                                                      "target_clone");
>> +      new_node->externally_visible = node->externally_visible;
>> +      new_node->address_taken = node->address_taken;
>> +      new_node->thunk = node->thunk;
>> +      new_node->alias = node->alias;
>> +      new_node->weakref = node->weakref;
>> +      new_node->cpp_implicit_alias = node->cpp_implicit_alias;
>> +      new_node->local.local = node->local.local;
>
> So do we need to explicitly clear TREE_PUBLIC here?  It also seems like
> copying externally_visible, address_taken and possibly some of those other
> fields is wrong.  The clone is going to be local to the CU, it doesn't
> inherit those properties from the original -- only the dispatcher needs to
> inherit those properties, right?
Right. That was just the way to keep the node, that I didn't clean up.
Replaced with "new_node->force_output = true;"

>
>> +
>> +
>> +  for (i = 0; i < attrnum; i++)
>> +    {
>> +      char *attr = attrs[i];
>> +      cgraph_node *new_node = create_target_clone (node, defenition);
>> +      char *new_asm_name =
>> +         XNEWVEC (char, strlen (old_asm_name) + strlen (attr) + 2);
>> +      create_new_asm_name (old_asm_name, attr, new_asm_name);
>
> I thought after discussions with Jan that this wasn't going to be necessary
> as cloning should create a suitable name for us?
Yes. This is not necessary. However that way we'll have the following
code in dispatcher:
        cmpl    $6, __cpu_model+4(%rip)
        sete    %al
        movzbl  %al, %eax
        testl   %eax, %eax
        jle     .L16
        movl    $foo.target_clone.1, %eax
I think it is very hard to read and debug such...

While now we have:

        cmpl    $6, __cpu_model+4(%rip)
        sete    %al
        movzbl  %al, %eax
        testl   %eax, %eax
        jle     .L16
        movl    $foo.arch_slm, %eax

and it is clear that we are jumping to SLM code here.
I'd like to keep target in names.

Thanks,
Evgeny
>
>
> Jeff

[-- Attachment #2: target_clones4.patch --]
[-- Type: application/octet-stream, Size: 26951 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 783e4c9..847b9d4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1348,6 +1348,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiple_target.o \
 	omp-low.o \
 	optabs.o \
 	optabs-libfuncs.o \
diff --git a/gcc/attribs.c b/gcc/attribs.c
index 6cbe011..9e70f04 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -681,3 +681,21 @@ apply_tm_attr (tree fndecl, tree attr)
 {
   decl_attributes (&TREE_TYPE (fndecl), tree_cons (attr, NULL, NULL), 0);
 }
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
diff --git a/gcc/attribs.h b/gcc/attribs.h
index ac855d5..3c0be4f 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -36,5 +36,6 @@ extern tree decl_attributes (tree *, tree, int);
 extern bool cxx11_attribute_p (const_tree);
 extern tree get_attribute_name (const_tree);
 extern void apply_tm_attr (tree, tree);
+extern tree make_attribute (const char *, const char *, tree);
 
 #endif // GCC_ATTRIBS_H
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1c75921..144a5d8 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
@@ -798,6 +799,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_error_attribute, false },
   { "target",                 1, -1, true, false, false,
 			      handle_target_attribute, false },
+  { "target_clones",          1, -1, true, false, false,
+			      handle_target_clones_attribute, false },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute, false },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -7396,6 +7399,12 @@ handle_always_inline_attribute (tree *node, tree name,
 		   "with %qs attribute", name, "noinline");
 	  *no_add_attrs = true;
 	}
+      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+	  *no_add_attrs = true;
+	}
       else
 	/* Set the attribute and mark it for disregarding inline
 	   limits.  */
@@ -9824,6 +9833,12 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+      *no_add_attrs = true;
+    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -9831,6 +9846,39 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_clones" attribute.  */
+
+static tree
+handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "always_inline");
+	  *no_add_attrs = true;
+	}
+      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target");
+	  *no_add_attrs = true;
+	}
+      else
+      /* Do not inline functions with multiple clone targets.  */
+	DECL_UNINLINABLE (*node) = 1;
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Arguments being collected for optimization.  */
 typedef const char *const_char_p;		/* For DEF_VEC_P.  */
 static GTY(()) vec<const_char_p, va_gc> *optimize_args;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a2314e7..9f22210 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -36476,24 +36476,6 @@ ix86_get_function_versions_dispatcher (void *decl)
   return dispatch_decl;
 }
 
-/* Makes a function attribute of the form NAME(ARG_NAME) and chains
-   it to CHAIN.  */
-
-static tree
-make_attribute (const char *name, const char *arg_name, tree chain)
-{
-  tree attr_name;
-  tree attr_arg_name;
-  tree attr_args;
-  tree attr;
-
-  attr_name = get_identifier (name);
-  attr_arg_name = build_string (strlen (arg_name), arg_name);
-  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
-  attr = tree_cons (attr_name, attr_args, chain);
-  return attr;
-}
-
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function,  DEFAULT_DECL.  Create an
    empty basic block in the resolver and store the pointer in
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..65a598a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
 flags @option{fstack-protector} or @option{fstack-protector-strong}
 or @option{fstack-protector-explicit} are set.
 
+@item target_clones (@var{options})
+@cindex @code{target_clones} function attribute
+The @code{target_clones} attribute is used to specify that a function is to
+be cloned into multiple versions compiled with different target options
+than specified on the command line.  The supported options and restrictions
+are the same as for @code{target} attribute.
+
+For instance on an x86, you could compile a function with
+@code{target_clones("sse4.1,avx")}. It will create 2 function clones,
+one compiled with @option{-msse4.1} and another with @option{-mavx}.
+At the function call it will create resolver @code{ifunc}, that will
+dynamically call a clone suitable for current architecture.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 Multiple target back ends implement the @code{target} attribute
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
new file mode 100644
index 0000000..ba03c71
--- /dev/null
+++ b/gcc/multiple_target.c
@@ -0,0 +1,445 @@
+/* Pass for parsing functions with multiple target attributes.
+
+   Contributed by Evgeny Stupachenko <evstupac@gmail.com>
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "gimple.h"
+#include "diagnostic-core.h"
+#include "gimple-ssa.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "attribs.h"
+#include "pretty-print.h"
+
+/* If the call in NODE has multiple target attribute with multiple fields,
+   replace it with dispatcher call and create dispatcher (once).  */
+
+static void
+create_dispatcher_calls (struct cgraph_node *node)
+{
+  cgraph_edge *e;
+  cgraph_edge *e_next;
+  /* We need to remember NEXT_CALLER as it could be modified in the loop.  */
+  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
+    {
+      tree resolver_decl;
+      tree idecl;
+      tree decl;
+      gimple *call = e->call_stmt;
+      struct cgraph_node *inode;
+
+      /* Checking if call of function is call of versioned function.
+	 Versioned function are not inlined, so there is no need to
+	 check for inline.  */
+      if (!call
+	  || !(decl = gimple_call_fndecl (call))
+	  || !DECL_FUNCTION_VERSIONED (decl))
+	continue;
+
+      e_next = e->next_caller;
+      idecl = targetm.get_function_versions_dispatcher (decl);
+      if (!idecl)
+	{
+	  error_at (gimple_location (call),
+		    "default target_clones attribute was not set");
+	}
+      inode = cgraph_node::get (idecl);
+      gcc_assert (inode);
+      resolver_decl = targetm.generate_version_dispatcher_body (inode);
+
+      /* Update aliases.  */
+      inode->alias = true;
+      inode->alias_target = resolver_decl;
+      if (!inode->analyzed)
+	inode->resolve_alias (cgraph_node::get (resolver_decl));
+
+      e->redirect_callee (inode);
+      /*  Since REDIRECT_CALLEE modifies NEXT_CALLER field we move to
+	  previously set NEXT_CALLER.  */
+      e = NULL;
+    }
+}
+
+/* Return length of attribute names string,
+   if arglist chain >1, -1 otherwise.  */
+
+static int
+get_attr_len (tree arglist)
+{
+  tree arg;
+  int str_len_sum = 0;
+  int argnum = 1;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      unsigned int i;
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      int len = strlen (str);
+      str_len_sum += len + 1;
+      if (arg != arglist)
+	argnum++;
+      for (i = 0; i < strlen (str); i++)
+	if (str[i] == ',')
+	  argnum++;
+    }
+  if (argnum == 1)
+    return -1;
+  return str_len_sum;
+}
+
+/* Create sting with attributes separated by comma.
+   Return number of attributes.  */
+
+static int
+get_attr_str (tree arglist, char *attr_str)
+{
+  tree arg;
+  size_t str_len_sum = 0;
+  int argnum = 0;
+  str_len_sum = 0;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      memcpy (attr_str + str_len_sum, str, len);
+      attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+      str_len_sum += len + 1;
+      argnum++;
+    }
+  return argnum;
+}
+
+/* Return number of attributes separated by comma and put them into ARGS.
+   If there is no DEFAULT attribute return -1.  */
+
+static int
+separate_attrs (char *attr_str, char **attrs)
+{
+  int i = 0;
+  bool has_default = false;
+  char *attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      if (strcmp (attr, "default") == 0)
+	{
+	  has_default = true;
+	  attr = strtok (NULL, ",");
+	  continue;
+	}
+      attrs[i] = attr;
+      attr = strtok (NULL, ",");
+      i++;
+    }
+  if (!has_default)
+    return -1;
+  return i;
+}
+
+/*  Return true if symbol is valid in assembler name.  */
+
+static bool
+is_valid_asm_symbol (char c)
+{
+  if ('a' <= c && c <= 'z')
+    return true;
+  if ('A' <= c && c <= 'Z')
+    return true;
+  if ('0' <= c && c <= '9')
+    return true;
+  if (c == '_')
+    return true;
+  return false;
+}
+
+/* Create new assembler name for given old name and target name.
+   Replace all not valid assembler symbols with '_'.  */
+
+static void
+create_new_asm_name (const char *old_asm_name, char *attr, char *new_asm_name)
+{
+  int i;
+  int old_name_len = strlen (old_asm_name);
+  int attr_len = strlen (attr);
+  memcpy (new_asm_name, old_asm_name, old_name_len);
+
+  char *new_suffix = &new_asm_name[old_name_len + 1];
+  memcpy (new_suffix, attr, attr_len + 1);
+
+  /* Replace all not valid assembler symbols with '_'.  */
+  for (i = 0; i < attr_len; i++)
+    if (!is_valid_asm_symbol (new_suffix[i]))
+      new_suffix[i] = '_';
+
+  new_asm_name[old_name_len] = '.';
+}
+
+/*  Creates target clone of NODE.  */
+
+static cgraph_node *
+create_target_clone (cgraph_node *node, bool definition)
+{
+  cgraph_node *new_node;
+  if (definition)
+    {
+      new_node = node->create_version_clone_with_body (vNULL, NULL,
+    						       NULL, false,
+						       NULL, NULL,
+						       "target_clone");
+      new_node->force_output = true;
+    }
+  else
+    {
+      tree new_decl = copy_node (node->decl);
+      new_node = cgraph_node::get_create (new_decl);
+    }
+  return new_node;
+}
+
+/* If the function in NODE has multiple target attributes
+   create the appropriate clone for each valid target attribute.  */
+
+static bool
+expand_target_clones (struct cgraph_node *node, bool defenition)
+{
+  int i;
+  tree id;
+  /* Parsing target attributes separated by comma.  */
+  tree attr_target = lookup_attribute ("target_clones",
+				       DECL_ATTRIBUTES (node->decl));
+  /* No targets specified.  */
+  if (!attr_target)
+    return false;
+
+  tree arglist = TREE_VALUE (attr_target);
+  int attr_len = get_attr_len (arglist);
+
+  /* No need to clone for 1 target attribute.  */
+  if (attr_len == -1)
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl),
+		  0,
+		  "single target_clones attribute is ignored");
+      return false;
+    }
+
+  char *attr_str = XNEWVEC (char, attr_len);
+  int attrnum = get_attr_str (arglist, attr_str);
+
+  char **attrs = XNEWVEC (char *, attrnum);
+  attrnum = separate_attrs (attr_str, attrs);
+
+  if (attrnum == -1)
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"default target was not set");
+      return false;
+    }
+
+  const char *old_asm_name;
+  old_asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+
+  cgraph_function_version_info *decl1_v = NULL;
+  cgraph_function_version_info *decl2_v = NULL;
+  cgraph_function_version_info *before = NULL;
+  cgraph_function_version_info *after = NULL;
+  decl1_v = node->function_version ();
+  if (decl1_v == NULL)
+    decl1_v = node->insert_new_function_version ();
+  before = decl1_v;
+  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+
+
+  for (i = 0; i < attrnum; i++)
+    {
+      char *attr = attrs[i];
+
+      /* Create new target clone.  */
+      cgraph_node *new_node = create_target_clone (node, defenition);
+      char *new_asm_name =
+	  XNEWVEC (char, strlen (old_asm_name) + strlen (attr) + 2);
+      create_new_asm_name (old_asm_name, attr, new_asm_name);
+
+      /* Set new attribute for the clone.  */
+      tree attributes = make_attribute ("target", attr,
+					DECL_ATTRIBUTES (new_node->decl));
+      DECL_ATTRIBUTES (new_node->decl) = attributes;
+      if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+						    TREE_VALUE (attributes), 0))
+	{
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "attribute(target_clones(\"%s\")) is not "
+		      "valid for current target", attr);
+	  continue;
+	}
+
+      /* Set new name for the clone.  */
+      id = get_identifier (new_asm_name);
+      symtab->change_decl_assembler_name (new_node->decl, id);
+      XDELETEVEC (new_asm_name);
+
+      decl2_v = new_node->function_version ();
+      if (decl2_v != NULL)
+        continue;
+      decl2_v = new_node->insert_new_function_version ();
+
+      /* Chain decl2_v and decl1_v.  All semantically identical versions
+	 will be chained together.  */
+
+      after = decl2_v;
+      while (before->next != NULL)
+	before = before->next;
+      while (after->prev != NULL)
+	after = after->prev;
+
+      before->next = after;
+      after->prev = before;
+      DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+    }
+  /* Setting new attribute to initial function.  */
+  tree attributes =  make_attribute ("target", "default",
+				     DECL_ATTRIBUTES (node->decl));
+  DECL_ATTRIBUTES (node->decl) = attributes;
+  if (!targetm.target_option.valid_attribute_p (node->decl, NULL,
+						TREE_VALUE (attributes), 0))
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"attribute(target_clones(\"default\")) is not "
+		"valid for current target");
+      return false;
+    }
+
+  XDELETEVEC (attrs);
+  XDELETEVEC (attr_str);
+  return true;
+}
+
+static bool target_clone_pass;
+
+static unsigned int
+ipa_target_clone (void)
+{
+  struct cgraph_node *node;
+  target_clone_pass = false;
+  FOR_EACH_FUNCTION (node)
+    if (node->definition)
+      target_clone_pass |= expand_target_clones (node, true);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_target_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "targetclone",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_target_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_target_clone (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_target_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_target_clone (); }
+};
+
+bool
+pass_target_clone::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_target_clone (gcc::context *ctxt)
+{
+  return new pass_target_clone (ctxt);
+}
+
+static unsigned int
+ipa_dispatcher_calls (void)
+{
+  struct cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    if (!node->definition)
+      target_clone_pass |= expand_target_clones (node, false);
+  if (target_clone_pass)
+    FOR_EACH_FUNCTION (node)
+      create_dispatcher_calls (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_dispatcher_calls =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "dispachercalls",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_dispatcher_calls : public simple_ipa_opt_pass
+{
+public:
+  pass_dispatcher_calls (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_dispatcher_calls, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_dispatcher_calls (); }
+};
+
+bool
+pass_dispatcher_calls::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_dispatcher_calls (gcc::context *ctxt)
+{
+  return new pass_dispatcher_calls (ctxt);
+}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b444864..c85ce4d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17340,6 +17340,7 @@ ipa_omp_simd_clone (void)
   struct cgraph_node *node;
   FOR_EACH_FUNCTION (node)
     expand_simd_clones (node);
+
   return 0;
 }
 
diff --git a/gcc/passes.def b/gcc/passes.def
index dc3f44c..57cba33 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_devirt);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
+  NEXT_PASS (pass_target_clone);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -140,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
   NEXT_PASS (pass_ipa_pta);
+  NEXT_PASS (pass_dispatcher_calls);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST ()
 
diff --git a/gcc/testsuite/g++.dg/ext/mvc1.C b/gcc/testsuite/g++.dg/ext/mvc1.C
new file mode 100644
index 0000000..fbf9011
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc1.C
@@ -0,0 +1,34 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+__attribute__((target("avx","arch=core-avx2")))
+int
+bar ()
+{
+  return 2;
+}
+
+__attribute__((target("default")))
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/g++.dg/ext/mvc2.C b/gcc/testsuite/g++.dg/ext/mvc2.C
new file mode 100644
index 0000000..e7abab8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc2.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","default")))
+__attribute__((target("avx")))
+int foo (); /* { dg-warning "'target' attribute ignored due to conflict with 'target_clones' attribute" } */
+
+__attribute__((target_clones("avx","arch=slm","default"),always_inline))
+int bar (); /* { dg-warning "'always_inline' attribute ignored due to conflict with 'target_clones' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc3.C b/gcc/testsuite/g++.dg/ext/mvc3.C
new file mode 100644
index 0000000..05bebf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc3.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target("avx")))
+__attribute__((target_clones("avx","arch=slm","default")))
+int foo (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'target' attribute" } */
+
+__attribute__((always_inline,target_clones("avx","arch=slm","default")))
+int bar (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'always_inline' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc4.C b/gcc/testsuite/g++.dg/ext/mvc4.C
new file mode 100644
index 0000000..98e3502
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc4.C
@@ -0,0 +1,34 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-mavx" } */
+
+#include <immintrin.h>
+
+__m256 x, y, z;
+
+__attribute__((target("avx")))
+int bar()
+{
+  x = _mm256_add_ps (y, z);
+  return 1;
+}
+
+__attribute__((target("default")))
+int bar()
+{
+  return 2;
+}
+
+int
+foobar()
+{
+  if (__builtin_cpu_supports ("avx"))
+    return bar();
+  else
+    return 0;
+}
+
+__attribute__((target_clones("default","sse3")))
+int foo()
+{
+  return foobar();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
new file mode 100644
index 0000000..8e02721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc2.c b/gcc/testsuite/gcc.dg/mvc2.c
new file mode 100644
index 0000000..af0c6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo ();
diff --git a/gcc/testsuite/gcc.dg/mvc3.c b/gcc/testsuite/gcc.dg/mvc3.c
new file mode 100644
index 0000000..3af3e35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo (); /* { dg-error "default target was not set" } */
+
+int
+bar ()
+{
+  return foo();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
new file mode 100644
index 0000000..48ec9a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("default","avx","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
new file mode 100644
index 0000000..89001e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fno-inline" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+foo ()
+{
+  return 10;
+}
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+bar ()
+{
+  return -foo ();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc6.c b/gcc/testsuite/gcc.dg/mvc6.c
new file mode 100644
index 0000000..1621985
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler "vpshufb" } } */
+/* { dg-final { scan-assembler "punpcklbw" } } */
+
+__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
+void
+foo(char *in, char *out, int size)
+{
+  int i;
+  for(i = 0; i < size; i++)
+    {
+	out[2 * i] = in[i];
+	out[2 * i + 1] = in[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
new file mode 100644
index 0000000..d61d78e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+
+__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
+int foo ();
+
+int main()
+{
+  return foo();
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 91f63a8..3191e2b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -483,6 +483,8 @@ extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_dispatcher_calls (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-29 13:42                                 ` Evgeny Stupachenko
@ 2015-10-29 17:07                                   ` Jan Hubicka
  2015-10-29 18:15                                     ` Evgeny Stupachenko
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Hubicka @ 2015-10-29 17:07 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Jeff Law, Jan Hubicka, Bernd Schmidt, Bernd Schmidt, GCC Patches

> Yes. This is not necessary. However that way we'll have the following
> code in dispatcher:
>         cmpl    $6, __cpu_model+4(%rip)
>         sete    %al
>         movzbl  %al, %eax
>         testl   %eax, %eax
>         jle     .L16
>         movl    $foo.target_clone.1, %eax
> I think it is very hard to read and debug such...
> 
> While now we have:
> 
>         cmpl    $6, __cpu_model+4(%rip)
>         sete    %al
>         movzbl  %al, %eax
>         testl   %eax, %eax
>         jle     .L16
>         movl    $foo.arch_slm, %eax
> 
> and it is clear that we are jumping to SLM code here.
> I'd like to keep target in names.

I am not against more informative names, but why you don't pass the info here:

+create_target_clone (cgraph_node *node, bool definition)
+{
+  cgraph_node *new_node;
+  if (definition)
+    {
+      new_node = node->create_version_clone_with_body (vNULL, NULL,
+                                                      NULL, false,
+                                                      NULL, NULL,
+                                                      "target_clone");
+      new_node->force_output = true;
+    }
+  else
+    {
+      tree new_decl = copy_node (node->decl);
+      new_node = cgraph_node::get_create (new_decl);
+    }
+  return new_node;
+}

passing "arch_slm" instead of target_clone will get you the name you want
(plus the extra index that may be needed anyway to disambiguate).

Note that in general those .suffixes should be machine parseable, so cp-demangle.c
can expand them correctly.  We may want to have some consistent grammar for them here
and update cp-demangle.c to output nice info like "target clone for..."

I will look at the rest of the patch at evening.
Honza
> 
> Thanks,
> Evgeny
> >
> >
> > Jeff


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-29 17:07                                   ` Jan Hubicka
@ 2015-10-29 18:15                                     ` Evgeny Stupachenko
  2015-10-30  3:55                                       ` Jan Hubicka
  2015-10-30  5:30                                       ` Jeff Law
  0 siblings, 2 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-29 18:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1976 bytes --]

On Thu, Oct 29, 2015 at 8:02 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Yes. This is not necessary. However that way we'll have the following
>> code in dispatcher:
>>         cmpl    $6, __cpu_model+4(%rip)
>>         sete    %al
>>         movzbl  %al, %eax
>>         testl   %eax, %eax
>>         jle     .L16
>>         movl    $foo.target_clone.1, %eax
>> I think it is very hard to read and debug such...
>>
>> While now we have:
>>
>>         cmpl    $6, __cpu_model+4(%rip)
>>         sete    %al
>>         movzbl  %al, %eax
>>         testl   %eax, %eax
>>         jle     .L16
>>         movl    $foo.arch_slm, %eax
>>
>> and it is clear that we are jumping to SLM code here.
>> I'd like to keep target in names.
>
> I am not against more informative names, but why you don't pass the info here:
>
> +create_target_clone (cgraph_node *node, bool definition)
> +{
> +  cgraph_node *new_node;
> +  if (definition)
> +    {
> +      new_node = node->create_version_clone_with_body (vNULL, NULL,
> +                                                      NULL, false,
> +                                                      NULL, NULL,
> +                                                      "target_clone");
> +      new_node->force_output = true;
> +    }
> +  else
> +    {
> +      tree new_decl = copy_node (node->decl);
> +      new_node = cgraph_node::get_create (new_decl);
> +    }
> +  return new_node;
> +}
>
> passing "arch_slm" instead of target_clone will get you the name you want
> (plus the extra index that may be needed anyway to disambiguate).
>
> Note that in general those .suffixes should be machine parseable, so cp-demangle.c
> can expand them correctly.  We may want to have some consistent grammar for them here
> and update cp-demangle.c to output nice info like "target clone for..."
Ok. I've modified the patch correspondingly.

>
> I will look at the rest of the patch at evening.
> Honza
>>
>> Thanks,
>> Evgeny
>> >
>> >
>> > Jeff
>
>

[-- Attachment #2: target_clones5.patch --]
[-- Type: application/octet-stream, Size: 26424 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 783e4c9..847b9d4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1348,6 +1348,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiple_target.o \
 	omp-low.o \
 	optabs.o \
 	optabs-libfuncs.o \
diff --git a/gcc/attribs.c b/gcc/attribs.c
index 6cbe011..9e70f04 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -681,3 +681,21 @@ apply_tm_attr (tree fndecl, tree attr)
 {
   decl_attributes (&TREE_TYPE (fndecl), tree_cons (attr, NULL, NULL), 0);
 }
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
diff --git a/gcc/attribs.h b/gcc/attribs.h
index ac855d5..3c0be4f 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -36,5 +36,6 @@ extern tree decl_attributes (tree *, tree, int);
 extern bool cxx11_attribute_p (const_tree);
 extern tree get_attribute_name (const_tree);
 extern void apply_tm_attr (tree, tree);
+extern tree make_attribute (const char *, const char *, tree);
 
 #endif // GCC_ATTRIBS_H
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1c75921..144a5d8 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
@@ -798,6 +799,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_error_attribute, false },
   { "target",                 1, -1, true, false, false,
 			      handle_target_attribute, false },
+  { "target_clones",          1, -1, true, false, false,
+			      handle_target_clones_attribute, false },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute, false },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -7396,6 +7399,12 @@ handle_always_inline_attribute (tree *node, tree name,
 		   "with %qs attribute", name, "noinline");
 	  *no_add_attrs = true;
 	}
+      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+	  *no_add_attrs = true;
+	}
       else
 	/* Set the attribute and mark it for disregarding inline
 	   limits.  */
@@ -9824,6 +9833,12 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target_clones");
+      *no_add_attrs = true;
+    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -9831,6 +9846,39 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_clones" attribute.  */
+
+static tree
+handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "always_inline");
+	  *no_add_attrs = true;
+	}
+      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with %qs attribute", name, "target");
+	  *no_add_attrs = true;
+	}
+      else
+      /* Do not inline functions with multiple clone targets.  */
+	DECL_UNINLINABLE (*node) = 1;
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Arguments being collected for optimization.  */
 typedef const char *const_char_p;		/* For DEF_VEC_P.  */
 static GTY(()) vec<const_char_p, va_gc> *optimize_args;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a2314e7..9f22210 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -36476,24 +36476,6 @@ ix86_get_function_versions_dispatcher (void *decl)
   return dispatch_decl;
 }
 
-/* Makes a function attribute of the form NAME(ARG_NAME) and chains
-   it to CHAIN.  */
-
-static tree
-make_attribute (const char *name, const char *arg_name, tree chain)
-{
-  tree attr_name;
-  tree attr_arg_name;
-  tree attr_args;
-  tree attr;
-
-  attr_name = get_identifier (name);
-  attr_arg_name = build_string (strlen (arg_name), arg_name);
-  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
-  attr = tree_cons (attr_name, attr_args, chain);
-  return attr;
-}
-
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function,  DEFAULT_DECL.  Create an
    empty basic block in the resolver and store the pointer in
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..65a598a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3066,6 +3066,19 @@ This function attribute make a stack protection of the function if
 flags @option{fstack-protector} or @option{fstack-protector-strong}
 or @option{fstack-protector-explicit} are set.
 
+@item target_clones (@var{options})
+@cindex @code{target_clones} function attribute
+The @code{target_clones} attribute is used to specify that a function is to
+be cloned into multiple versions compiled with different target options
+than specified on the command line.  The supported options and restrictions
+are the same as for @code{target} attribute.
+
+For instance on an x86, you could compile a function with
+@code{target_clones("sse4.1,avx")}. It will create 2 function clones,
+one compiled with @option{-msse4.1} and another with @option{-mavx}.
+At the function call it will create resolver @code{ifunc}, that will
+dynamically call a clone suitable for current architecture.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 Multiple target back ends implement the @code{target} attribute
diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
new file mode 100644
index 0000000..624b57f
--- /dev/null
+++ b/gcc/multiple_target.c
@@ -0,0 +1,430 @@
+/* Pass for parsing functions with multiple target attributes.
+
+   Contributed by Evgeny Stupachenko <evstupac@gmail.com>
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "gimple.h"
+#include "diagnostic-core.h"
+#include "gimple-ssa.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "attribs.h"
+#include "pretty-print.h"
+
+/* If the call in NODE has multiple target attribute with multiple fields,
+   replace it with dispatcher call and create dispatcher (once).  */
+
+static void
+create_dispatcher_calls (struct cgraph_node *node)
+{
+  cgraph_edge *e;
+  cgraph_edge *e_next;
+  /* We need to remember NEXT_CALLER as it could be modified in the loop.  */
+  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
+    {
+      tree resolver_decl;
+      tree idecl;
+      tree decl;
+      gimple *call = e->call_stmt;
+      struct cgraph_node *inode;
+
+      /* Checking if call of function is call of versioned function.
+	 Versioned function are not inlined, so there is no need to
+	 check for inline.  */
+      if (!call
+	  || !(decl = gimple_call_fndecl (call))
+	  || !DECL_FUNCTION_VERSIONED (decl))
+	continue;
+
+      e_next = e->next_caller;
+      idecl = targetm.get_function_versions_dispatcher (decl);
+      if (!idecl)
+	{
+	  error_at (gimple_location (call),
+		    "default target_clones attribute was not set");
+	}
+      inode = cgraph_node::get (idecl);
+      gcc_assert (inode);
+      resolver_decl = targetm.generate_version_dispatcher_body (inode);
+
+      /* Update aliases.  */
+      inode->alias = true;
+      inode->alias_target = resolver_decl;
+      if (!inode->analyzed)
+	inode->resolve_alias (cgraph_node::get (resolver_decl));
+
+      e->redirect_callee (inode);
+      /*  Since REDIRECT_CALLEE modifies NEXT_CALLER field we move to
+	  previously set NEXT_CALLER.  */
+      e = NULL;
+    }
+}
+
+/* Return length of attribute names string,
+   if arglist chain >1, -1 otherwise.  */
+
+static int
+get_attr_len (tree arglist)
+{
+  tree arg;
+  int str_len_sum = 0;
+  int argnum = 1;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      unsigned int i;
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      int len = strlen (str);
+      str_len_sum += len + 1;
+      if (arg != arglist)
+	argnum++;
+      for (i = 0; i < strlen (str); i++)
+	if (str[i] == ',')
+	  argnum++;
+    }
+  if (argnum == 1)
+    return -1;
+  return str_len_sum;
+}
+
+/* Create sting with attributes separated by comma.
+   Return number of attributes.  */
+
+static int
+get_attr_str (tree arglist, char *attr_str)
+{
+  tree arg;
+  size_t str_len_sum = 0;
+  int argnum = 0;
+  str_len_sum = 0;
+  for (arg = arglist; arg; arg = TREE_CHAIN (arg))
+    {
+      const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
+      size_t len = strlen (str);
+      memcpy (attr_str + str_len_sum, str, len);
+      attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
+      str_len_sum += len + 1;
+      argnum++;
+    }
+  return argnum;
+}
+
+/* Return number of attributes separated by comma and put them into ARGS.
+   If there is no DEFAULT attribute return -1.  */
+
+static int
+separate_attrs (char *attr_str, char **attrs)
+{
+  int i = 0;
+  bool has_default = false;
+  char *attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      if (strcmp (attr, "default") == 0)
+	{
+	  has_default = true;
+	  attr = strtok (NULL, ",");
+	  continue;
+	}
+      attrs[i] = attr;
+      attr = strtok (NULL, ",");
+      i++;
+    }
+  if (!has_default)
+    return -1;
+  return i;
+}
+
+/*  Return true if symbol is valid in assembler name.  */
+
+static bool
+is_valid_asm_symbol (char c)
+{
+  if ('a' <= c && c <= 'z')
+    return true;
+  if ('A' <= c && c <= 'Z')
+    return true;
+  if ('0' <= c && c <= '9')
+    return true;
+  if (c == '_')
+    return true;
+  return false;
+}
+
+/*  Replace all not valid assembler symbols with '_'.  */
+
+static void
+create_new_asm_name (char *old_asm_name, char *new_asm_name)
+{
+  int i;
+  int old_name_len = strlen (old_asm_name);
+
+  /* Replace all not valid assembler symbols with '_'.  */
+  for (i = 0; i < old_name_len; i++)
+    if (!is_valid_asm_symbol (old_asm_name[i]))
+      new_asm_name[i] = '_';
+    else
+      new_asm_name[i] = old_asm_name[i];
+  new_asm_name[old_name_len] = '\0';
+}
+
+/*  Creates target clone of NODE.  */
+
+static cgraph_node *
+create_target_clone (cgraph_node *node, bool definition, char *name)
+{
+  cgraph_node *new_node;
+  if (definition)
+    {
+      new_node = node->create_version_clone_with_body (vNULL, NULL,
+    						       NULL, false,
+						       NULL, NULL,
+						       name);
+      new_node->force_output = true;
+    }
+  else
+    {
+      tree new_decl = copy_node (node->decl);
+      new_node = cgraph_node::get_create (new_decl);
+    }
+  return new_node;
+}
+
+/* If the function in NODE has multiple target attributes
+   create the appropriate clone for each valid target attribute.  */
+
+static bool
+expand_target_clones (struct cgraph_node *node, bool defenition)
+{
+  int i;
+  /* Parsing target attributes separated by comma.  */
+  tree attr_target = lookup_attribute ("target_clones",
+				       DECL_ATTRIBUTES (node->decl));
+  /* No targets specified.  */
+  if (!attr_target)
+    return false;
+
+  tree arglist = TREE_VALUE (attr_target);
+  int attr_len = get_attr_len (arglist);
+
+  /* No need to clone for 1 target attribute.  */
+  if (attr_len == -1)
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl),
+		  0,
+		  "single target_clones attribute is ignored");
+      return false;
+    }
+
+  char *attr_str = XNEWVEC (char, attr_len);
+  int attrnum = get_attr_str (arglist, attr_str);
+
+  char **attrs = XNEWVEC (char *, attrnum);
+  attrnum = separate_attrs (attr_str, attrs);
+
+  if (attrnum == -1)
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"default target was not set");
+      return false;
+    }
+
+  cgraph_function_version_info *decl1_v = NULL;
+  cgraph_function_version_info *decl2_v = NULL;
+  cgraph_function_version_info *before = NULL;
+  cgraph_function_version_info *after = NULL;
+  decl1_v = node->function_version ();
+  if (decl1_v == NULL)
+    decl1_v = node->insert_new_function_version ();
+  before = decl1_v;
+  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+
+
+  for (i = 0; i < attrnum; i++)
+    {
+      char *attr = attrs[i];
+      char *suffix = XNEWVEC (char, strlen (attr) + 1);
+      create_new_asm_name (attr, asm_suffix);
+      /* Create new target clone.  */
+      cgraph_node *new_node = create_target_clone (node, defenition, suffix);
+      XDELETEVEC (suffix);
+
+      /* Set new attribute for the clone.  */
+      tree attributes = make_attribute ("target", attr,
+					DECL_ATTRIBUTES (new_node->decl));
+      DECL_ATTRIBUTES (new_node->decl) = attributes;
+      if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,
+						    TREE_VALUE (attributes), 0))
+	{
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "attribute(target_clones(\"%s\")) is not "
+		      "valid for current target", attr);
+	  continue;
+	}
+
+      decl2_v = new_node->function_version ();
+      if (decl2_v != NULL)
+        continue;
+      decl2_v = new_node->insert_new_function_version ();
+
+      /* Chain decl2_v and decl1_v.  All semantically identical versions
+	 will be chained together.  */
+
+      after = decl2_v;
+      while (before->next != NULL)
+	before = before->next;
+      while (after->prev != NULL)
+	after = after->prev;
+
+      before->next = after;
+      after->prev = before;
+      DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+    }
+  /* Setting new attribute to initial function.  */
+  tree attributes =  make_attribute ("target", "default",
+				     DECL_ATTRIBUTES (node->decl));
+  DECL_ATTRIBUTES (node->decl) = attributes;
+  if (!targetm.target_option.valid_attribute_p (node->decl, NULL,
+						TREE_VALUE (attributes), 0))
+    {
+      error_at (DECL_SOURCE_LOCATION (node->decl),
+		"attribute(target_clones(\"default\")) is not "
+		"valid for current target");
+      return false;
+    }
+
+  XDELETEVEC (attrs);
+  XDELETEVEC (attr_str);
+  return true;
+}
+
+static bool target_clone_pass;
+
+static unsigned int
+ipa_target_clone (void)
+{
+  struct cgraph_node *node;
+  target_clone_pass = false;
+  FOR_EACH_FUNCTION (node)
+    if (node->definition)
+      target_clone_pass |= expand_target_clones (node, true);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_target_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "targetclone",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_target_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_target_clone (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_target_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_target_clone (); }
+};
+
+bool
+pass_target_clone::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_target_clone (gcc::context *ctxt)
+{
+  return new pass_target_clone (ctxt);
+}
+
+static unsigned int
+ipa_dispatcher_calls (void)
+{
+  struct cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    if (!node->definition)
+      target_clone_pass |= expand_target_clones (node, false);
+  if (target_clone_pass)
+    FOR_EACH_FUNCTION (node)
+      create_dispatcher_calls (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_dispatcher_calls =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "dispachercalls",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0				/* todo_flags_finish */
+};
+
+class pass_dispatcher_calls : public simple_ipa_opt_pass
+{
+public:
+  pass_dispatcher_calls (gcc::context *ctxt)
+    : simple_ipa_opt_pass (pass_data_dispatcher_calls, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_dispatcher_calls (); }
+};
+
+bool
+pass_dispatcher_calls::gate (function *)
+{
+  return true;
+}
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_dispatcher_calls (gcc::context *ctxt)
+{
+  return new pass_dispatcher_calls (ctxt);
+}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b444864..c85ce4d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17340,6 +17340,7 @@ ipa_omp_simd_clone (void)
   struct cgraph_node *node;
   FOR_EACH_FUNCTION (node)
     expand_simd_clones (node);
+
   return 0;
 }
 
diff --git a/gcc/passes.def b/gcc/passes.def
index dc3f44c..57cba33 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_devirt);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
+  NEXT_PASS (pass_target_clone);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -140,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
   NEXT_PASS (pass_ipa_pta);
+  NEXT_PASS (pass_dispatcher_calls);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST ()
 
diff --git a/gcc/testsuite/g++.dg/ext/mvc1.C b/gcc/testsuite/g++.dg/ext/mvc1.C
new file mode 100644
index 0000000..fbf9011
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc1.C
@@ -0,0 +1,34 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+__attribute__((target("avx","arch=core-avx2")))
+int
+bar ()
+{
+  return 2;
+}
+
+__attribute__((target("default")))
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/g++.dg/ext/mvc2.C b/gcc/testsuite/g++.dg/ext/mvc2.C
new file mode 100644
index 0000000..e7abab8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc2.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","default")))
+__attribute__((target("avx")))
+int foo (); /* { dg-warning "'target' attribute ignored due to conflict with 'target_clones' attribute" } */
+
+__attribute__((target_clones("avx","arch=slm","default"),always_inline))
+int bar (); /* { dg-warning "'always_inline' attribute ignored due to conflict with 'target_clones' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc3.C b/gcc/testsuite/g++.dg/ext/mvc3.C
new file mode 100644
index 0000000..05bebf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc3.C
@@ -0,0 +1,8 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target("avx")))
+__attribute__((target_clones("avx","arch=slm","default")))
+int foo (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'target' attribute" } */
+
+__attribute__((always_inline,target_clones("avx","arch=slm","default")))
+int bar (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'always_inline' attribute" } */
diff --git a/gcc/testsuite/g++.dg/ext/mvc4.C b/gcc/testsuite/g++.dg/ext/mvc4.C
new file mode 100644
index 0000000..98e3502
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mvc4.C
@@ -0,0 +1,34 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-mavx" } */
+
+#include <immintrin.h>
+
+__m256 x, y, z;
+
+__attribute__((target("avx")))
+int bar()
+{
+  x = _mm256_add_ps (y, z);
+  return 1;
+}
+
+__attribute__((target("default")))
+int bar()
+{
+  return 2;
+}
+
+int
+foobar()
+{
+  if (__builtin_cpu_supports ("avx"))
+    return bar();
+  else
+    return 0;
+}
+
+__attribute__((target_clones("default","sse3")))
+int foo()
+{
+  return foobar();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
new file mode 100644
index 0000000..8e02721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc2.c b/gcc/testsuite/gcc.dg/mvc2.c
new file mode 100644
index 0000000..af0c6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo ();
diff --git a/gcc/testsuite/gcc.dg/mvc3.c b/gcc/testsuite/gcc.dg/mvc3.c
new file mode 100644
index 0000000..3af3e35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
+int foo (); /* { dg-error "default target was not set" } */
+
+int
+bar ()
+{
+  return foo();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
new file mode 100644
index 0000000..48ec9a1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -0,0 +1,26 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+__attribute__((target_clones("default","avx","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
new file mode 100644
index 0000000..89001e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fno-inline" } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+foo ()
+{
+  return 10;
+}
+
+__attribute__((target_clones("default","avx","avx2")))
+int
+bar ()
+{
+  return -foo ();
+}
diff --git a/gcc/testsuite/gcc.dg/mvc6.c b/gcc/testsuite/gcc.dg/mvc6.c
new file mode 100644
index 0000000..1621985
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler "vpshufb" } } */
+/* { dg-final { scan-assembler "punpcklbw" } } */
+
+__attribute__((target_clones("arch=core-avx2","arch=slm","default")))
+void
+foo(char *in, char *out, int size)
+{
+  int i;
+  for(i = 0; i < size; i++)
+    {
+	out[2 * i] = in[i];
+	out[2 * i + 1] = in[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
new file mode 100644
index 0000000..d61d78e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+
+__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
+int foo ();
+
+int main()
+{
+  return foo();
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 91f63a8..3191e2b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -483,6 +483,8 @@ extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_dispatcher_calls (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-29 18:15                                     ` Evgeny Stupachenko
@ 2015-10-30  3:55                                       ` Jan Hubicka
  2015-10-30  5:30                                       ` Jeff Law
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Hubicka @ 2015-10-30  3:55 UTC (permalink / raw)
  To: Evgeny Stupachenko
  Cc: Jan Hubicka, Jeff Law, Bernd Schmidt, Bernd Schmidt, GCC Patches

> Ok. I've modified the patch correspondingly.
Thanks,
the IPA/callgarph parts of the patch looks fine to me except for a typo:
+/* Create sting with attributes separated by comma.
+   Return number of attributes.  */

and a fact that sometimes you skip vertical whitespace after variable declarations.
I will leave it Jeff for the final ack.

Honza

> 
> >
> > I will look at the rest of the patch at evening.
> > Honza
> >>
> >> Thanks,
> >> Evgeny
> >> >
> >> >
> >> > Jeff
> >
> >


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-29 18:15                                     ` Evgeny Stupachenko
  2015-10-30  3:55                                       ` Jan Hubicka
@ 2015-10-30  5:30                                       ` Jeff Law
  2015-10-30 12:30                                         ` Evgeny Stupachenko
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-10-30  5:30 UTC (permalink / raw)
  To: Evgeny Stupachenko, Jan Hubicka; +Cc: Bernd Schmidt, Bernd Schmidt, GCC Patches

On 10/29/2015 12:13 PM, Evgeny Stupachenko wrote:
> On Thu, Oct 29, 2015 at 8:02 PM, Jan Hubicka<hubicka@ucw.cz>  wrote:
>>> >>Yes. This is not necessary. However that way we'll have the following
>>> >>code in dispatcher:
>>> >>         cmpl    $6, __cpu_model+4(%rip)
>>> >>         sete    %al
>>> >>         movzbl  %al, %eax
>>> >>         testl   %eax, %eax
>>> >>         jle     .L16
>>> >>         movl    $foo.target_clone.1, %eax
>>> >>I think it is very hard to read and debug such...
>>> >>
>>> >>While now we have:
>>> >>
>>> >>         cmpl    $6, __cpu_model+4(%rip)
>>> >>         sete    %al
>>> >>         movzbl  %al, %eax
>>> >>         testl   %eax, %eax
>>> >>         jle     .L16
>>> >>         movl    $foo.arch_slm, %eax
>>> >>
>>> >>and it is clear that we are jumping to SLM code here.
>>> >>I'd like to keep target in names.
>> >
>> >I am not against more informative names, but why you don't pass the info here:
>> >
>> >+create_target_clone (cgraph_node *node, bool definition)
>> >+{
>> >+  cgraph_node *new_node;
>> >+  if (definition)
>> >+    {
>> >+      new_node = node->create_version_clone_with_body (vNULL, NULL,
>> >+                                                      NULL, false,
>> >+                                                      NULL, NULL,
>> >+                                                      "target_clone");
>> >+      new_node->force_output = true;
>> >+    }
>> >+  else
>> >+    {
>> >+      tree new_decl = copy_node (node->decl);
>> >+      new_node = cgraph_node::get_create (new_decl);
>> >+    }
>> >+  return new_node;
>> >+}
>> >
>> >passing "arch_slm" instead of target_clone will get you the name you want
>> >(plus the extra index that may be needed anyway to disambiguate).
>> >
>> >Note that in general those .suffixes should be machine parseable, so cp-demangle.c
>> >can expand them correctly.  We may want to have some consistent grammar for them here
>> >and update cp-demangle.c to output nice info like "target clone for..."
> Ok. I've modified the patch correspondingly.
You'll need updated ChangeLog entries.  Don't forget to drop the omp-low 
spurious whitespace change.

You should also fix the formatting nits Jan pointed out.

With those changes, this patch is OK for the trunk.  I'll run the header 
file reordering & cleanup tool after the patch is committed to the trunk.

jeff


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-30  5:30                                       ` Jeff Law
@ 2015-10-30 12:30                                         ` Evgeny Stupachenko
  0 siblings, 0 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-10-30 12:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jan Hubicka, Bernd Schmidt, Bernd Schmidt, GCC Patches

I've fixed the misprint and vertical spaces.
I'll ask to commit the patch when x86 bootstrap and make check finished.

Thanks,
Evgeny

Updated ChangeLog:

2015-10-30  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * Makefile.in (OBJS): Add multiple_target.o.
        * attrib.c (make_attribute): Moved from config/i386/i386.c
        * config/i386/i386.c (make_attribute): Deleted.
        * multiple_target.c (create_dispatcher_calls): New.
        (get_attr_len): Ditto.
        (get_attr_str): Ditto.
        (separate_attrs): Ditto.
        (is_valid_asm_symbol): Ditto.
        (create_new_asm_name): Ditto.
        (create_target_clone): Ditto.
        (expand_target_clones): Ditto.
        (ipa_target_clone): Ditto.
        (ipa_dispatcher_calls): Ditto.
        * passes.def (pass_target_clone): Two new ipa passes.
        * tree-pass.h (make_pass_target_clone): Ditto.

gcc/c-family
        * c-common.c (handle_target_clones_attribute): New.
        * (c_common_attribute_table): Add handle_target_clones_attribute.
        * (handle_always_inline_attribute): Add check on target_clones
        attribute.
        * (handle_target_attribute): Ditto.

gcc/testsuite
        * gcc.dg/mvc1.c: New test for multiple targets cloning.
        * gcc.dg/mvc2.c: Ditto.
        * gcc.dg/mvc3.c: Ditto.
        * gcc.dg/mvc4.c: Ditto.
        * gcc.dg/mvc5.c: Ditto.
        * gcc.dg/mvc6.c: Ditto.
        * gcc.dg/mvc7.c: Ditto.
        * g++.dg/ext/mvc1.C: Ditto.
        * g++.dg/ext/mvc2.C: Ditto.
        * g++.dg/ext/mvc3.C: Ditto.
        * g++.dg/ext/mvc4.C: Ditto.

gcc/doc
        * doc/extend.texi (target_clones): New attribute description.

On Fri, Oct 30, 2015 at 8:27 AM, Jeff Law <law@redhat.com> wrote:
> On 10/29/2015 12:13 PM, Evgeny Stupachenko wrote:
>>
>> On Thu, Oct 29, 2015 at 8:02 PM, Jan Hubicka<hubicka@ucw.cz>  wrote:
>>>>
>>>> >>Yes. This is not necessary. However that way we'll have the following
>>>> >>code in dispatcher:
>>>> >>         cmpl    $6, __cpu_model+4(%rip)
>>>> >>         sete    %al
>>>> >>         movzbl  %al, %eax
>>>> >>         testl   %eax, %eax
>>>> >>         jle     .L16
>>>> >>         movl    $foo.target_clone.1, %eax
>>>> >>I think it is very hard to read and debug such...
>>>> >>
>>>> >>While now we have:
>>>> >>
>>>> >>         cmpl    $6, __cpu_model+4(%rip)
>>>> >>         sete    %al
>>>> >>         movzbl  %al, %eax
>>>> >>         testl   %eax, %eax
>>>> >>         jle     .L16
>>>> >>         movl    $foo.arch_slm, %eax
>>>> >>
>>>> >>and it is clear that we are jumping to SLM code here.
>>>> >>I'd like to keep target in names.
>>>
>>> >
>>> >I am not against more informative names, but why you don't pass the info
>>> > here:
>>> >
>>> >+create_target_clone (cgraph_node *node, bool definition)
>>> >+{
>>> >+  cgraph_node *new_node;
>>> >+  if (definition)
>>> >+    {
>>> >+      new_node = node->create_version_clone_with_body (vNULL, NULL,
>>> >+                                                      NULL, false,
>>> >+                                                      NULL, NULL,
>>> >+                                                      "target_clone");
>>> >+      new_node->force_output = true;
>>> >+    }
>>> >+  else
>>> >+    {
>>> >+      tree new_decl = copy_node (node->decl);
>>> >+      new_node = cgraph_node::get_create (new_decl);
>>> >+    }
>>> >+  return new_node;
>>> >+}
>>> >
>>> >passing "arch_slm" instead of target_clone will get you the name you
>>> > want
>>> >(plus the extra index that may be needed anyway to disambiguate).
>>> >
>>> >Note that in general those .suffixes should be machine parseable, so
>>> > cp-demangle.c
>>> >can expand them correctly.  We may want to have some consistent grammar
>>> > for them here
>>> >and update cp-demangle.c to output nice info like "target clone for..."
>>
>> Ok. I've modified the patch correspondingly.
>
> You'll need updated ChangeLog entries.  Don't forget to drop the omp-low
> spurious whitespace change.
>
> You should also fix the formatting nits Jan pointed out.
>
> With those changes, this patch is OK for the trunk.  I'll run the header
> file reordering & cleanup tool after the patch is committed to the trunk.
>
> jeff
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-11-02 17:02   ` Jeff Law
@ 2015-11-03 11:45     ` Evgeny Stupachenko
  0 siblings, 0 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-11-03 11:45 UTC (permalink / raw)
  To: Jeff Law, Uros Bizjak
  Cc: Dominique d'Humières, Jan Hubicka, Bernd Schmidt, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]

Some tests in the patch already updated (ifunc require condition
added) by Uros commit: ac39b078992c27934ea53cb580dbd79f75b6c727

I'll ask to commit attached patch.
x86 bootstrap and make check passed.

2015-11-03  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * multiple_target.c (create_dispatcher_calls): Add target check
        on ifunc.
        (create_target_clone): Change assembler name for versioned declarations.

gcc/testsuite
        * g++.dg/ext/mvc4.C: Add dg-require-ifunc condition.
        * gcc.target/i386/mvc5.c: Ditto.
        * gcc.target/i386/mvc7.c: Add dg-require-ifunc condition and checks on
        resolver.

Thanks,
Evgeny

On Mon, Nov 2, 2015 at 8:02 PM, Jeff Law <law@redhat.com> wrote:
> On 11/02/2015 07:50 AM, Evgeny Stupachenko wrote:
>>
>> Yes, that is exactly what should fix the tests.
>> Unfortunately I don't have access to darwin machine right now.
>> Can you please test if the patch (attached) fixes the tests?
>>
>> gcc/
>>          * multiple_target.c (create_dispatcher_calls): Add target check
>>          on ifunc.
>>          (create_target_clone): Change assembler name for versioned
>> declarations.
>>
>> gcc/testsuite
>>          * gcc.dg/mvc1.c: Add dg-require-ifunc condition.
>>          * gcc.dg/mvc4.c: Ditto.
>>          * gcc.dg/mvc5.c: Ditto.
>>          * g++.dg/ext/mvc1.C: Ditto.
>>          * g++.dg/ext/mvc4.C: Ditto.
>>          * gcc.dg/mvc7.c: Add dg-require-ifunc condition and checks on
>> resolver.
>>
> OK.
>
> jeff
>

[-- Attachment #2: target_clones_tests_fix.patch --]
[-- Type: application/octet-stream, Size: 2425 bytes --]

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 54618d8..8cca281 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -61,12 +61,20 @@ create_dispatcher_calls (struct cgraph_node *node)
 	  || !DECL_FUNCTION_VERSIONED (decl))
 	continue;
 
+      if (!targetm.has_ifunc_p ())
+	{
+	  error_at (gimple_location (call),
+		    "the call requires ifunc, which is not"
+		    " supported by this target");
+	  break;
+	}
       e_next = e->next_caller;
       idecl = targetm.get_function_versions_dispatcher (decl);
       if (!idecl)
 	{
 	  error_at (gimple_location (call),
 		    "default target_clones attribute was not set");
+	  break;
 	}
       inode = cgraph_node::get (idecl);
       gcc_assert (inode);
@@ -215,6 +223,10 @@ create_target_clone (cgraph_node *node, bool definition, char *name)
     {
       tree new_decl = copy_node (node->decl);
       new_node = cgraph_node::get_create (new_decl);
+      /* Generate a new name for the new version.  */
+      symtab->change_decl_assembler_name (new_node->decl,
+					  clone_function_name (node->decl,
+							       name));
     }
   return new_node;
 }
diff --git a/gcc/testsuite/g++.dg/ext/mvc4.C b/gcc/testsuite/g++.dg/ext/mvc4.C
index 98e3502..6e18e56 100644
--- a/gcc/testsuite/g++.dg/ext/mvc4.C
+++ b/gcc/testsuite/g++.dg/ext/mvc4.C
@@ -1,4 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-mavx" } */
 
 #include <immintrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/mvc5.c b/gcc/testsuite/gcc.target/i386/mvc5.c
index 0b1981d..8fea04c 100644
--- a/gcc/testsuite/gcc.target/i386/mvc5.c
+++ b/gcc/testsuite/gcc.target/i386/mvc5.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-fno-inline" } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/mvc7.c b/gcc/testsuite/gcc.target/i386/mvc7.c
index efc4b69..9a9d7a3 100644
--- a/gcc/testsuite/gcc.target/i386/mvc7.c
+++ b/gcc/testsuite/gcc.target/i386/mvc7.c
@@ -1,4 +1,8 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-final { scan-assembler "foo.resolver" } } */
+/* { dg-final { scan-assembler "avx" } } */
+/* { dg-final { scan-assembler "slm" } } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
 
 __attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-11-02 14:50 ` Evgeny Stupachenko
  2015-11-02 15:22   ` Dominique d'Humières
@ 2015-11-02 17:02   ` Jeff Law
  2015-11-03 11:45     ` Evgeny Stupachenko
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Law @ 2015-11-02 17:02 UTC (permalink / raw)
  To: Evgeny Stupachenko, Dominique d'Humières
  Cc: Jan Hubicka, Bernd Schmidt, gcc-patches

On 11/02/2015 07:50 AM, Evgeny Stupachenko wrote:
> Yes, that is exactly what should fix the tests.
> Unfortunately I don't have access to darwin machine right now.
> Can you please test if the patch (attached) fixes the tests?
>
> gcc/
>          * multiple_target.c (create_dispatcher_calls): Add target check
>          on ifunc.
>          (create_target_clone): Change assembler name for versioned declarations.
>
> gcc/testsuite
>          * gcc.dg/mvc1.c: Add dg-require-ifunc condition.
>          * gcc.dg/mvc4.c: Ditto.
>          * gcc.dg/mvc5.c: Ditto.
>          * g++.dg/ext/mvc1.C: Ditto.
>          * g++.dg/ext/mvc4.C: Ditto.
>          * gcc.dg/mvc7.c: Add dg-require-ifunc condition and checks on resolver.
>
OK.

jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-11-02 14:50 ` Evgeny Stupachenko
@ 2015-11-02 15:22   ` Dominique d'Humières
  2015-11-02 17:02   ` Jeff Law
  1 sibling, 0 replies; 36+ messages in thread
From: Dominique d'Humières @ 2015-11-02 15:22 UTC (permalink / raw)
  To: Evgeny Stupachenko; +Cc: Jeff Law, Jan Hubicka, Bernd Schmidt, gcc-patches

Evgeny,

I have already checked that the addition of

+/* { dg-require-ifunc "" } */

fixes the failures. I’ll test your full patch later today (currently chasing regression with gfortran).

Thanks,

Dominique

> Le 2 nov. 2015 à 15:50, Evgeny Stupachenko <evstupac@gmail.com> a écrit :
> 
> Yes, that is exactly what should fix the tests.
> Unfortunately I don't have access to darwin machine right now.
> Can you please test if the patch (attached) fixes the tests?
> 
> gcc/
>        * multiple_target.c (create_dispatcher_calls): Add target check
>        on ifunc.
>        (create_target_clone): Change assembler name for versioned declarations.
> 
> gcc/testsuite
>        * gcc.dg/mvc1.c: Add dg-require-ifunc condition.
>        * gcc.dg/mvc4.c: Ditto.
>        * gcc.dg/mvc5.c: Ditto.
>        * g++.dg/ext/mvc1.C: Ditto.
>        * g++.dg/ext/mvc4.C: Ditto.
>        * gcc.dg/mvc7.c: Add dg-require-ifunc condition and checks on resolver.
> 
> Thanks,
> Evgeny
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
  2015-10-31 10:52 Dominique d'Humières
@ 2015-11-02 14:50 ` Evgeny Stupachenko
  2015-11-02 15:22   ` Dominique d'Humières
  2015-11-02 17:02   ` Jeff Law
  0 siblings, 2 replies; 36+ messages in thread
From: Evgeny Stupachenko @ 2015-11-02 14:50 UTC (permalink / raw)
  To: Dominique d'Humières
  Cc: Jeff Law, Jan Hubicka, Bernd Schmidt, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

Yes, that is exactly what should fix the tests.
Unfortunately I don't have access to darwin machine right now.
Can you please test if the patch (attached) fixes the tests?

gcc/
        * multiple_target.c (create_dispatcher_calls): Add target check
        on ifunc.
        (create_target_clone): Change assembler name for versioned declarations.

gcc/testsuite
        * gcc.dg/mvc1.c: Add dg-require-ifunc condition.
        * gcc.dg/mvc4.c: Ditto.
        * gcc.dg/mvc5.c: Ditto.
        * g++.dg/ext/mvc1.C: Ditto.
        * g++.dg/ext/mvc4.C: Ditto.
        * gcc.dg/mvc7.c: Add dg-require-ifunc condition and checks on resolver.

Thanks,
Evgeny

On Sat, Oct 31, 2015 at 1:40 PM, Dominique d'Humières
<dominiq@lps.ens.fr> wrote:
> Evgeny,
> On darwin I see the following failures
> FAIL: g++.dg/ext/mvc1.C  -std=c++11 (test for excess errors)
> UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++11 compilation failed to produce executable
> FAIL: g++.dg/ext/mvc1.C  -std=c++14 (test for excess errors)
> UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++14 compilation failed to produce executable
> FAIL: g++.dg/ext/mvc1.C  -std=c++98 (test for excess errors)
> UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++98 compilation failed to produce executable
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++11 (internal compiler error)
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++14 (internal compiler error)
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++98 (internal compiler error)
> FAIL: g++.dg/ext/mvc4.C  -std=gnu++98 (test for excess errors)
>
> FAIL: gcc.dg/mvc1.c (internal compiler error)
> FAIL: gcc.dg/mvc1.c (test for excess errors)
> UNRESOLVED: gcc.dg/mvc1.c compilation failed to produce executable
> FAIL: gcc.dg/mvc4.c (internal compiler error)
> FAIL: gcc.dg/mvc4.c (test for excess errors)
> UNRESOLVED: gcc.dg/mvc4.c compilation failed to produce executable
> FAIL: gcc.dg/mvc5.c (internal compiler error)
> FAIL: gcc.dg/mvc5.c (test for excess errors)
> FAIL: gcc.dg/mvc5.c scan-assembler-times foo.ifunc 6
> FAIL: gcc.dg/mvc7.c (internal compiler error)
> FAIL: gcc.dg/mvc7.c (test for excess errors)
> FAIL: gcc.dg/mvc7.c scan-assembler-times foo.ifunc 4
>
> The errors are of the kind
>
> /opt/gcc/_clean/gcc/testsuite/gcc.dg/mvc7.C:5:5: error: multiversioning needs ifunc which is not supported on this target
>
> These tests probably require something such as
>
> /* { dg-require-ifunc "" } */
>
> I have no opinion if this should also used in the tests for warnings or errors.
>
> The ICEs are of the kind
>
> /opt/gcc/_clean/gcc/testsuite/gcc.dg/mvc7.C:10:1: internal compiler error: Segmentation fault: 11
>  }
>  ^
>
> TIA
>
> Dominique
>

[-- Attachment #2: target_clones_tests_fix.patch --]
[-- Type: application/octet-stream, Size: 3482 bytes --]

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 54618d8..8cca281 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -61,12 +61,20 @@ create_dispatcher_calls (struct cgraph_node *node)
 	  || !DECL_FUNCTION_VERSIONED (decl))
 	continue;
 
+      if (!targetm.has_ifunc_p ())
+	{
+	  error_at (gimple_location (call),
+		    "the call requires ifunc, which is not"
+		    " supported by this target");
+	  break;
+	}
       e_next = e->next_caller;
       idecl = targetm.get_function_versions_dispatcher (decl);
       if (!idecl)
 	{
 	  error_at (gimple_location (call),
 		    "default target_clones attribute was not set");
+	  break;
 	}
       inode = cgraph_node::get (idecl);
       gcc_assert (inode);
@@ -215,6 +223,10 @@ create_target_clone (cgraph_node *node, bool definition, char *name)
     {
       tree new_decl = copy_node (node->decl);
       new_node = cgraph_node::get_create (new_decl);
+      /* Generate a new name for the new version.  */
+      symtab->change_decl_assembler_name (new_node->decl,
+					  clone_function_name (node->decl,
+							       name));
     }
   return new_node;
 }
diff --git a/gcc/testsuite/g++.dg/ext/mvc1.C b/gcc/testsuite/g++.dg/ext/mvc1.C
index fbf9011..ff37238 100644
--- a/gcc/testsuite/g++.dg/ext/mvc1.C
+++ b/gcc/testsuite/g++.dg/ext/mvc1.C
@@ -1,4 +1,5 @@
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
 int
diff --git a/gcc/testsuite/g++.dg/ext/mvc4.C b/gcc/testsuite/g++.dg/ext/mvc4.C
index 98e3502..6e18e56 100644
--- a/gcc/testsuite/g++.dg/ext/mvc4.C
+++ b/gcc/testsuite/g++.dg/ext/mvc4.C
@@ -1,4 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-mavx" } */
 
 #include <immintrin.h>
diff --git a/gcc/testsuite/gcc.dg/mvc1.c b/gcc/testsuite/gcc.dg/mvc1.c
index 8e02721..95fd8f6 100644
--- a/gcc/testsuite/gcc.dg/mvc1.c
+++ b/gcc/testsuite/gcc.dg/mvc1.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
 int
diff --git a/gcc/testsuite/gcc.dg/mvc4.c b/gcc/testsuite/gcc.dg/mvc4.c
index 48ec9a1..eb93abe 100644
--- a/gcc/testsuite/gcc.dg/mvc4.c
+++ b/gcc/testsuite/gcc.dg/mvc4.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("default","avx","default")))
 int
diff --git a/gcc/testsuite/gcc.dg/mvc5.c b/gcc/testsuite/gcc.dg/mvc5.c
index 89001e5..0ee1127 100644
--- a/gcc/testsuite/gcc.dg/mvc5.c
+++ b/gcc/testsuite/gcc.dg/mvc5.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-fno-inline" } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
 
diff --git a/gcc/testsuite/gcc.dg/mvc7.c b/gcc/testsuite/gcc.dg/mvc7.c
index d61d78e..be2d5f0 100644
--- a/gcc/testsuite/gcc.dg/mvc7.c
+++ b/gcc/testsuite/gcc.dg/mvc7.c
@@ -1,5 +1,9 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
+/* { dg-final { scan-assembler "foo.resolver" } } */
+/* { dg-final { scan-assembler "avx" } } */
+/* { dg-final { scan-assembler "slm" } } */
 
 __attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
 int foo ();

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] New attribute to create target clones
@ 2015-10-31 10:52 Dominique d'Humières
  2015-11-02 14:50 ` Evgeny Stupachenko
  0 siblings, 1 reply; 36+ messages in thread
From: Dominique d'Humières @ 2015-10-31 10:52 UTC (permalink / raw)
  To: evstupac; +Cc: law, hubicka, bschmidt, gcc-patches

Evgeny,
On darwin I see the following failures
FAIL: g++.dg/ext/mvc1.C  -std=c++11 (test for excess errors)
UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++11 compilation failed to produce executable
FAIL: g++.dg/ext/mvc1.C  -std=c++14 (test for excess errors)
UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++14 compilation failed to produce executable
FAIL: g++.dg/ext/mvc1.C  -std=c++98 (test for excess errors)
UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++98 compilation failed to produce executable
FAIL: g++.dg/ext/mvc4.C  -std=gnu++11 (internal compiler error)
FAIL: g++.dg/ext/mvc4.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/ext/mvc4.C  -std=gnu++14 (internal compiler error)
FAIL: g++.dg/ext/mvc4.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ext/mvc4.C  -std=gnu++98 (internal compiler error)
FAIL: g++.dg/ext/mvc4.C  -std=gnu++98 (test for excess errors)

FAIL: gcc.dg/mvc1.c (internal compiler error)
FAIL: gcc.dg/mvc1.c (test for excess errors)
UNRESOLVED: gcc.dg/mvc1.c compilation failed to produce executable
FAIL: gcc.dg/mvc4.c (internal compiler error)
FAIL: gcc.dg/mvc4.c (test for excess errors)
UNRESOLVED: gcc.dg/mvc4.c compilation failed to produce executable
FAIL: gcc.dg/mvc5.c (internal compiler error)
FAIL: gcc.dg/mvc5.c (test for excess errors)
FAIL: gcc.dg/mvc5.c scan-assembler-times foo.ifunc 6
FAIL: gcc.dg/mvc7.c (internal compiler error)
FAIL: gcc.dg/mvc7.c (test for excess errors)
FAIL: gcc.dg/mvc7.c scan-assembler-times foo.ifunc 4

The errors are of the kind

/opt/gcc/_clean/gcc/testsuite/gcc.dg/mvc7.C:5:5: error: multiversioning needs ifunc which is not supported on this target

These tests probably require something such as

/* { dg-require-ifunc "" } */

I have no opinion if this should also used in the tests for warnings or errors.

The ICEs are of the kind

/opt/gcc/_clean/gcc/testsuite/gcc.dg/mvc7.C:10:1: internal compiler error: Segmentation fault: 11
 }
 ^

TIA

Dominique

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-11-03 11:45 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-27 11:35 [PATCH] New attribute to create target clones Evgeny Stupachenko
2015-09-08 11:47 ` Evgeny Stupachenko
2015-09-16 11:42   ` Evgeny Stupachenko
2015-09-21 13:43 ` Bernd Schmidt
2015-09-22 20:24   ` Jeff Law
2015-09-22 21:09     ` Bernd Schmidt
2015-09-22 23:52       ` Evgeny Stupachenko
2015-09-25  0:02         ` Evgeny Stupachenko
2015-10-02 13:18           ` Evgeny Stupachenko
2015-10-08 19:00           ` Jeff Law
2015-10-08 19:23             ` Jan Hubicka
2015-10-08 19:53               ` Jeff Law
2015-10-08 21:36                 ` Jan Hubicka
2015-10-09 17:45                   ` Jeff Law
2015-10-09 18:27                     ` Jan Hubicka
2015-10-09 19:57                       ` Evgeny Stupachenko
2015-10-09 20:04                         ` Jan Hubicka
2015-10-09 21:44                           ` Evgeny Stupachenko
2015-10-12 23:35                             ` Evgeny Stupachenko
2015-10-14 21:32                               ` Evgeny Stupachenko
2015-10-22 18:07                                 ` Evgeny Stupachenko
2015-10-26 15:59                               ` Jeff Law
2015-10-29 13:42                                 ` Evgeny Stupachenko
2015-10-29 17:07                                   ` Jan Hubicka
2015-10-29 18:15                                     ` Evgeny Stupachenko
2015-10-30  3:55                                       ` Jan Hubicka
2015-10-30  5:30                                       ` Jeff Law
2015-10-30 12:30                                         ` Evgeny Stupachenko
2015-10-08 20:01             ` Evgeny Stupachenko
2015-10-09  4:05               ` Jeff Law
2015-10-08 16:39       ` Jeff Law
2015-10-31 10:52 Dominique d'Humières
2015-11-02 14:50 ` Evgeny Stupachenko
2015-11-02 15:22   ` Dominique d'Humières
2015-11-02 17:02   ` Jeff Law
2015-11-03 11:45     ` Evgeny Stupachenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).