public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
@ 2011-08-16 21:27 ` H.J. Lu
  2011-08-16 21:52   ` Sriraman Tallam
  2011-08-16 23:22 ` Andi Kleen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-16 21:27 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Tue, Aug 16, 2011 at 1:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Support for getting CPU type and feature information at run-time.
>
> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>
>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.

Can you add Intel Atom?

>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>        (gate_fold_builtin_target): New function.
>        (pass_tree_fold_builtin_target): New pass.
>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>        * passes.c (init_optimization_passes): Add new pass to pass list.
>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>        (make_var_decl): New function.
>        (get_field_from_struct): New function.
>        (make_constructor_to_get_target_type): New function.
>        (fold_builtin_target): New function.
>        (ix86_fold_builtin): New function.
>        (TARGET_FOLD_BUILTIN): New macro.
>
>        * gcc.dg/builtin_target.c: New test.
>
>        * config/i386/i386-cpuinfo.c: New file.
>        * config/i386/t-cpuinfo: New file.
>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>

> +static void
> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
> +{
> +  /* Parse family and model only if brand ID is 0. */
> +  if (brand_id == 0)
> +    {
> +      switch (family)
> +       {
> +       case 0x5:
> +         __cpu_type = PROCESSOR_PENTIUM;
> +         break;
> +       case 0x6:
> +         switch (model)
> +           {
> +           case 0x1a:
> +           case 0x1e:
> +           case 0x1f:
> +           case 0x2e:
> +             /* Nehalem.  */
> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
> +             break;
> +           case 0x25:
> +           case 0x2c:
> +           case 0x2f:
> +             /* Westmere.  */
> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
> +             __cpu_model.__cpu_is_corei7_westmere = 1;
> +             break;
> +           case 0x2a:
> +             /* Sandy Bridge.  */
> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
> +             break;
> +           case 0x17:
> +           case 0x1d:
> +             /* Penryn.  */
> +           case 0x0f:
> +             /* Merom.  */
> +             __cpu_type = PROCESSOR_CORE2;
> +             break;
> +           default:
> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
> +             break;
> +           }
> +         break;
> +       default:
> +         /* We have no idea.  */
> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
> +         break;
> +       }
> +    }
> +}
> +

Please see config/i386/driver-i386.c for Intel CPU detection.
I will try to make it up to date.  For example, I added
model 0x2d, 0x1c, 0x26,

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
@ 2011-08-16 21:27 Sriraman Tallam
  2011-08-16 21:27 ` H.J. Lu
                   ` (4 more replies)
  0 siblings, 5 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-16 21:27 UTC (permalink / raw)
  To: reply, gcc-patches

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 31404 bytes --]

Support for getting CPU type and feature information at run-time.

The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.

	* tree-pass.h (pass_tree_fold_builtin_target): New pass.
	* builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
	(BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
	(BUILT_IN_TARGET_IS_AMD): New builtin.
	(BUILT_IN_TARGET_IS_INTEL): New builtin.
	(BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
	(BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
	(BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
	(BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
	(BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
	(BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
	* mversn-dispatch.c (do_fold_builtin_target): New function.
	(gate_fold_builtin_target): New function.
	(pass_tree_fold_builtin_target): New pass.
	* timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
	* passes.c (init_optimization_passes): Add new pass to pass list.
	* config/i386/i386.c (build_struct_with_one_bit_fields): New function.
	(make_var_decl): New function.
	(get_field_from_struct): New function.
	(make_constructor_to_get_target_type): New function.
	(fold_builtin_target): New function.
	(ix86_fold_builtin): New function.
	(TARGET_FOLD_BUILTIN): New macro.

	* gcc.dg/builtin_target.c: New test.
	
	* config/i386/i386-cpuinfo.c: New file.
	* config/i386/t-cpuinfo: New file.
	* config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc

Index: libgcc/config.host
===================================================================
--- libgcc/config.host	(revision 177767)
+++ libgcc/config.host	(working copy)
@@ -609,7 +609,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
   i[34567]86-*-gnu*)
-	tmake_file="${tmake_file} t-tls"
+	tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
 	if test "$libgcc_cv_cfi" = "yes"; then
 		tmake_file="${tmake_file} t-stack i386/t-stack-i386"
 	fi
Index: libgcc/config/i386/t-cpuinfo
===================================================================
--- libgcc/config/i386/t-cpuinfo	(revision 0)
+++ libgcc/config/i386/t-cpuinfo	(revision 0)
@@ -0,0 +1,2 @@
+# This is an endfile
+LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
Index: libgcc/config/i386/i386-cpuinfo.c
===================================================================
--- libgcc/config/i386/i386-cpuinfo.c	(revision 0)
+++ libgcc/config/i386/i386-cpuinfo.c	(revision 0)
@@ -0,0 +1,275 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+ * Contributed by Sriraman Tallam <tmsriram@google.com>.
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 3, or (at your option) any
+ * later version.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ *
+ *
+ * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
+ * instruction is used to figure out the cpu type and supported features.
+ * GCC runs __cpu_indicator_init from a constructor which sets the members
+ * of __cpu_model and __cpu_features.
+ */
+
+#include <string.h>
+
+#ifdef __GNUC__
+#include "cpuid.h"
+
+enum processor_type
+{
+  PROCESSOR_PENTIUM = 0,
+  PROCESSOR_CORE2,
+  PROCESSOR_COREI7_NEHALEM,
+  PROCESSOR_COREI7_WESTMERE,
+  PROCESSOR_COREI7_SANDYBRIDGE,
+  PROCESSOR_INTEL_GENERIC,
+  PROCESSOR_AMDFAM10_BARCELONA,
+  PROCESSOR_AMDFAM10_SHANGHAI,
+  PROCESSOR_AMDFAM10_ISTANBUL,
+  PROCESSOR_AMDFAM10_GENERIC,
+  PROCESSOR_AMD_GENERIC,
+  PROCESSOR_GENERIC,
+  PROCESSOR_max
+};
+
+enum vendor_signatures
+{
+  SIG_INTEL =	0x756e6547 /* Genu */,
+  SIG_AMD =	0x68747541 /* Auth */
+};
+
+
+/* Features supported. */
+
+struct __processor_features
+{
+  unsigned int __cpu_cmov : 1;
+  unsigned int __cpu_mmx : 1;
+  unsigned int __cpu_popcnt : 1;
+  unsigned int __cpu_sse : 1;
+  unsigned int __cpu_sse2 : 1;
+  unsigned int __cpu_sse3 : 1;
+  unsigned int __cpu_ssse3 : 1;
+  unsigned int __cpu_sse4_1 : 1;
+  unsigned int __cpu_sse4_2 : 1;
+};
+
+/* Flags exported. */
+
+struct __processor_model
+{
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  unsigned int __cpu_is_corei7_nehalem : 1;
+  unsigned int __cpu_is_corei7_westmere : 1;
+  unsigned int __cpu_is_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+};
+
+enum processor_type __cpu_type = PROCESSOR_GENERIC;
+struct __processor_features __cpu_features;
+struct __processor_model __cpu_model;
+
+static void
+get_amd_cpu (unsigned int family, unsigned int model)
+{
+  switch (family)
+    {
+    case 0x10:
+      switch (model)
+	{
+	case 0x2:
+	  __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
+	  __cpu_model.__cpu_is_amdfam10_barcelona = 1;
+	  break;
+	case 0x4:
+	  __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
+	  __cpu_model.__cpu_is_amdfam10_shanghai = 1;
+	  break;
+	case 0x8:
+	  __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
+	  __cpu_model.__cpu_is_amdfam10_istanbul = 1;
+	  break;
+	default:
+	  __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
+	  break;
+	}
+      break;
+    default:
+      __cpu_type = PROCESSOR_AMD_GENERIC;
+    }
+}
+
+static void
+get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
+{
+  /* Parse family and model only if brand ID is 0. */
+  if (brand_id == 0)
+    {
+      switch (family)
+	{
+	case 0x5:
+	  __cpu_type = PROCESSOR_PENTIUM;
+	  break;
+	case 0x6:
+	  switch (model)
+	    {
+	    case 0x1a:
+	    case 0x1e:
+	    case 0x1f:
+	    case 0x2e:
+	      /* Nehalem.  */
+	      __cpu_type = PROCESSOR_COREI7_NEHALEM;
+	      __cpu_model.__cpu_is_corei7_nehalem = 1;
+	      break;
+	    case 0x25:
+	    case 0x2c:
+	    case 0x2f:
+	      /* Westmere.  */
+	      __cpu_type = PROCESSOR_COREI7_WESTMERE;
+	      __cpu_model.__cpu_is_corei7_westmere = 1;
+	      break;
+	    case 0x2a:
+	      /* Sandy Bridge.  */
+	      __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
+	      __cpu_model.__cpu_is_corei7_sandybridge = 1;
+	      break;
+	    case 0x17:
+	    case 0x1d:
+	      /* Penryn.  */
+	    case 0x0f:
+	      /* Merom.  */
+	      __cpu_type = PROCESSOR_CORE2;
+	      break;
+	    default:
+	      __cpu_type = PROCESSOR_INTEL_GENERIC;
+	      break;
+	    }
+	  break;
+	default:
+	  /* We have no idea.  */
+	  __cpu_type = PROCESSOR_INTEL_GENERIC;
+	  break;
+	}
+    }
+}	             	
+
+static void
+get_available_features (unsigned int ecx, unsigned int edx)
+{
+  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
+  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
+  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
+  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
+  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
+  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
+  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
+  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
+  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
+}
+
+/* A noinline function calling __get_cpuid. Having many calls to
+   cpuid in one function in 32-bit mode causes GCC to complain:
+   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
+   related to PR rtl-optimization 44174. */
+
+static int __attribute__ ((noinline))
+__get_cpuid_output (unsigned int __level,
+		    unsigned int *__eax, unsigned int *__ebx,
+		    unsigned int *__ecx, unsigned int *__edx)
+{
+  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
+}
+
+/* This function will be linked in to binaries that need to look up
+   CPU information.  */
+
+void
+__cpu_indicator_init(void)
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  int max_level = 5;
+  unsigned int vendor;
+  unsigned int model, family, brand_id;
+
+  memset (&__cpu_features, 0, sizeof (struct __processor_features));
+  memset (&__cpu_model, 0, sizeof (struct __processor_model));
+
+  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
+  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+    return;
+
+  vendor = ebx;
+  max_level = eax;
+
+  if (max_level < 1)
+    return;
+
+  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
+    return;
+
+  model = (eax >> 4) & 0x0f;
+  family = (eax >> 8) & 0x0f;
+  brand_id = ebx & 0xff;
+
+  /* Adjust model and family for Intel CPUS. */
+  if (vendor == SIG_INTEL)
+    {
+      unsigned int extended_model, extended_family;
+
+      extended_model = (eax >> 12) & 0xf0;
+      extended_family = (eax >> 20) & 0xff;
+      if (family == 0x0f)
+	{
+	  family += extended_family;
+	  model += extended_model;
+	}
+      else if (family == 0x06)
+	model += extended_model;
+    }
+
+  /* Find CPU model. */
+
+  if (vendor == SIG_AMD)
+    {
+      __cpu_model.__cpu_is_amd = 1;
+      get_amd_cpu (family, model);
+    }
+  else if (vendor == SIG_INTEL)
+    {
+      __cpu_model.__cpu_is_intel = 1;
+      get_intel_cpu (family, model, brand_id);
+    }
+
+  /* Find available features. */
+  get_available_features (ecx, edx);
+}
+
+#else
+
+void
+__cpu_indicator_init(void)
+{
+}
+
+#endif /* __GNUC__ */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 177767)
+++ gcc/tree-pass.h	(working copy)
@@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
 extern struct gimple_opt_pass pass_feedback_split_functions;
 extern struct gimple_opt_pass pass_threadsafe_analyze;
 extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
+extern struct gimple_opt_pass pass_tree_fold_builtin_target;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/testsuite/gcc.dg/builtin_target.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
+++ gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
@@ -0,0 +1,49 @@
+/* This test checks if the __builtin_target_* calls are recognized. */
+
+/* { dg-do run } */
+
+int
+fn1 ()
+{
+  if (__builtin_target_supports_cmov () < 0)
+    return -1;
+  if (__builtin_target_supports_mmx () < 0)
+    return -1;
+  if (__builtin_target_supports_popcount () < 0)
+    return -1;
+  if (__builtin_target_supports_sse () < 0)
+    return -1;
+  if (__builtin_target_supports_sse2 () < 0)
+    return -1;
+  if (__builtin_target_supports_sse3 () < 0)
+    return -1;
+  if (__builtin_target_supports_ssse3 () < 0)
+    return -1;
+  if (__builtin_target_supports_sse4_1 () < 0)
+    return -1;
+  if (__builtin_target_supports_sse4_2 () < 0)
+    return -1;
+  if (__builtin_target_is_amd () < 0)
+    return -1;
+  if (__builtin_target_is_intel () < 0)
+    return -1;
+  if (__builtin_target_is_corei7_nehalem () < 0)
+    return -1;
+  if (__builtin_target_is_corei7_westmere () < 0)
+    return -1;
+  if (__builtin_target_is_corei7_sandybridge () < 0)
+    return -1;
+  if (__builtin_target_is_amdfam10_barcelona () < 0)
+    return -1;
+  if (__builtin_target_is_amdfam10_shanghai () < 0)
+    return -1;
+  if (__builtin_target_is_amdfam10_istanbul () < 0)
+    return -1;
+
+  return 0;
+}
+
+int main ()
+{
+  return fn1 ();
+}
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def	(revision 177767)
+++ gcc/builtins.def	(working copy)
@@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
 /* Multiversioning builtin dispatch hook. */
 DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
 
+/* Builtins to determine target type and features at run-time. */
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
+
 /* Exception support.  */
 DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
 DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
Index: gcc/mversn-dispatch.c
===================================================================
--- gcc/mversn-dispatch.c	(revision 177767)
+++ gcc/mversn-dispatch.c	(working copy)
@@ -135,6 +135,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "vecprim.h"
 #include "gimple-pretty-print.h"
+#include "target.h"
 
 typedef struct cgraph_node* NODEPTR;
 DEF_VEC_P (NODEPTR);
@@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
   TODO_update_ssa | TODO_verify_ssa
  }
 };
+
+/* Fold calls to __builtin_target_* */
+
+static unsigned int
+do_fold_builtin_target (void)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+
+  /* Go through each stmt looking for __builtin_target_* calls */
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+	  gimple stmt = gsi_stmt (gsi);
+	  gimple assign_stmt;
+          tree call_decl;
+	  tree lhs_retval;
+	  tree folded_val;
+
+	  tree ssa_var, tmp_var;
+	  gimple init_stmt;
+
+          if (!is_gimple_call (stmt))
+            continue;
+
+          call_decl = gimple_call_fndecl (stmt);
+
+	  /* Check if it is a __builtin_target_* call. */
+
+	  if (call_decl == NULL
+	      || DECL_NAME (call_decl) == NULL_TREE
+	      || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
+	      || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
+                         "__builtin_target") == NULL)
+            continue;
+
+	  /* If the lhs is NULL there is no need to fold the call. */
+	  lhs_retval = gimple_call_lhs(stmt);
+	  if (lhs_retval == NULL)
+	    continue;
+
+	  /* Call the target hook to fold the builtin */	
+          folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
+
+	  /* If the target does not support the builtin then fold it to zero. */
+	  if (folded_val == NULL_TREE)
+	    folded_val = build_zero_cst (unsigned_type_node);
+
+	  /* Type cast unsigned value to integer */
+	  tmp_var = create_tmp_var (unsigned_type_node, NULL);
+	  init_stmt = gimple_build_assign (tmp_var, folded_val);
+	  ssa_var = make_ssa_name (tmp_var, init_stmt);
+	  gimple_assign_set_lhs (init_stmt, ssa_var);
+	  mark_symbols_for_renaming (init_stmt);
+
+	  assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
+	  mark_symbols_for_renaming(assign_stmt);
+
+	  gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
+	  gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
+	  /* Delete the original call. */
+	  gsi_remove(&gsi, true);
+	}
+    }
+
+  return 0;
+}
+
+static bool
+gate_fold_builtin_target (void)
+{
+  return true;
+}
+
+/* Pass to fold __builtin_target_* functions */
+
+struct gimple_opt_pass pass_tree_fold_builtin_target =
+{
+ {
+  GIMPLE_PASS,
+  "fold_builtin_target",	        /* name */
+  gate_fold_builtin_target,		/* gate */
+  do_fold_builtin_target,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_FOLD_BUILTIN_TARGET,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func |			/* todo_flags_finish */
+  TODO_cleanup_cfg |
+  TODO_update_ssa |
+  TODO_verify_ssa
+ }
+};
+
+
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 177767)
+++ gcc/timevar.def	(working copy)
@@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH          , "parser inl
 DEFTIMEVAR (TV_TEMPLATE_INST         , "template instantiation")
 DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
 DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
+DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET   , "fold __builtin_target calls")
 DEFTIMEVAR (TV_INTEGRATION           , "integration")
 DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tree gimplify")
 DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 177767)
+++ gcc/passes.c	(working copy)
@@ -1249,6 +1249,8 @@ init_optimization_passes (void)
     {
       struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
       NEXT_PASS (pass_tree_convert_builtin_dispatch);
+      /* Fold calls to __builtin_target_*. */
+      NEXT_PASS (pass_tree_fold_builtin_target);
       /* Rebuilding cgraph edges is necessary as the above passes change
          the call graph.  Otherwise, future optimizations use the old
 	 call graph and make wrong decisions sometimes.*/
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177767)
+++ gcc/config/i386/i386.c	(working copy)
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "sbitmap.h"
 #include "fibheap.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
 
 enum upper_128bits_state
 {
@@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
   return ret;
 }
 
+/* Returns a struct type with name NAME and number of fields equal to
+   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
+
+static tree
+build_struct_with_one_bit_fields (int num_fields, const char *name)
+{
+  int i;
+  char field_name [10];
+  tree field = NULL_TREE, field_chain = NULL_TREE;
+  tree type = make_node (RECORD_TYPE);
+
+  strcpy (field_name, "k_field");
+
+  for (i = 0; i < num_fields; i++)
+    {
+      /* Name the fields, 0_field, 1_field, ... */
+      field_name [0] = '0' + i;
+      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			  get_identifier (field_name), unsigned_type_node);
+      DECL_BIT_FIELD (field) = 1;
+      DECL_SIZE (field) = bitsize_one_node;
+      if (field_chain != NULL_TREE)
+	DECL_CHAIN (field) = field_chain;
+      field_chain = field;
+    }
+  finish_builtin_struct (type, name, field_chain, NULL_TREE);
+  return type;
+}
+
+/* Returns a VAR_DECL of type TYPE and name NAME. */
+
+static tree
+make_var_decl (tree type, const char *name)
+{
+  tree new_decl;
+  struct varpool_node *vnode;
+
+  new_decl = build_decl (UNKNOWN_LOCATION,
+	                 VAR_DECL,
+	  	         get_identifier(name),
+		         type);
+
+  DECL_EXTERNAL (new_decl) = 1;
+  TREE_STATIC (new_decl) = 1;
+  TREE_PUBLIC (new_decl) = 1;
+  DECL_INITIAL (new_decl) = 0;
+  DECL_ARTIFICIAL (new_decl) = 0;
+  DECL_PRESERVE_P (new_decl) = 1;
+
+  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
+  assemble_variable (new_decl, 0, 0, 0);
+
+  vnode = varpool_node (new_decl);
+  gcc_assert (vnode != NULL);
+  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+     lto-streamer-out.c. */
+  vnode->finalized = 1;
+
+  return new_decl;
+}
+
+/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
+   numbered field. */
+
+static tree
+get_field_from_struct (tree struct_type, int field_num)
+{
+  int i;
+  tree field = TYPE_FIELDS (struct_type);
+
+  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
+    {
+      gcc_assert (field != NULL_TREE);
+    }
+
+  return field;
+}
+
+/* Create a new static constructor that calls __cpu_indicator_init ()
+   function defined in libgcc/config/i386-cpuinfo.c which runs cpuid
+   to figure out the type of the target. */
+
+static tree
+make_constructor_to_get_target_type (const char *name)
+{
+  tree decl, type, t;
+  gimple_seq seq;
+  basic_block new_bb;
+  tree old_current_function_decl;
+
+  tree __cpu_indicator_int_decl;
+  gimple constructor_body;
+
+
+  type = build_function_type_list (void_type_node, NULL_TREE);
+
+  /* Make a call stmt to __cpu_indicator_init */
+  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
+  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
+  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
+
+  decl = build_fn_decl (name, type);
+
+  DECL_NAME (decl) = get_identifier (name);
+  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
+  gcc_assert (cgraph_node (decl) != NULL);
+
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  /* This is a comdat. */ 
+  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+
+  /* Build CFG for this function. */
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+
+  /* XXX: Not sure if the edge commented below is necessary.  If I add this
+     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
+     " if (e->flags & EDGE_FALLTHRU)"
+     during -fprofile-generate.
+     Otherwise, it is fine.  Deleting this edge does not break anything.
+     Commenting this so that it is clear I am intentionally not doing this.*/
+  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
+
+  seq = gimple_seq_alloc_with_stmt (constructor_body);
+
+  set_bb_seq (new_bb, seq);
+  gimple_set_bb (constructor_body, new_bb);
+
+  /* Set the lexical block of the constructor body. Fails the inliner
+     other wise. */
+  gimple_set_block (constructor_body, DECL_INITIAL (decl));
+
+  /* This call is very important if this pass runs when the IR is in
+     SSA form.  It breaks things in strange ways otherwise. */
+  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
+  /* add_referenced_var (version_selector_var); */
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_node (decl));
+  cgraph_mark_needed_node (cgraph_node (decl));
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return decl;
+}
+
+/* FNDECL is a __builtin_target_* call that is folded into an integer defined
+   in libgcc/config/i386/i386-cpuinfo.c */
+
+static tree 
+fold_builtin_target (tree fndecl)
+{
+  /* This is the order of bit-fields in __processor_features in
+     i386-cpuinfo.c */
+  enum processor_features
+  {
+    F_CMOV = 0,
+    F_MMX,
+    F_POPCNT,
+    F_SSE,
+    F_SSE2,
+    F_SSE3,
+    F_SSSE3,
+    F_SSE4_1,
+    F_SSE4_2,
+    F_MAX
+  };
+
+  /* This is the order of bit-fields in __processor_model in
+     i386-cpuinfo.c */
+  enum processor_model
+  {
+    M_AMD = 0,
+    M_INTEL,
+    M_COREI7_NEHALEM,
+    M_COREI7_WESTMERE,
+    M_COREI7_SANDYBRIDGE,
+    M_AMDFAM10_BARCELONA,
+    M_AMDFAM10_SHANGHAI,
+    M_AMDFAM10_ISTANBUL,
+    M_MAX
+  };
+
+  static tree __processor_features_type = NULL_TREE;
+  static tree __cpu_features_var = NULL_TREE;
+  static tree __processor_model_type = NULL_TREE;
+  static tree __cpu_model_var = NULL_TREE;
+  static tree ctor_decl = NULL_TREE;
+  static tree field;
+  static tree which_struct;
+
+  /* Make a call to __cpu_indicatior_init in a constructor.
+     Function __cpu_indicator_init is defined in i386-cpuinfo.c. */
+  if (ctor_decl == NULL_TREE)
+   ctor_decl = make_constructor_to_get_target_type 
+		("__cpu_indicator_init_ctor");
+
+  if (__processor_features_type == NULL_TREE)
+    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
+ 			          "__processor_features");
+
+  if (__processor_model_type == NULL_TREE)
+    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
+ 			          "__processor_model");
+
+  if (__cpu_features_var == NULL_TREE)
+    __cpu_features_var = make_var_decl (__processor_features_type,
+					"__cpu_features");
+
+  if (__cpu_model_var == NULL_TREE)
+    __cpu_model_var = make_var_decl (__processor_model_type,
+				     "__cpu_model");
+
+  /* Look at fndecl code to identify the field requested. */ 
+  switch (DECL_FUNCTION_CODE (fndecl))
+    {
+    case BUILT_IN_TARGET_SUPPORTS_CMOV:
+      field = get_field_from_struct (__processor_features_type, F_CMOV);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_MMX:
+      field = get_field_from_struct (__processor_features_type, F_MMX);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
+      field = get_field_from_struct (__processor_features_type, F_POPCNT);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSE:
+      field = get_field_from_struct (__processor_features_type, F_SSE);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSE2:
+      field = get_field_from_struct (__processor_features_type, F_SSE2);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
+      which_struct = __cpu_features_var;
+      break;
+    case BUILT_IN_TARGET_IS_AMD:
+      field = get_field_from_struct (__processor_model_type, M_AMD);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_INTEL:
+      field = get_field_from_struct (__processor_model_type, M_INTEL);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
+      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
+      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
+      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
+      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
+      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);;
+      which_struct = __cpu_model_var;
+      break;
+    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
+      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);;
+      which_struct = __cpu_model_var;
+      break;
+    default:
+      return NULL_TREE;
+    }
+
+  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
+}
+
+/* Folds __builtin_target_* builtins. */
+
+static tree
+ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
+		    tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
+{
+  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && strstr(decl_name, "__builtin_target") != NULL)
+    return fold_builtin_target (fndecl);
+
+  return NULL_TREE;
+}
+
 /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
 
 static void
@@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_BUILD_BUILTIN_VA_LIST
 #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
 
+#undef TARGET_FOLD_BUILTIN
+#define TARGET_FOLD_BUILTIN ix86_fold_builtin
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 

--
This patch is available for review at http://codereview.appspot.com/4893046

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 ` H.J. Lu
@ 2011-08-16 21:52   ` Sriraman Tallam
  0 siblings, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-16 21:52 UTC (permalink / raw)
  To: H.J. Lu; +Cc: reply, gcc-patches

On Tue, Aug 16, 2011 at 2:06 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 16, 2011 at 1:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Support for getting CPU type and feature information at run-time.
>>
>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>>
>>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>
> Can you add Intel Atom?

Yes, I will. There is probably a lot more that is interesting which I
missed and will add if somebody sees fit.

>
>>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>>        (gate_fold_builtin_target): New function.
>>        (pass_tree_fold_builtin_target): New pass.
>>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>>        * passes.c (init_optimization_passes): Add new pass to pass list.
>>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>        (make_var_decl): New function.
>>        (get_field_from_struct): New function.
>>        (make_constructor_to_get_target_type): New function.
>>        (fold_builtin_target): New function.
>>        (ix86_fold_builtin): New function.
>>        (TARGET_FOLD_BUILTIN): New macro.
>>
>>        * gcc.dg/builtin_target.c: New test.
>>
>>        * config/i386/i386-cpuinfo.c: New file.
>>        * config/i386/t-cpuinfo: New file.
>>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>
>
>> +static void
>> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
>> +{
>> +  /* Parse family and model only if brand ID is 0. */
>> +  if (brand_id == 0)
>> +    {
>> +      switch (family)
>> +       {
>> +       case 0x5:
>> +         __cpu_type = PROCESSOR_PENTIUM;
>> +         break;
>> +       case 0x6:
>> +         switch (model)
>> +           {
>> +           case 0x1a:
>> +           case 0x1e:
>> +           case 0x1f:
>> +           case 0x2e:
>> +             /* Nehalem.  */
>> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
>> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
>> +             break;
>> +           case 0x25:
>> +           case 0x2c:
>> +           case 0x2f:
>> +             /* Westmere.  */
>> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
>> +             __cpu_model.__cpu_is_corei7_westmere = 1;
>> +             break;
>> +           case 0x2a:
>> +             /* Sandy Bridge.  */
>> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
>> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
>> +             break;
>> +           case 0x17:
>> +           case 0x1d:
>> +             /* Penryn.  */
>> +           case 0x0f:
>> +             /* Merom.  */
>> +             __cpu_type = PROCESSOR_CORE2;
>> +             break;
>> +           default:
>> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
>> +             break;
>> +           }
>> +         break;
>> +       default:
>> +         /* We have no idea.  */
>> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
>> +         break;
>> +       }
>> +    }
>> +}
>> +
>
> Please see config/i386/driver-i386.c for Intel CPU detection.
> I will try to make it up to date.  For example, I added
> model 0x2d, 0x1c, 0x26,

I used the code in config/i386/driver-i386.c was used as reference.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
  2011-08-16 21:27 ` H.J. Lu
@ 2011-08-16 23:22 ` Andi Kleen
  2011-08-17  6:55 ` Joseph S. Myers
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Andi Kleen @ 2011-08-16 23:22 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

tmsriram@google.com (Sriraman Tallam) writes:

> Support for getting CPU type and feature information at run-time.
>
> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.

It would be nice if you could share the code for handling the model
numbers with the similar code in gcc.c

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
  2011-08-16 21:27 ` H.J. Lu
  2011-08-16 23:22 ` Andi Kleen
@ 2011-08-17  6:55 ` Joseph S. Myers
  2011-08-17  8:28   ` Sriraman Tallam
  2011-08-17  9:38 ` Richard Guenther
  2011-08-18  1:56 ` Hans-Peter Nilsson
  4 siblings, 1 reply; 50+ messages in thread
From: Joseph S. Myers @ 2011-08-17  6:55 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Tue, 16 Aug 2011, Sriraman Tallam wrote:

> Index: libgcc/config/i386/t-cpuinfo
> ===================================================================
> --- libgcc/config/i386/t-cpuinfo	(revision 0)
> +++ libgcc/config/i386/t-cpuinfo	(revision 0)
> @@ -0,0 +1,2 @@
> +# This is an endfile
> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c

What do you mean by this comment?  That it's linked in like crt*end*.o?  
It looks to me like a normal libgcc object, not an endfile.

> Index: libgcc/config/i386/i386-cpuinfo.c
> ===================================================================
> --- libgcc/config/i386/i386-cpuinfo.c	(revision 0)
> +++ libgcc/config/i386/i386-cpuinfo.c	(revision 0)
> @@ -0,0 +1,275 @@
> +/* Copyright (C) 2011 Free Software Foundation, Inc.
> + * Contributed by Sriraman Tallam <tmsriram@google.com>.

Please format in the normal way; no leading "*" on each comment line.

> +#include <string.h>

Don't include headers not provided by GCC in libgcc without checking 
inhibit_libc, to avoid bootstrap problems.  Declaring just the functions 
you need is safer here than including a system header.

> +#ifdef __GNUC__

Such a conditional does not make sense in libgcc code.

> +/* This function will be linked in to binaries that need to look up
> +   CPU information.  */
> +
> +void
> +__cpu_indicator_init(void)

Format according to the GNU Coding Standards.

You appear not to have added any symbol versions; do you have a particular 
rationale for these functions being linked separately into each executable 
and shared library needing them, rather than being exported from the 
shared libgcc?

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-17  6:55 ` Joseph S. Myers
@ 2011-08-17  8:28   ` Sriraman Tallam
  0 siblings, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-17  8:28 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: reply, gcc-patches

On Tue, Aug 16, 2011 at 3:35 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Tue, 16 Aug 2011, Sriraman Tallam wrote:
>
>> Index: libgcc/config/i386/t-cpuinfo
>> ===================================================================
>> --- libgcc/config/i386/t-cpuinfo      (revision 0)
>> +++ libgcc/config/i386/t-cpuinfo      (revision 0)
>> @@ -0,0 +1,2 @@
>> +# This is an endfile
>> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
>
> What do you mean by this comment?  That it's linked in like crt*end*.o?
> It looks to me like a normal libgcc object, not an endfile.
>
>> Index: libgcc/config/i386/i386-cpuinfo.c
>> ===================================================================
>> --- libgcc/config/i386/i386-cpuinfo.c (revision 0)
>> +++ libgcc/config/i386/i386-cpuinfo.c (revision 0)
>> @@ -0,0 +1,275 @@
>> +/* Copyright (C) 2011 Free Software Foundation, Inc.
>> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
>
> Please format in the normal way; no leading "*" on each comment line.
>
>> +#include <string.h>
>
> Don't include headers not provided by GCC in libgcc without checking
> inhibit_libc, to avoid bootstrap problems.  Declaring just the functions
> you need is safer here than including a system header.
>
>> +#ifdef __GNUC__
>
> Such a conditional does not make sense in libgcc code.
>
>> +/* This function will be linked in to binaries that need to look up
>> +   CPU information.  */
>> +
>> +void
>> +__cpu_indicator_init(void)
>
> Format according to the GNU Coding Standards.
>
> You appear not to have added any symbol versions; do you have a particular
> rationale for these functions being linked separately into each executable
> and shared library needing them, rather than being exported from the
> shared libgcc?

I did not realize I could just make shared libgcc export those
symbols. I will make the changes you mentioned.

Thanks,
-Sri.

>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
                   ` (2 preceding siblings ...)
  2011-08-17  6:55 ` Joseph S. Myers
@ 2011-08-17  9:38 ` Richard Guenther
  2011-08-17 20:04   ` Sriraman Tallam
  2011-08-18  1:56 ` Hans-Peter Nilsson
  4 siblings, 1 reply; 50+ messages in thread
From: Richard Guenther @ 2011-08-17  9:38 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Tue, Aug 16, 2011 at 10:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Support for getting CPU type and feature information at run-time.
>
> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.

Please provide an overview why you need the new builtins, why you need
a separate pass to fold them (instead of just expanding them) and why
you are creating
vars behind the back of GCC:

+  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+     lto-streamer-out.c. */
+  vnode->finalized = 1;

where I think you miss a varpool_finalize_node call somewhere.  Why
isn't this all done at target init time?  If you don't mark the
variable as to be preserved
like you do cgraph will optimize it all away if it isn't needed.

Richard.

>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>        (gate_fold_builtin_target): New function.
>        (pass_tree_fold_builtin_target): New pass.
>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>        * passes.c (init_optimization_passes): Add new pass to pass list.
>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>        (make_var_decl): New function.
>        (get_field_from_struct): New function.
>        (make_constructor_to_get_target_type): New function.
>        (fold_builtin_target): New function.
>        (ix86_fold_builtin): New function.
>        (TARGET_FOLD_BUILTIN): New macro.
>
>        * gcc.dg/builtin_target.c: New test.
>
>        * config/i386/i386-cpuinfo.c: New file.
>        * config/i386/t-cpuinfo: New file.
>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>
> Index: libgcc/config.host
> ===================================================================
> --- libgcc/config.host  (revision 177767)
> +++ libgcc/config.host  (working copy)
> @@ -609,7 +609,7 @@ case ${host} in
>  i[34567]86-*-linux* | x86_64-*-linux* | \
>   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
>   i[34567]86-*-gnu*)
> -       tmake_file="${tmake_file} t-tls"
> +       tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
>        if test "$libgcc_cv_cfi" = "yes"; then
>                tmake_file="${tmake_file} t-stack i386/t-stack-i386"
>        fi
> Index: libgcc/config/i386/t-cpuinfo
> ===================================================================
> --- libgcc/config/i386/t-cpuinfo        (revision 0)
> +++ libgcc/config/i386/t-cpuinfo        (revision 0)
> @@ -0,0 +1,2 @@
> +# This is an endfile
> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
> Index: libgcc/config/i386/i386-cpuinfo.c
> ===================================================================
> --- libgcc/config/i386/i386-cpuinfo.c   (revision 0)
> +++ libgcc/config/i386/i386-cpuinfo.c   (revision 0)
> @@ -0,0 +1,275 @@
> +/* Copyright (C) 2011 Free Software Foundation, Inc.
> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 3, or (at your option) any
> + * later version.
> + *
> + * This file is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * Under Section 7 of GPL version 3, you are granted additional
> + * permissions described in the GCC Runtime Library Exception, version
> + * 3.1, as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License and
> + * a copy of the GCC Runtime Library Exception along with this program;
> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + *
> + *
> + * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
> + * instruction is used to figure out the cpu type and supported features.
> + * GCC runs __cpu_indicator_init from a constructor which sets the members
> + * of __cpu_model and __cpu_features.
> + */
> +
> +#include <string.h>
> +
> +#ifdef __GNUC__
> +#include "cpuid.h"
> +
> +enum processor_type
> +{
> +  PROCESSOR_PENTIUM = 0,
> +  PROCESSOR_CORE2,
> +  PROCESSOR_COREI7_NEHALEM,
> +  PROCESSOR_COREI7_WESTMERE,
> +  PROCESSOR_COREI7_SANDYBRIDGE,
> +  PROCESSOR_INTEL_GENERIC,
> +  PROCESSOR_AMDFAM10_BARCELONA,
> +  PROCESSOR_AMDFAM10_SHANGHAI,
> +  PROCESSOR_AMDFAM10_ISTANBUL,
> +  PROCESSOR_AMDFAM10_GENERIC,
> +  PROCESSOR_AMD_GENERIC,
> +  PROCESSOR_GENERIC,
> +  PROCESSOR_max
> +};
> +
> +enum vendor_signatures
> +{
> +  SIG_INTEL =  0x756e6547 /* Genu */,
> +  SIG_AMD =    0x68747541 /* Auth */
> +};
> +
> +
> +/* Features supported. */
> +
> +struct __processor_features
> +{
> +  unsigned int __cpu_cmov : 1;
> +  unsigned int __cpu_mmx : 1;
> +  unsigned int __cpu_popcnt : 1;
> +  unsigned int __cpu_sse : 1;
> +  unsigned int __cpu_sse2 : 1;
> +  unsigned int __cpu_sse3 : 1;
> +  unsigned int __cpu_ssse3 : 1;
> +  unsigned int __cpu_sse4_1 : 1;
> +  unsigned int __cpu_sse4_2 : 1;
> +};
> +
> +/* Flags exported. */
> +
> +struct __processor_model
> +{
> +  unsigned int __cpu_is_amd : 1;
> +  unsigned int __cpu_is_intel : 1;
> +  unsigned int __cpu_is_corei7_nehalem : 1;
> +  unsigned int __cpu_is_corei7_westmere : 1;
> +  unsigned int __cpu_is_corei7_sandybridge : 1;
> +  unsigned int __cpu_is_amdfam10_barcelona : 1;
> +  unsigned int __cpu_is_amdfam10_shanghai : 1;
> +  unsigned int __cpu_is_amdfam10_istanbul : 1;
> +};
> +
> +enum processor_type __cpu_type = PROCESSOR_GENERIC;
> +struct __processor_features __cpu_features;
> +struct __processor_model __cpu_model;
> +
> +static void
> +get_amd_cpu (unsigned int family, unsigned int model)
> +{
> +  switch (family)
> +    {
> +    case 0x10:
> +      switch (model)
> +       {
> +       case 0x2:
> +         __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
> +         __cpu_model.__cpu_is_amdfam10_barcelona = 1;
> +         break;
> +       case 0x4:
> +         __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
> +         __cpu_model.__cpu_is_amdfam10_shanghai = 1;
> +         break;
> +       case 0x8:
> +         __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
> +         __cpu_model.__cpu_is_amdfam10_istanbul = 1;
> +         break;
> +       default:
> +         __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
> +         break;
> +       }
> +      break;
> +    default:
> +      __cpu_type = PROCESSOR_AMD_GENERIC;
> +    }
> +}
> +
> +static void
> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
> +{
> +  /* Parse family and model only if brand ID is 0. */
> +  if (brand_id == 0)
> +    {
> +      switch (family)
> +       {
> +       case 0x5:
> +         __cpu_type = PROCESSOR_PENTIUM;
> +         break;
> +       case 0x6:
> +         switch (model)
> +           {
> +           case 0x1a:
> +           case 0x1e:
> +           case 0x1f:
> +           case 0x2e:
> +             /* Nehalem.  */
> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
> +             break;
> +           case 0x25:
> +           case 0x2c:
> +           case 0x2f:
> +             /* Westmere.  */
> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
> +             __cpu_model.__cpu_is_corei7_westmere = 1;
> +             break;
> +           case 0x2a:
> +             /* Sandy Bridge.  */
> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
> +             break;
> +           case 0x17:
> +           case 0x1d:
> +             /* Penryn.  */
> +           case 0x0f:
> +             /* Merom.  */
> +             __cpu_type = PROCESSOR_CORE2;
> +             break;
> +           default:
> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
> +             break;
> +           }
> +         break;
> +       default:
> +         /* We have no idea.  */
> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
> +         break;
> +       }
> +    }
> +}
> +
> +static void
> +get_available_features (unsigned int ecx, unsigned int edx)
> +{
> +  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
> +  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
> +  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
> +  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
> +  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
> +  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
> +  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
> +  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
> +  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
> +}
> +
> +/* A noinline function calling __get_cpuid. Having many calls to
> +   cpuid in one function in 32-bit mode causes GCC to complain:
> +   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
> +   related to PR rtl-optimization 44174. */
> +
> +static int __attribute__ ((noinline))
> +__get_cpuid_output (unsigned int __level,
> +                   unsigned int *__eax, unsigned int *__ebx,
> +                   unsigned int *__ecx, unsigned int *__edx)
> +{
> +  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
> +}
> +
> +/* This function will be linked in to binaries that need to look up
> +   CPU information.  */
> +
> +void
> +__cpu_indicator_init(void)
> +{
> +  unsigned int eax, ebx, ecx, edx;
> +
> +  int max_level = 5;
> +  unsigned int vendor;
> +  unsigned int model, family, brand_id;
> +
> +  memset (&__cpu_features, 0, sizeof (struct __processor_features));
> +  memset (&__cpu_model, 0, sizeof (struct __processor_model));
> +
> +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
> +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
> +    return;
> +
> +  vendor = ebx;
> +  max_level = eax;
> +
> +  if (max_level < 1)
> +    return;
> +
> +  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
> +    return;
> +
> +  model = (eax >> 4) & 0x0f;
> +  family = (eax >> 8) & 0x0f;
> +  brand_id = ebx & 0xff;
> +
> +  /* Adjust model and family for Intel CPUS. */
> +  if (vendor == SIG_INTEL)
> +    {
> +      unsigned int extended_model, extended_family;
> +
> +      extended_model = (eax >> 12) & 0xf0;
> +      extended_family = (eax >> 20) & 0xff;
> +      if (family == 0x0f)
> +       {
> +         family += extended_family;
> +         model += extended_model;
> +       }
> +      else if (family == 0x06)
> +       model += extended_model;
> +    }
> +
> +  /* Find CPU model. */
> +
> +  if (vendor == SIG_AMD)
> +    {
> +      __cpu_model.__cpu_is_amd = 1;
> +      get_amd_cpu (family, model);
> +    }
> +  else if (vendor == SIG_INTEL)
> +    {
> +      __cpu_model.__cpu_is_intel = 1;
> +      get_intel_cpu (family, model, brand_id);
> +    }
> +
> +  /* Find available features. */
> +  get_available_features (ecx, edx);
> +}
> +
> +#else
> +
> +void
> +__cpu_indicator_init(void)
> +{
> +}
> +
> +#endif /* __GNUC__ */
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     (revision 177767)
> +++ gcc/tree-pass.h     (working copy)
> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
>  extern struct gimple_opt_pass pass_feedback_split_functions;
>  extern struct gimple_opt_pass pass_threadsafe_analyze;
>  extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
> +extern struct gimple_opt_pass pass_tree_fold_builtin_target;
>
>  /* IPA Passes */
>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: gcc/testsuite/gcc.dg/builtin_target.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
> +++ gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
> @@ -0,0 +1,49 @@
> +/* This test checks if the __builtin_target_* calls are recognized. */
> +
> +/* { dg-do run } */
> +
> +int
> +fn1 ()
> +{
> +  if (__builtin_target_supports_cmov () < 0)
> +    return -1;
> +  if (__builtin_target_supports_mmx () < 0)
> +    return -1;
> +  if (__builtin_target_supports_popcount () < 0)
> +    return -1;
> +  if (__builtin_target_supports_sse () < 0)
> +    return -1;
> +  if (__builtin_target_supports_sse2 () < 0)
> +    return -1;
> +  if (__builtin_target_supports_sse3 () < 0)
> +    return -1;
> +  if (__builtin_target_supports_ssse3 () < 0)
> +    return -1;
> +  if (__builtin_target_supports_sse4_1 () < 0)
> +    return -1;
> +  if (__builtin_target_supports_sse4_2 () < 0)
> +    return -1;
> +  if (__builtin_target_is_amd () < 0)
> +    return -1;
> +  if (__builtin_target_is_intel () < 0)
> +    return -1;
> +  if (__builtin_target_is_corei7_nehalem () < 0)
> +    return -1;
> +  if (__builtin_target_is_corei7_westmere () < 0)
> +    return -1;
> +  if (__builtin_target_is_corei7_sandybridge () < 0)
> +    return -1;
> +  if (__builtin_target_is_amdfam10_barcelona () < 0)
> +    return -1;
> +  if (__builtin_target_is_amdfam10_shanghai () < 0)
> +    return -1;
> +  if (__builtin_target_is_amdfam10_istanbul () < 0)
> +    return -1;
> +
> +  return 0;
> +}
> +
> +int main ()
> +{
> +  return fn1 ();
> +}
> Index: gcc/builtins.def
> ===================================================================
> --- gcc/builtins.def    (revision 177767)
> +++ gcc/builtins.def    (working copy)
> @@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
>  /* Multiversioning builtin dispatch hook. */
>  DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
>
> +/* Builtins to determine target type and features at run-time. */
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
> +
>  /* Exception support.  */
>  DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
>  DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
> Index: gcc/mversn-dispatch.c
> ===================================================================
> --- gcc/mversn-dispatch.c       (revision 177767)
> +++ gcc/mversn-dispatch.c       (working copy)
> @@ -135,6 +135,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "vecprim.h"
>  #include "gimple-pretty-print.h"
> +#include "target.h"
>
>  typedef struct cgraph_node* NODEPTR;
>  DEF_VEC_P (NODEPTR);
> @@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
>   TODO_update_ssa | TODO_verify_ssa
>  }
>  };
> +
> +/* Fold calls to __builtin_target_* */
> +
> +static unsigned int
> +do_fold_builtin_target (void)
> +{
> +  basic_block bb;
> +  gimple_stmt_iterator gsi;
> +
> +  /* Go through each stmt looking for __builtin_target_* calls */
> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
> +    {
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +        {
> +         gimple stmt = gsi_stmt (gsi);
> +         gimple assign_stmt;
> +          tree call_decl;
> +         tree lhs_retval;
> +         tree folded_val;
> +
> +         tree ssa_var, tmp_var;
> +         gimple init_stmt;
> +
> +          if (!is_gimple_call (stmt))
> +            continue;
> +
> +          call_decl = gimple_call_fndecl (stmt);
> +
> +         /* Check if it is a __builtin_target_* call. */
> +
> +         if (call_decl == NULL
> +             || DECL_NAME (call_decl) == NULL_TREE
> +             || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
> +             || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
> +                         "__builtin_target") == NULL)
> +            continue;
> +
> +         /* If the lhs is NULL there is no need to fold the call. */
> +         lhs_retval = gimple_call_lhs(stmt);
> +         if (lhs_retval == NULL)
> +           continue;
> +
> +         /* Call the target hook to fold the builtin */
> +          folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
> +
> +         /* If the target does not support the builtin then fold it to zero. */
> +         if (folded_val == NULL_TREE)
> +           folded_val = build_zero_cst (unsigned_type_node);
> +
> +         /* Type cast unsigned value to integer */
> +         tmp_var = create_tmp_var (unsigned_type_node, NULL);
> +         init_stmt = gimple_build_assign (tmp_var, folded_val);
> +         ssa_var = make_ssa_name (tmp_var, init_stmt);
> +         gimple_assign_set_lhs (init_stmt, ssa_var);
> +         mark_symbols_for_renaming (init_stmt);
> +
> +         assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
> +         mark_symbols_for_renaming(assign_stmt);
> +
> +         gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
> +         gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
> +         /* Delete the original call. */
> +         gsi_remove(&gsi, true);
> +       }
> +    }
> +
> +  return 0;
> +}
> +
> +static bool
> +gate_fold_builtin_target (void)
> +{
> +  return true;
> +}
> +
> +/* Pass to fold __builtin_target_* functions */
> +
> +struct gimple_opt_pass pass_tree_fold_builtin_target =
> +{
> + {
> +  GIMPLE_PASS,
> +  "fold_builtin_target",               /* name */
> +  gate_fold_builtin_target,            /* gate */
> +  do_fold_builtin_target,              /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_FOLD_BUILTIN_TARGET,              /* tv_id */
> +  PROP_cfg,                            /* properties_required */
> +  PROP_cfg,                            /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_dump_func |                     /* todo_flags_finish */
> +  TODO_cleanup_cfg |
> +  TODO_update_ssa |
> +  TODO_verify_ssa
> + }
> +};
> +
> +
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def     (revision 177767)
> +++ gcc/timevar.def     (working copy)
> @@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH          , "parser inl
>  DEFTIMEVAR (TV_TEMPLATE_INST         , "template instantiation")
>  DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
>  DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
> +DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET   , "fold __builtin_target calls")
>  DEFTIMEVAR (TV_INTEGRATION           , "integration")
>  DEFTIMEVAR (TV_TREE_GIMPLIFY        , "tree gimplify")
>  DEFTIMEVAR (TV_TREE_EH              , "tree eh")
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c        (revision 177767)
> +++ gcc/passes.c        (working copy)
> @@ -1249,6 +1249,8 @@ init_optimization_passes (void)
>     {
>       struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
>       NEXT_PASS (pass_tree_convert_builtin_dispatch);
> +      /* Fold calls to __builtin_target_*. */
> +      NEXT_PASS (pass_tree_fold_builtin_target);
>       /* Rebuilding cgraph edges is necessary as the above passes change
>          the call graph.  Otherwise, future optimizations use the old
>         call graph and make wrong decisions sometimes.*/
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      (revision 177767)
> +++ gcc/config/i386/i386.c      (working copy)
> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "sbitmap.h"
>  #include "fibheap.h"
> +#include "tree-flow.h"
> +#include "tree-pass.h"
>
>  enum upper_128bits_state
>  {
> @@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
>   return ret;
>  }
>
> +/* Returns a struct type with name NAME and number of fields equal to
> +   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
> +
> +static tree
> +build_struct_with_one_bit_fields (int num_fields, const char *name)
> +{
> +  int i;
> +  char field_name [10];
> +  tree field = NULL_TREE, field_chain = NULL_TREE;
> +  tree type = make_node (RECORD_TYPE);
> +
> +  strcpy (field_name, "k_field");
> +
> +  for (i = 0; i < num_fields; i++)
> +    {
> +      /* Name the fields, 0_field, 1_field, ... */
> +      field_name [0] = '0' + i;
> +      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +                         get_identifier (field_name), unsigned_type_node);
> +      DECL_BIT_FIELD (field) = 1;
> +      DECL_SIZE (field) = bitsize_one_node;
> +      if (field_chain != NULL_TREE)
> +       DECL_CHAIN (field) = field_chain;
> +      field_chain = field;
> +    }
> +  finish_builtin_struct (type, name, field_chain, NULL_TREE);
> +  return type;
> +}
> +
> +/* Returns a VAR_DECL of type TYPE and name NAME. */
> +
> +static tree
> +make_var_decl (tree type, const char *name)
> +{
> +  tree new_decl;
> +  struct varpool_node *vnode;
> +
> +  new_decl = build_decl (UNKNOWN_LOCATION,
> +                        VAR_DECL,
> +                        get_identifier(name),
> +                        type);
> +
> +  DECL_EXTERNAL (new_decl) = 1;
> +  TREE_STATIC (new_decl) = 1;
> +  TREE_PUBLIC (new_decl) = 1;
> +  DECL_INITIAL (new_decl) = 0;
> +  DECL_ARTIFICIAL (new_decl) = 0;
> +  DECL_PRESERVE_P (new_decl) = 1;
> +
> +  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
> +  assemble_variable (new_decl, 0, 0, 0);
> +
> +  vnode = varpool_node (new_decl);
> +  gcc_assert (vnode != NULL);
> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
> +     lto-streamer-out.c. */
> +  vnode->finalized = 1;
> +
> +  return new_decl;
> +}
> +
> +/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
> +   numbered field. */
> +
> +static tree
> +get_field_from_struct (tree struct_type, int field_num)
> +{
> +  int i;
> +  tree field = TYPE_FIELDS (struct_type);
> +
> +  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
> +    {
> +      gcc_assert (field != NULL_TREE);
> +    }
> +
> +  return field;
> +}
> +
> +/* Create a new static constructor that calls __cpu_indicator_init ()
> +   function defined in libgcc/config/i386-cpuinfo.c which runs cpuid
> +   to figure out the type of the target. */
> +
> +static tree
> +make_constructor_to_get_target_type (const char *name)
> +{
> +  tree decl, type, t;
> +  gimple_seq seq;
> +  basic_block new_bb;
> +  tree old_current_function_decl;
> +
> +  tree __cpu_indicator_int_decl;
> +  gimple constructor_body;
> +
> +
> +  type = build_function_type_list (void_type_node, NULL_TREE);
> +
> +  /* Make a call stmt to __cpu_indicator_init */
> +  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
> +  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
> +  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
> +
> +  decl = build_fn_decl (name, type);
> +
> +  DECL_NAME (decl) = get_identifier (name);
> +  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
> +  gcc_assert (cgraph_node (decl) != NULL);
> +
> +  TREE_USED (decl) = 1;
> +  DECL_ARTIFICIAL (decl) = 1;
> +  DECL_IGNORED_P (decl) = 0;
> +  TREE_PUBLIC (decl) = 0;
> +  DECL_UNINLINABLE (decl) = 1;
> +  DECL_EXTERNAL (decl) = 0;
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 1;
> +  TREE_READONLY (decl) = 0;
> +  DECL_PURE_P (decl) = 0;
> +
> +  /* This is a comdat. */
> +  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
> +
> +  /* Build result decl and add to function_decl. */
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  gimplify_function_tree (decl);
> +
> +  /* Build CFG for this function. */
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  current_function_decl = decl;
> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> +  cfun->curr_properties |=
> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> +     PROP_ssa);
> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> +
> +  /* XXX: Not sure if the edge commented below is necessary.  If I add this
> +     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
> +     " if (e->flags & EDGE_FALLTHRU)"
> +     during -fprofile-generate.
> +     Otherwise, it is fine.  Deleting this edge does not break anything.
> +     Commenting this so that it is clear I am intentionally not doing this.*/
> +  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
> +
> +  seq = gimple_seq_alloc_with_stmt (constructor_body);
> +
> +  set_bb_seq (new_bb, seq);
> +  gimple_set_bb (constructor_body, new_bb);
> +
> +  /* Set the lexical block of the constructor body. Fails the inliner
> +     other wise. */
> +  gimple_set_block (constructor_body, DECL_INITIAL (decl));
> +
> +  /* This call is very important if this pass runs when the IR is in
> +     SSA form.  It breaks things in strange ways otherwise. */
> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
> +  /* add_referenced_var (version_selector_var); */
> +
> +  cgraph_add_new_function (decl, true);
> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
> +  cgraph_mark_needed_node (cgraph_node (decl));
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +  return decl;
> +}
> +
> +/* FNDECL is a __builtin_target_* call that is folded into an integer defined
> +   in libgcc/config/i386/i386-cpuinfo.c */
> +
> +static tree
> +fold_builtin_target (tree fndecl)
> +{
> +  /* This is the order of bit-fields in __processor_features in
> +     i386-cpuinfo.c */
> +  enum processor_features
> +  {
> +    F_CMOV = 0,
> +    F_MMX,
> +    F_POPCNT,
> +    F_SSE,
> +    F_SSE2,
> +    F_SSE3,
> +    F_SSSE3,
> +    F_SSE4_1,
> +    F_SSE4_2,
> +    F_MAX
> +  };
> +
> +  /* This is the order of bit-fields in __processor_model in
> +     i386-cpuinfo.c */
> +  enum processor_model
> +  {
> +    M_AMD = 0,
> +    M_INTEL,
> +    M_COREI7_NEHALEM,
> +    M_COREI7_WESTMERE,
> +    M_COREI7_SANDYBRIDGE,
> +    M_AMDFAM10_BARCELONA,
> +    M_AMDFAM10_SHANGHAI,
> +    M_AMDFAM10_ISTANBUL,
> +    M_MAX
> +  };
> +
> +  static tree __processor_features_type = NULL_TREE;
> +  static tree __cpu_features_var = NULL_TREE;
> +  static tree __processor_model_type = NULL_TREE;
> +  static tree __cpu_model_var = NULL_TREE;
> +  static tree ctor_decl = NULL_TREE;
> +  static tree field;
> +  static tree which_struct;
> +
> +  /* Make a call to __cpu_indicatior_init in a constructor.
> +     Function __cpu_indicator_init is defined in i386-cpuinfo.c. */
> +  if (ctor_decl == NULL_TREE)
> +   ctor_decl = make_constructor_to_get_target_type
> +               ("__cpu_indicator_init_ctor");
> +
> +  if (__processor_features_type == NULL_TREE)
> +    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
> +                                 "__processor_features");
> +
> +  if (__processor_model_type == NULL_TREE)
> +    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
> +                                 "__processor_model");
> +
> +  if (__cpu_features_var == NULL_TREE)
> +    __cpu_features_var = make_var_decl (__processor_features_type,
> +                                       "__cpu_features");
> +
> +  if (__cpu_model_var == NULL_TREE)
> +    __cpu_model_var = make_var_decl (__processor_model_type,
> +                                    "__cpu_model");
> +
> +  /* Look at fndecl code to identify the field requested. */
> +  switch (DECL_FUNCTION_CODE (fndecl))
> +    {
> +    case BUILT_IN_TARGET_SUPPORTS_CMOV:
> +      field = get_field_from_struct (__processor_features_type, F_CMOV);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_MMX:
> +      field = get_field_from_struct (__processor_features_type, F_MMX);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
> +      field = get_field_from_struct (__processor_features_type, F_POPCNT);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSE:
> +      field = get_field_from_struct (__processor_features_type, F_SSE);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSE2:
> +      field = get_field_from_struct (__processor_features_type, F_SSE2);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSE3:
> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
> +      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
> +      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
> +      which_struct = __cpu_features_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_AMD:
> +      field = get_field_from_struct (__processor_model_type, M_AMD);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_INTEL:
> +      field = get_field_from_struct (__processor_model_type, M_INTEL);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
> +      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
> +      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
> +      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);;
> +      which_struct = __cpu_model_var;
> +      break;
> +    default:
> +      return NULL_TREE;
> +    }
> +
> +  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
> +}
> +
> +/* Folds __builtin_target_* builtins. */
> +
> +static tree
> +ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
> +                   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
> +{
> +  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
> +      && strstr(decl_name, "__builtin_target") != NULL)
> +    return fold_builtin_target (fndecl);
> +
> +  return NULL_TREE;
> +}
> +
>  /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
>
>  static void
> @@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>
> +#undef TARGET_FOLD_BUILTIN
> +#define TARGET_FOLD_BUILTIN ix86_fold_builtin
> +
>  #undef TARGET_ENUM_VA_LIST_P
>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>
>
> --
> This patch is available for review at http://codereview.appspot.com/4893046
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-17  9:38 ` Richard Guenther
@ 2011-08-17 20:04   ` Sriraman Tallam
  2011-08-18  9:33     ` Richard Guenther
  0 siblings, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-17 20:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: reply, gcc-patches

On Wed, Aug 17, 2011 at 12:37 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 16, 2011 at 10:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Support for getting CPU type and feature information at run-time.
>>
>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>
> Please provide an overview why you need the new builtins,

For multi-versioning,  the compiler can call the appropriate builtin
to dispatch the right version. The builtin call will later get folded.

For example,

int  __attribute__ version ("sse4_1")
compute ()
{
   // Do sse4_1 specific impkementation.
}

int
compute ()
{
  // Generic implementation
}

The compiler will check if the target supports the attribute and then
convert a call to compute ()  into  this:

if (__builtin_target_supports_sse4_1 ())
  compute_sse4_1 (); // Call to the SSE4_1 implementation
else
  compute_generic (); // Call to the generic implementation

Further, having it as builtin function allows it to be overridden by
the programmer. For instance, the programmer can override it to
identify newer CPU types not yet supported. Having these builtins
makes it convenient to identify platform type and features in general.

why you need
> a separate pass to fold them (instead of just expanding them) and why

I can move it into builtins.c along with where other builtins are
folded and remove the separate pass. My intention originally was to
fold them as early as possible, in this case after multi-versioning
but I guess this is not a requirement.

> you are creating
> vars behind the back of GCC:

The flow I had in mind was to have functions in libgcc which will use
CPUID to get target features and set global vars corresponding to the
features. So, the builtin should be folded by into the appropriate
variable in libgcc.

>
> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
> +     lto-streamer-out.c. */
> +  vnode->finalized = 1;
>
> where I think you miss a varpool_finalize_node call somewhere.  Why
> isn't this all done at target init time

I wanted to do this on demand. If none of the new builtins are called
in the program, I do not need to to do this at all. In summary, libgcc
has a function called __cpu_indicator_init which does the work of
determining target features and setting the appropriate globals. If
the new builtins are called, gcc will call __cpu_indicator_init in a
constructor so that it is called exactly once. Then, gcc will fold the
builtin to the appropriate global variable.


?  If you don't mark the
> variable as to be preserved
> like you do cgraph will optimize it all away if it isn't needed.

>
> Richard.
>
>>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>>        (gate_fold_builtin_target): New function.
>>        (pass_tree_fold_builtin_target): New pass.
>>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>>        * passes.c (init_optimization_passes): Add new pass to pass list.
>>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>        (make_var_decl): New function.
>>        (get_field_from_struct): New function.
>>        (make_constructor_to_get_target_type): New function.
>>        (fold_builtin_target): New function.
>>        (ix86_fold_builtin): New function.
>>        (TARGET_FOLD_BUILTIN): New macro.
>>
>>        * gcc.dg/builtin_target.c: New test.
>>
>>        * config/i386/i386-cpuinfo.c: New file.
>>        * config/i386/t-cpuinfo: New file.
>>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>
>> Index: libgcc/config.host
>> ===================================================================
>> --- libgcc/config.host  (revision 177767)
>> +++ libgcc/config.host  (working copy)
>> @@ -609,7 +609,7 @@ case ${host} in
>>  i[34567]86-*-linux* | x86_64-*-linux* | \
>>   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
>>   i[34567]86-*-gnu*)
>> -       tmake_file="${tmake_file} t-tls"
>> +       tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
>>        if test "$libgcc_cv_cfi" = "yes"; then
>>                tmake_file="${tmake_file} t-stack i386/t-stack-i386"
>>        fi
>> Index: libgcc/config/i386/t-cpuinfo
>> ===================================================================
>> --- libgcc/config/i386/t-cpuinfo        (revision 0)
>> +++ libgcc/config/i386/t-cpuinfo        (revision 0)
>> @@ -0,0 +1,2 @@
>> +# This is an endfile
>> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
>> Index: libgcc/config/i386/i386-cpuinfo.c
>> ===================================================================
>> --- libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>> +++ libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>> @@ -0,0 +1,275 @@
>> +/* Copyright (C) 2011 Free Software Foundation, Inc.
>> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
>> + *
>> + * This file is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the
>> + * Free Software Foundation; either version 3, or (at your option) any
>> + * later version.
>> + *
>> + * This file is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * Under Section 7 of GPL version 3, you are granted additional
>> + * permissions described in the GCC Runtime Library Exception, version
>> + * 3.1, as published by the Free Software Foundation.
>> + *
>> + * You should have received a copy of the GNU General Public License and
>> + * a copy of the GCC Runtime Library Exception along with this program;
>> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> + * <http://www.gnu.org/licenses/>.
>> + *
>> + *
>> + * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
>> + * instruction is used to figure out the cpu type and supported features.
>> + * GCC runs __cpu_indicator_init from a constructor which sets the members
>> + * of __cpu_model and __cpu_features.
>> + */
>> +
>> +#include <string.h>
>> +
>> +#ifdef __GNUC__
>> +#include "cpuid.h"
>> +
>> +enum processor_type
>> +{
>> +  PROCESSOR_PENTIUM = 0,
>> +  PROCESSOR_CORE2,
>> +  PROCESSOR_COREI7_NEHALEM,
>> +  PROCESSOR_COREI7_WESTMERE,
>> +  PROCESSOR_COREI7_SANDYBRIDGE,
>> +  PROCESSOR_INTEL_GENERIC,
>> +  PROCESSOR_AMDFAM10_BARCELONA,
>> +  PROCESSOR_AMDFAM10_SHANGHAI,
>> +  PROCESSOR_AMDFAM10_ISTANBUL,
>> +  PROCESSOR_AMDFAM10_GENERIC,
>> +  PROCESSOR_AMD_GENERIC,
>> +  PROCESSOR_GENERIC,
>> +  PROCESSOR_max
>> +};
>> +
>> +enum vendor_signatures
>> +{
>> +  SIG_INTEL =  0x756e6547 /* Genu */,
>> +  SIG_AMD =    0x68747541 /* Auth */
>> +};
>> +
>> +
>> +/* Features supported. */
>> +
>> +struct __processor_features
>> +{
>> +  unsigned int __cpu_cmov : 1;
>> +  unsigned int __cpu_mmx : 1;
>> +  unsigned int __cpu_popcnt : 1;
>> +  unsigned int __cpu_sse : 1;
>> +  unsigned int __cpu_sse2 : 1;
>> +  unsigned int __cpu_sse3 : 1;
>> +  unsigned int __cpu_ssse3 : 1;
>> +  unsigned int __cpu_sse4_1 : 1;
>> +  unsigned int __cpu_sse4_2 : 1;
>> +};
>> +
>> +/* Flags exported. */
>> +
>> +struct __processor_model
>> +{
>> +  unsigned int __cpu_is_amd : 1;
>> +  unsigned int __cpu_is_intel : 1;
>> +  unsigned int __cpu_is_corei7_nehalem : 1;
>> +  unsigned int __cpu_is_corei7_westmere : 1;
>> +  unsigned int __cpu_is_corei7_sandybridge : 1;
>> +  unsigned int __cpu_is_amdfam10_barcelona : 1;
>> +  unsigned int __cpu_is_amdfam10_shanghai : 1;
>> +  unsigned int __cpu_is_amdfam10_istanbul : 1;
>> +};
>> +
>> +enum processor_type __cpu_type = PROCESSOR_GENERIC;
>> +struct __processor_features __cpu_features;
>> +struct __processor_model __cpu_model;
>> +
>> +static void
>> +get_amd_cpu (unsigned int family, unsigned int model)
>> +{
>> +  switch (family)
>> +    {
>> +    case 0x10:
>> +      switch (model)
>> +       {
>> +       case 0x2:
>> +         __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
>> +         __cpu_model.__cpu_is_amdfam10_barcelona = 1;
>> +         break;
>> +       case 0x4:
>> +         __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
>> +         __cpu_model.__cpu_is_amdfam10_shanghai = 1;
>> +         break;
>> +       case 0x8:
>> +         __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
>> +         __cpu_model.__cpu_is_amdfam10_istanbul = 1;
>> +         break;
>> +       default:
>> +         __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
>> +         break;
>> +       }
>> +      break;
>> +    default:
>> +      __cpu_type = PROCESSOR_AMD_GENERIC;
>> +    }
>> +}
>> +
>> +static void
>> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
>> +{
>> +  /* Parse family and model only if brand ID is 0. */
>> +  if (brand_id == 0)
>> +    {
>> +      switch (family)
>> +       {
>> +       case 0x5:
>> +         __cpu_type = PROCESSOR_PENTIUM;
>> +         break;
>> +       case 0x6:
>> +         switch (model)
>> +           {
>> +           case 0x1a:
>> +           case 0x1e:
>> +           case 0x1f:
>> +           case 0x2e:
>> +             /* Nehalem.  */
>> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
>> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
>> +             break;
>> +           case 0x25:
>> +           case 0x2c:
>> +           case 0x2f:
>> +             /* Westmere.  */
>> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
>> +             __cpu_model.__cpu_is_corei7_westmere = 1;
>> +             break;
>> +           case 0x2a:
>> +             /* Sandy Bridge.  */
>> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
>> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
>> +             break;
>> +           case 0x17:
>> +           case 0x1d:
>> +             /* Penryn.  */
>> +           case 0x0f:
>> +             /* Merom.  */
>> +             __cpu_type = PROCESSOR_CORE2;
>> +             break;
>> +           default:
>> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
>> +             break;
>> +           }
>> +         break;
>> +       default:
>> +         /* We have no idea.  */
>> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
>> +         break;
>> +       }
>> +    }
>> +}
>> +
>> +static void
>> +get_available_features (unsigned int ecx, unsigned int edx)
>> +{
>> +  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
>> +  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
>> +  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
>> +  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
>> +  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
>> +  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
>> +  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
>> +  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
>> +  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
>> +}
>> +
>> +/* A noinline function calling __get_cpuid. Having many calls to
>> +   cpuid in one function in 32-bit mode causes GCC to complain:
>> +   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
>> +   related to PR rtl-optimization 44174. */
>> +
>> +static int __attribute__ ((noinline))
>> +__get_cpuid_output (unsigned int __level,
>> +                   unsigned int *__eax, unsigned int *__ebx,
>> +                   unsigned int *__ecx, unsigned int *__edx)
>> +{
>> +  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
>> +}
>> +
>> +/* This function will be linked in to binaries that need to look up
>> +   CPU information.  */
>> +
>> +void
>> +__cpu_indicator_init(void)
>> +{
>> +  unsigned int eax, ebx, ecx, edx;
>> +
>> +  int max_level = 5;
>> +  unsigned int vendor;
>> +  unsigned int model, family, brand_id;
>> +
>> +  memset (&__cpu_features, 0, sizeof (struct __processor_features));
>> +  memset (&__cpu_model, 0, sizeof (struct __processor_model));
>> +
>> +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
>> +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
>> +    return;
>> +
>> +  vendor = ebx;
>> +  max_level = eax;
>> +
>> +  if (max_level < 1)
>> +    return;
>> +
>> +  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
>> +    return;
>> +
>> +  model = (eax >> 4) & 0x0f;
>> +  family = (eax >> 8) & 0x0f;
>> +  brand_id = ebx & 0xff;
>> +
>> +  /* Adjust model and family for Intel CPUS. */
>> +  if (vendor == SIG_INTEL)
>> +    {
>> +      unsigned int extended_model, extended_family;
>> +
>> +      extended_model = (eax >> 12) & 0xf0;
>> +      extended_family = (eax >> 20) & 0xff;
>> +      if (family == 0x0f)
>> +       {
>> +         family += extended_family;
>> +         model += extended_model;
>> +       }
>> +      else if (family == 0x06)
>> +       model += extended_model;
>> +    }
>> +
>> +  /* Find CPU model. */
>> +
>> +  if (vendor == SIG_AMD)
>> +    {
>> +      __cpu_model.__cpu_is_amd = 1;
>> +      get_amd_cpu (family, model);
>> +    }
>> +  else if (vendor == SIG_INTEL)
>> +    {
>> +      __cpu_model.__cpu_is_intel = 1;
>> +      get_intel_cpu (family, model, brand_id);
>> +    }
>> +
>> +  /* Find available features. */
>> +  get_available_features (ecx, edx);
>> +}
>> +
>> +#else
>> +
>> +void
>> +__cpu_indicator_init(void)
>> +{
>> +}
>> +
>> +#endif /* __GNUC__ */
>> Index: gcc/tree-pass.h
>> ===================================================================
>> --- gcc/tree-pass.h     (revision 177767)
>> +++ gcc/tree-pass.h     (working copy)
>> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>  extern struct gimple_opt_pass pass_threadsafe_analyze;
>>  extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
>> +extern struct gimple_opt_pass pass_tree_fold_builtin_target;
>>
>>  /* IPA Passes */
>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>> Index: gcc/testsuite/gcc.dg/builtin_target.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>> +++ gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>> @@ -0,0 +1,49 @@
>> +/* This test checks if the __builtin_target_* calls are recognized. */
>> +
>> +/* { dg-do run } */
>> +
>> +int
>> +fn1 ()
>> +{
>> +  if (__builtin_target_supports_cmov () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_mmx () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_popcount () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_sse () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_sse2 () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_sse3 () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_ssse3 () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_sse4_1 () < 0)
>> +    return -1;
>> +  if (__builtin_target_supports_sse4_2 () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_amd () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_intel () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_corei7_nehalem () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_corei7_westmere () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_corei7_sandybridge () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_amdfam10_barcelona () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_amdfam10_shanghai () < 0)
>> +    return -1;
>> +  if (__builtin_target_is_amdfam10_istanbul () < 0)
>> +    return -1;
>> +
>> +  return 0;
>> +}
>> +
>> +int main ()
>> +{
>> +  return fn1 ();
>> +}
>> Index: gcc/builtins.def
>> ===================================================================
>> --- gcc/builtins.def    (revision 177767)
>> +++ gcc/builtins.def    (working copy)
>> @@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
>>  /* Multiversioning builtin dispatch hook. */
>>  DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
>>
>> +/* Builtins to determine target type and features at run-time. */
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
>> +
>>  /* Exception support.  */
>>  DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
>>  DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
>> Index: gcc/mversn-dispatch.c
>> ===================================================================
>> --- gcc/mversn-dispatch.c       (revision 177767)
>> +++ gcc/mversn-dispatch.c       (working copy)
>> @@ -135,6 +135,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "output.h"
>>  #include "vecprim.h"
>>  #include "gimple-pretty-print.h"
>> +#include "target.h"
>>
>>  typedef struct cgraph_node* NODEPTR;
>>  DEF_VEC_P (NODEPTR);
>> @@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
>>   TODO_update_ssa | TODO_verify_ssa
>>  }
>>  };
>> +
>> +/* Fold calls to __builtin_target_* */
>> +
>> +static unsigned int
>> +do_fold_builtin_target (void)
>> +{
>> +  basic_block bb;
>> +  gimple_stmt_iterator gsi;
>> +
>> +  /* Go through each stmt looking for __builtin_target_* calls */
>> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
>> +    {
>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +        {
>> +         gimple stmt = gsi_stmt (gsi);
>> +         gimple assign_stmt;
>> +          tree call_decl;
>> +         tree lhs_retval;
>> +         tree folded_val;
>> +
>> +         tree ssa_var, tmp_var;
>> +         gimple init_stmt;
>> +
>> +          if (!is_gimple_call (stmt))
>> +            continue;
>> +
>> +          call_decl = gimple_call_fndecl (stmt);
>> +
>> +         /* Check if it is a __builtin_target_* call. */
>> +
>> +         if (call_decl == NULL
>> +             || DECL_NAME (call_decl) == NULL_TREE
>> +             || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
>> +             || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
>> +                         "__builtin_target") == NULL)
>> +            continue;
>> +
>> +         /* If the lhs is NULL there is no need to fold the call. */
>> +         lhs_retval = gimple_call_lhs(stmt);
>> +         if (lhs_retval == NULL)
>> +           continue;
>> +
>> +         /* Call the target hook to fold the builtin */
>> +          folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
>> +
>> +         /* If the target does not support the builtin then fold it to zero. */
>> +         if (folded_val == NULL_TREE)
>> +           folded_val = build_zero_cst (unsigned_type_node);
>> +
>> +         /* Type cast unsigned value to integer */
>> +         tmp_var = create_tmp_var (unsigned_type_node, NULL);
>> +         init_stmt = gimple_build_assign (tmp_var, folded_val);
>> +         ssa_var = make_ssa_name (tmp_var, init_stmt);
>> +         gimple_assign_set_lhs (init_stmt, ssa_var);
>> +         mark_symbols_for_renaming (init_stmt);
>> +
>> +         assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
>> +         mark_symbols_for_renaming(assign_stmt);
>> +
>> +         gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
>> +         gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
>> +         /* Delete the original call. */
>> +         gsi_remove(&gsi, true);
>> +       }
>> +    }
>> +
>> +  return 0;
>> +}
>> +
>> +static bool
>> +gate_fold_builtin_target (void)
>> +{
>> +  return true;
>> +}
>> +
>> +/* Pass to fold __builtin_target_* functions */
>> +
>> +struct gimple_opt_pass pass_tree_fold_builtin_target =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "fold_builtin_target",               /* name */
>> +  gate_fold_builtin_target,            /* gate */
>> +  do_fold_builtin_target,              /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_FOLD_BUILTIN_TARGET,              /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  PROP_cfg,                            /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  TODO_dump_func |                     /* todo_flags_finish */
>> +  TODO_cleanup_cfg |
>> +  TODO_update_ssa |
>> +  TODO_verify_ssa
>> + }
>> +};
>> +
>> +
>> Index: gcc/timevar.def
>> ===================================================================
>> --- gcc/timevar.def     (revision 177767)
>> +++ gcc/timevar.def     (working copy)
>> @@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH          , "parser inl
>>  DEFTIMEVAR (TV_TEMPLATE_INST         , "template instantiation")
>>  DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
>>  DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
>> +DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET   , "fold __builtin_target calls")
>>  DEFTIMEVAR (TV_INTEGRATION           , "integration")
>>  DEFTIMEVAR (TV_TREE_GIMPLIFY        , "tree gimplify")
>>  DEFTIMEVAR (TV_TREE_EH              , "tree eh")
>> Index: gcc/passes.c
>> ===================================================================
>> --- gcc/passes.c        (revision 177767)
>> +++ gcc/passes.c        (working copy)
>> @@ -1249,6 +1249,8 @@ init_optimization_passes (void)
>>     {
>>       struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
>>       NEXT_PASS (pass_tree_convert_builtin_dispatch);
>> +      /* Fold calls to __builtin_target_*. */
>> +      NEXT_PASS (pass_tree_fold_builtin_target);
>>       /* Rebuilding cgraph edges is necessary as the above passes change
>>          the call graph.  Otherwise, future optimizations use the old
>>         call graph and make wrong decisions sometimes.*/
>> Index: gcc/config/i386/i386.c
>> ===================================================================
>> --- gcc/config/i386/i386.c      (revision 177767)
>> +++ gcc/config/i386/i386.c      (working copy)
>> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "sbitmap.h"
>>  #include "fibheap.h"
>> +#include "tree-flow.h"
>> +#include "tree-pass.h"
>>
>>  enum upper_128bits_state
>>  {
>> @@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
>>   return ret;
>>  }
>>
>> +/* Returns a struct type with name NAME and number of fields equal to
>> +   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
>> +
>> +static tree
>> +build_struct_with_one_bit_fields (int num_fields, const char *name)
>> +{
>> +  int i;
>> +  char field_name [10];
>> +  tree field = NULL_TREE, field_chain = NULL_TREE;
>> +  tree type = make_node (RECORD_TYPE);
>> +
>> +  strcpy (field_name, "k_field");
>> +
>> +  for (i = 0; i < num_fields; i++)
>> +    {
>> +      /* Name the fields, 0_field, 1_field, ... */
>> +      field_name [0] = '0' + i;
>> +      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>> +                         get_identifier (field_name), unsigned_type_node);
>> +      DECL_BIT_FIELD (field) = 1;
>> +      DECL_SIZE (field) = bitsize_one_node;
>> +      if (field_chain != NULL_TREE)
>> +       DECL_CHAIN (field) = field_chain;
>> +      field_chain = field;
>> +    }
>> +  finish_builtin_struct (type, name, field_chain, NULL_TREE);
>> +  return type;
>> +}
>> +
>> +/* Returns a VAR_DECL of type TYPE and name NAME. */
>> +
>> +static tree
>> +make_var_decl (tree type, const char *name)
>> +{
>> +  tree new_decl;
>> +  struct varpool_node *vnode;
>> +
>> +  new_decl = build_decl (UNKNOWN_LOCATION,
>> +                        VAR_DECL,
>> +                        get_identifier(name),
>> +                        type);
>> +
>> +  DECL_EXTERNAL (new_decl) = 1;
>> +  TREE_STATIC (new_decl) = 1;
>> +  TREE_PUBLIC (new_decl) = 1;
>> +  DECL_INITIAL (new_decl) = 0;
>> +  DECL_ARTIFICIAL (new_decl) = 0;
>> +  DECL_PRESERVE_P (new_decl) = 1;
>> +
>> +  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
>> +  assemble_variable (new_decl, 0, 0, 0);
>> +
>> +  vnode = varpool_node (new_decl);
>> +  gcc_assert (vnode != NULL);
>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>> +     lto-streamer-out.c. */
>> +  vnode->finalized = 1;
>> +
>> +  return new_decl;
>> +}
>> +
>> +/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
>> +   numbered field. */
>> +
>> +static tree
>> +get_field_from_struct (tree struct_type, int field_num)
>> +{
>> +  int i;
>> +  tree field = TYPE_FIELDS (struct_type);
>> +
>> +  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
>> +    {
>> +      gcc_assert (field != NULL_TREE);
>> +    }
>> +
>> +  return field;
>> +}
>> +
>> +/* Create a new static constructor that calls __cpu_indicator_init ()
>> +   function defined in libgcc/config/i386-cpuinfo.c which runs cpuid
>> +   to figure out the type of the target. */
>> +
>> +static tree
>> +make_constructor_to_get_target_type (const char *name)
>> +{
>> +  tree decl, type, t;
>> +  gimple_seq seq;
>> +  basic_block new_bb;
>> +  tree old_current_function_decl;
>> +
>> +  tree __cpu_indicator_int_decl;
>> +  gimple constructor_body;
>> +
>> +
>> +  type = build_function_type_list (void_type_node, NULL_TREE);
>> +
>> +  /* Make a call stmt to __cpu_indicator_init */
>> +  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
>> +  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
>> +  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
>> +
>> +  decl = build_fn_decl (name, type);
>> +
>> +  DECL_NAME (decl) = get_identifier (name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
>> +  gcc_assert (cgraph_node (decl) != NULL);
>> +
>> +  TREE_USED (decl) = 1;
>> +  DECL_ARTIFICIAL (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 0;
>> +  TREE_PUBLIC (decl) = 0;
>> +  DECL_UNINLINABLE (decl) = 1;
>> +  DECL_EXTERNAL (decl) = 0;
>> +  DECL_CONTEXT (decl) = NULL_TREE;
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  DECL_STATIC_CONSTRUCTOR (decl) = 1;
>> +  TREE_READONLY (decl) = 0;
>> +  DECL_PURE_P (decl) = 0;
>> +
>> +  /* This is a comdat. */
>> +  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
>> +
>> +  /* Build result decl and add to function_decl. */
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +
>> +  /* Build CFG for this function. */
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  current_function_decl = decl;
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>> +     PROP_ssa);
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +
>> +  /* XXX: Not sure if the edge commented below is necessary.  If I add this
>> +     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
>> +     " if (e->flags & EDGE_FALLTHRU)"
>> +     during -fprofile-generate.
>> +     Otherwise, it is fine.  Deleting this edge does not break anything.
>> +     Commenting this so that it is clear I am intentionally not doing this.*/
>> +  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
>> +
>> +  seq = gimple_seq_alloc_with_stmt (constructor_body);
>> +
>> +  set_bb_seq (new_bb, seq);
>> +  gimple_set_bb (constructor_body, new_bb);
>> +
>> +  /* Set the lexical block of the constructor body. Fails the inliner
>> +     other wise. */
>> +  gimple_set_block (constructor_body, DECL_INITIAL (decl));
>> +
>> +  /* This call is very important if this pass runs when the IR is in
>> +     SSA form.  It breaks things in strange ways otherwise. */
>> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
>> +  /* add_referenced_var (version_selector_var); */
>> +
>> +  cgraph_add_new_function (decl, true);
>> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
>> +  cgraph_mark_needed_node (cgraph_node (decl));
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +  return decl;
>> +}
>> +
>> +/* FNDECL is a __builtin_target_* call that is folded into an integer defined
>> +   in libgcc/config/i386/i386-cpuinfo.c */
>> +
>> +static tree
>> +fold_builtin_target (tree fndecl)
>> +{
>> +  /* This is the order of bit-fields in __processor_features in
>> +     i386-cpuinfo.c */
>> +  enum processor_features
>> +  {
>> +    F_CMOV = 0,
>> +    F_MMX,
>> +    F_POPCNT,
>> +    F_SSE,
>> +    F_SSE2,
>> +    F_SSE3,
>> +    F_SSSE3,
>> +    F_SSE4_1,
>> +    F_SSE4_2,
>> +    F_MAX
>> +  };
>> +
>> +  /* This is the order of bit-fields in __processor_model in
>> +     i386-cpuinfo.c */
>> +  enum processor_model
>> +  {
>> +    M_AMD = 0,
>> +    M_INTEL,
>> +    M_COREI7_NEHALEM,
>> +    M_COREI7_WESTMERE,
>> +    M_COREI7_SANDYBRIDGE,
>> +    M_AMDFAM10_BARCELONA,
>> +    M_AMDFAM10_SHANGHAI,
>> +    M_AMDFAM10_ISTANBUL,
>> +    M_MAX
>> +  };
>> +
>> +  static tree __processor_features_type = NULL_TREE;
>> +  static tree __cpu_features_var = NULL_TREE;
>> +  static tree __processor_model_type = NULL_TREE;
>> +  static tree __cpu_model_var = NULL_TREE;
>> +  static tree ctor_decl = NULL_TREE;
>> +  static tree field;
>> +  static tree which_struct;
>> +
>> +  /* Make a call to __cpu_indicatior_init in a constructor.
>> +     Function __cpu_indicator_init is defined in i386-cpuinfo.c. */
>> +  if (ctor_decl == NULL_TREE)
>> +   ctor_decl = make_constructor_to_get_target_type
>> +               ("__cpu_indicator_init_ctor");
>> +
>> +  if (__processor_features_type == NULL_TREE)
>> +    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
>> +                                 "__processor_features");
>> +
>> +  if (__processor_model_type == NULL_TREE)
>> +    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
>> +                                 "__processor_model");
>> +
>> +  if (__cpu_features_var == NULL_TREE)
>> +    __cpu_features_var = make_var_decl (__processor_features_type,
>> +                                       "__cpu_features");
>> +
>> +  if (__cpu_model_var == NULL_TREE)
>> +    __cpu_model_var = make_var_decl (__processor_model_type,
>> +                                    "__cpu_model");
>> +
>> +  /* Look at fndecl code to identify the field requested. */
>> +  switch (DECL_FUNCTION_CODE (fndecl))
>> +    {
>> +    case BUILT_IN_TARGET_SUPPORTS_CMOV:
>> +      field = get_field_from_struct (__processor_features_type, F_CMOV);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_MMX:
>> +      field = get_field_from_struct (__processor_features_type, F_MMX);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
>> +      field = get_field_from_struct (__processor_features_type, F_POPCNT);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSE:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSE2:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE2);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSE3:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
>> +      which_struct = __cpu_features_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_AMD:
>> +      field = get_field_from_struct (__processor_model_type, M_AMD);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_INTEL:
>> +      field = get_field_from_struct (__processor_model_type, M_INTEL);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);;
>> +      which_struct = __cpu_model_var;
>> +      break;
>> +    default:
>> +      return NULL_TREE;
>> +    }
>> +
>> +  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
>> +}
>> +
>> +/* Folds __builtin_target_* builtins. */
>> +
>> +static tree
>> +ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>> +                   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
>> +{
>> +  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>> +      && strstr(decl_name, "__builtin_target") != NULL)
>> +    return fold_builtin_target (fndecl);
>> +
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
>>
>>  static void
>> @@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>
>> +#undef TARGET_FOLD_BUILTIN
>> +#define TARGET_FOLD_BUILTIN ix86_fold_builtin
>> +
>>  #undef TARGET_ENUM_VA_LIST_P
>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/4893046
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
                   ` (3 preceding siblings ...)
  2011-08-17  9:38 ` Richard Guenther
@ 2011-08-18  1:56 ` Hans-Peter Nilsson
  2011-08-18  2:06   ` Sriraman Tallam
  4 siblings, 1 reply; 50+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-18  1:56 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Tue, 16 Aug 2011, Sriraman Tallam wrote:

(I don't see anyone else making this comment, so maybe I missed
something obvious, but I don't think so...)

> Support for getting CPU type and feature information at run-time.
>
> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>
> 	* tree-pass.h (pass_tree_fold_builtin_target): New pass.
> 	* builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
> 	(BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
> 	(BUILT_IN_TARGET_IS_AMD): New builtin.
> 	(BUILT_IN_TARGET_IS_INTEL): New builtin.
> 	(BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
> 	(BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
> 	(BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
> 	(BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
> 	(BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
> 	(BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
(cut)

Keep the port-specific bits in the port, please. I don't see why
this has to be in generic files as opposed to target hooks and
included target-specific file fragments like everything else
(well, most everything) in gcc.  If not, I think we'll see
cpu_ports*variants explosion here until these bits are
rewritten...

brgds, H-P

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18  1:56 ` Hans-Peter Nilsson
@ 2011-08-18  2:06   ` Sriraman Tallam
  0 siblings, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-18  2:06 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: reply, gcc-patches

On Wed, Aug 17, 2011 at 4:59 PM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> On Tue, 16 Aug 2011, Sriraman Tallam wrote:
>
> (I don't see anyone else making this comment, so maybe I missed
> something obvious, but I don't think so...)
>
>> Support for getting CPU type and feature information at run-time.
>>
>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>>
>>       * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>       * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>       (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>       (BUILT_IN_TARGET_IS_AMD): New builtin.
>>       (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>       (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>       (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>       (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>>       (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>       (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>       (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
> (cut)
>
> Keep the port-specific bits in the port, please. I don't see why
> this has to be in generic files as opposed to target hooks and
> included target-specific file fragments like everything else
> (well, most everything) in gcc.  If not, I think we'll see
> cpu_ports*variants explosion here until these bits are
> rewritten...

Yes, this should move into the port. Sorry, I will change it.

Thanks,
-Sri.

>
> brgds, H-P
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-17 20:04   ` Sriraman Tallam
@ 2011-08-18  9:33     ` Richard Guenther
  2011-08-18 14:04       ` Michael Matz
  2011-08-18 21:15       ` Sriraman Tallam
  0 siblings, 2 replies; 50+ messages in thread
From: Richard Guenther @ 2011-08-18  9:33 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Wed, Aug 17, 2011 at 7:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, Aug 17, 2011 at 12:37 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 16, 2011 at 10:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Support for getting CPU type and feature information at run-time.
>>>
>>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>>
>> Please provide an overview why you need the new builtins,
>
> For multi-versioning,  the compiler can call the appropriate builtin
> to dispatch the right version. The builtin call will later get folded.
>
> For example,
>
> int  __attribute__ version ("sse4_1")
> compute ()
> {
>   // Do sse4_1 specific impkementation.
> }
>
> int
> compute ()
> {
>  // Generic implementation
> }
>
> The compiler will check if the target supports the attribute and then
> convert a call to compute ()  into  this:
>
> if (__builtin_target_supports_sse4_1 ())
>  compute_sse4_1 (); // Call to the SSE4_1 implementation
> else
>  compute_generic (); // Call to the generic implementation
>
> Further, having it as builtin function allows it to be overridden by
> the programmer. For instance, the programmer can override it to
> identify newer CPU types not yet supported. Having these builtins
> makes it convenient to identify platform type and features in general.
>
> why you need
>> a separate pass to fold them (instead of just expanding them) and why
>
> I can move it into builtins.c along with where other builtins are
> folded and remove the separate pass. My intention originally was to
> fold them as early as possible, in this case after multi-versioning
> but I guess this is not a requirement.

Yes, they should be folded by targetm.fold_builtin instead.  The Frontend
should simply fold the tests at the time it creates them, that's as early
as possible (gimplification will also re-fold all builtin function calls).

>> you are creating
>> vars behind the back of GCC:
>
> The flow I had in mind was to have functions in libgcc which will use
> CPUID to get target features and set global vars corresponding to the
> features. So, the builtin should be folded by into the appropriate
> variable in libgcc.

Hm, but then the variable should reside in libgcc and you'd only need
an extern variant in the varpool.  I'm not sure separate constructors
(possibly in each module ...) would be better than a single one in
libgcc that would get run unconditionally.

>>
>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>> +     lto-streamer-out.c. */
>> +  vnode->finalized = 1;
>>
>> where I think you miss a varpool_finalize_node call somewhere.  Why
>> isn't this all done at target init time
>
> I wanted to do this on demand. If none of the new builtins are called
> in the program, I do not need to to do this at all. In summary, libgcc
> has a function called __cpu_indicator_init which does the work of
> determining target features and setting the appropriate globals. If
> the new builtins are called, gcc will call __cpu_indicator_init in a
> constructor so that it is called exactly once. Then, gcc will fold the
> builtin to the appropriate global variable.

I see, but this sounds like premature optimization to me, no?  Considering
you'd do this in each module and our inability to merge those constructors
at link time.  If we put __cpu_indicator, the constructor and the assorted
support into a separate module inside libgcc.a could we arrange it in a way
that if __cpu_indicator is not referenced from the program that piece isn't
linked in?  (not sure if that is possible with constructors)

Richard.

>
> ?  If you don't mark the
>> variable as to be preserved
>> like you do cgraph will optimize it all away if it isn't needed.
>
>>
>> Richard.
>>
>>>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>>>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>>>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>>>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>>>        (gate_fold_builtin_target): New function.
>>>        (pass_tree_fold_builtin_target): New pass.
>>>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>>>        * passes.c (init_optimization_passes): Add new pass to pass list.
>>>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>>        (make_var_decl): New function.
>>>        (get_field_from_struct): New function.
>>>        (make_constructor_to_get_target_type): New function.
>>>        (fold_builtin_target): New function.
>>>        (ix86_fold_builtin): New function.
>>>        (TARGET_FOLD_BUILTIN): New macro.
>>>
>>>        * gcc.dg/builtin_target.c: New test.
>>>
>>>        * config/i386/i386-cpuinfo.c: New file.
>>>        * config/i386/t-cpuinfo: New file.
>>>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>>
>>> Index: libgcc/config.host
>>> ===================================================================
>>> --- libgcc/config.host  (revision 177767)
>>> +++ libgcc/config.host  (working copy)
>>> @@ -609,7 +609,7 @@ case ${host} in
>>>  i[34567]86-*-linux* | x86_64-*-linux* | \
>>>   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
>>>   i[34567]86-*-gnu*)
>>> -       tmake_file="${tmake_file} t-tls"
>>> +       tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
>>>        if test "$libgcc_cv_cfi" = "yes"; then
>>>                tmake_file="${tmake_file} t-stack i386/t-stack-i386"
>>>        fi
>>> Index: libgcc/config/i386/t-cpuinfo
>>> ===================================================================
>>> --- libgcc/config/i386/t-cpuinfo        (revision 0)
>>> +++ libgcc/config/i386/t-cpuinfo        (revision 0)
>>> @@ -0,0 +1,2 @@
>>> +# This is an endfile
>>> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
>>> Index: libgcc/config/i386/i386-cpuinfo.c
>>> ===================================================================
>>> --- libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>>> +++ libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>>> @@ -0,0 +1,275 @@
>>> +/* Copyright (C) 2011 Free Software Foundation, Inc.
>>> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
>>> + *
>>> + * This file is free software; you can redistribute it and/or modify it
>>> + * under the terms of the GNU General Public License as published by the
>>> + * Free Software Foundation; either version 3, or (at your option) any
>>> + * later version.
>>> + *
>>> + * This file is distributed in the hope that it will be useful, but
>>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * General Public License for more details.
>>> + *
>>> + * Under Section 7 of GPL version 3, you are granted additional
>>> + * permissions described in the GCC Runtime Library Exception, version
>>> + * 3.1, as published by the Free Software Foundation.
>>> + *
>>> + * You should have received a copy of the GNU General Public License and
>>> + * a copy of the GCC Runtime Library Exception along with this program;
>>> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>> + * <http://www.gnu.org/licenses/>.
>>> + *
>>> + *
>>> + * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
>>> + * instruction is used to figure out the cpu type and supported features.
>>> + * GCC runs __cpu_indicator_init from a constructor which sets the members
>>> + * of __cpu_model and __cpu_features.
>>> + */
>>> +
>>> +#include <string.h>
>>> +
>>> +#ifdef __GNUC__
>>> +#include "cpuid.h"
>>> +
>>> +enum processor_type
>>> +{
>>> +  PROCESSOR_PENTIUM = 0,
>>> +  PROCESSOR_CORE2,
>>> +  PROCESSOR_COREI7_NEHALEM,
>>> +  PROCESSOR_COREI7_WESTMERE,
>>> +  PROCESSOR_COREI7_SANDYBRIDGE,
>>> +  PROCESSOR_INTEL_GENERIC,
>>> +  PROCESSOR_AMDFAM10_BARCELONA,
>>> +  PROCESSOR_AMDFAM10_SHANGHAI,
>>> +  PROCESSOR_AMDFAM10_ISTANBUL,
>>> +  PROCESSOR_AMDFAM10_GENERIC,
>>> +  PROCESSOR_AMD_GENERIC,
>>> +  PROCESSOR_GENERIC,
>>> +  PROCESSOR_max
>>> +};
>>> +
>>> +enum vendor_signatures
>>> +{
>>> +  SIG_INTEL =  0x756e6547 /* Genu */,
>>> +  SIG_AMD =    0x68747541 /* Auth */
>>> +};
>>> +
>>> +
>>> +/* Features supported. */
>>> +
>>> +struct __processor_features
>>> +{
>>> +  unsigned int __cpu_cmov : 1;
>>> +  unsigned int __cpu_mmx : 1;
>>> +  unsigned int __cpu_popcnt : 1;
>>> +  unsigned int __cpu_sse : 1;
>>> +  unsigned int __cpu_sse2 : 1;
>>> +  unsigned int __cpu_sse3 : 1;
>>> +  unsigned int __cpu_ssse3 : 1;
>>> +  unsigned int __cpu_sse4_1 : 1;
>>> +  unsigned int __cpu_sse4_2 : 1;
>>> +};
>>> +
>>> +/* Flags exported. */
>>> +
>>> +struct __processor_model
>>> +{
>>> +  unsigned int __cpu_is_amd : 1;
>>> +  unsigned int __cpu_is_intel : 1;
>>> +  unsigned int __cpu_is_corei7_nehalem : 1;
>>> +  unsigned int __cpu_is_corei7_westmere : 1;
>>> +  unsigned int __cpu_is_corei7_sandybridge : 1;
>>> +  unsigned int __cpu_is_amdfam10_barcelona : 1;
>>> +  unsigned int __cpu_is_amdfam10_shanghai : 1;
>>> +  unsigned int __cpu_is_amdfam10_istanbul : 1;
>>> +};
>>> +
>>> +enum processor_type __cpu_type = PROCESSOR_GENERIC;
>>> +struct __processor_features __cpu_features;
>>> +struct __processor_model __cpu_model;
>>> +
>>> +static void
>>> +get_amd_cpu (unsigned int family, unsigned int model)
>>> +{
>>> +  switch (family)
>>> +    {
>>> +    case 0x10:
>>> +      switch (model)
>>> +       {
>>> +       case 0x2:
>>> +         __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
>>> +         __cpu_model.__cpu_is_amdfam10_barcelona = 1;
>>> +         break;
>>> +       case 0x4:
>>> +         __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
>>> +         __cpu_model.__cpu_is_amdfam10_shanghai = 1;
>>> +         break;
>>> +       case 0x8:
>>> +         __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
>>> +         __cpu_model.__cpu_is_amdfam10_istanbul = 1;
>>> +         break;
>>> +       default:
>>> +         __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
>>> +         break;
>>> +       }
>>> +      break;
>>> +    default:
>>> +      __cpu_type = PROCESSOR_AMD_GENERIC;
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
>>> +{
>>> +  /* Parse family and model only if brand ID is 0. */
>>> +  if (brand_id == 0)
>>> +    {
>>> +      switch (family)
>>> +       {
>>> +       case 0x5:
>>> +         __cpu_type = PROCESSOR_PENTIUM;
>>> +         break;
>>> +       case 0x6:
>>> +         switch (model)
>>> +           {
>>> +           case 0x1a:
>>> +           case 0x1e:
>>> +           case 0x1f:
>>> +           case 0x2e:
>>> +             /* Nehalem.  */
>>> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
>>> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
>>> +             break;
>>> +           case 0x25:
>>> +           case 0x2c:
>>> +           case 0x2f:
>>> +             /* Westmere.  */
>>> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
>>> +             __cpu_model.__cpu_is_corei7_westmere = 1;
>>> +             break;
>>> +           case 0x2a:
>>> +             /* Sandy Bridge.  */
>>> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
>>> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
>>> +             break;
>>> +           case 0x17:
>>> +           case 0x1d:
>>> +             /* Penryn.  */
>>> +           case 0x0f:
>>> +             /* Merom.  */
>>> +             __cpu_type = PROCESSOR_CORE2;
>>> +             break;
>>> +           default:
>>> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
>>> +             break;
>>> +           }
>>> +         break;
>>> +       default:
>>> +         /* We have no idea.  */
>>> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
>>> +         break;
>>> +       }
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +get_available_features (unsigned int ecx, unsigned int edx)
>>> +{
>>> +  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
>>> +  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
>>> +  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
>>> +  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
>>> +  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
>>> +  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
>>> +  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
>>> +  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
>>> +  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
>>> +}
>>> +
>>> +/* A noinline function calling __get_cpuid. Having many calls to
>>> +   cpuid in one function in 32-bit mode causes GCC to complain:
>>> +   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
>>> +   related to PR rtl-optimization 44174. */
>>> +
>>> +static int __attribute__ ((noinline))
>>> +__get_cpuid_output (unsigned int __level,
>>> +                   unsigned int *__eax, unsigned int *__ebx,
>>> +                   unsigned int *__ecx, unsigned int *__edx)
>>> +{
>>> +  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
>>> +}
>>> +
>>> +/* This function will be linked in to binaries that need to look up
>>> +   CPU information.  */
>>> +
>>> +void
>>> +__cpu_indicator_init(void)
>>> +{
>>> +  unsigned int eax, ebx, ecx, edx;
>>> +
>>> +  int max_level = 5;
>>> +  unsigned int vendor;
>>> +  unsigned int model, family, brand_id;
>>> +
>>> +  memset (&__cpu_features, 0, sizeof (struct __processor_features));
>>> +  memset (&__cpu_model, 0, sizeof (struct __processor_model));
>>> +
>>> +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
>>> +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
>>> +    return;
>>> +
>>> +  vendor = ebx;
>>> +  max_level = eax;
>>> +
>>> +  if (max_level < 1)
>>> +    return;
>>> +
>>> +  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
>>> +    return;
>>> +
>>> +  model = (eax >> 4) & 0x0f;
>>> +  family = (eax >> 8) & 0x0f;
>>> +  brand_id = ebx & 0xff;
>>> +
>>> +  /* Adjust model and family for Intel CPUS. */
>>> +  if (vendor == SIG_INTEL)
>>> +    {
>>> +      unsigned int extended_model, extended_family;
>>> +
>>> +      extended_model = (eax >> 12) & 0xf0;
>>> +      extended_family = (eax >> 20) & 0xff;
>>> +      if (family == 0x0f)
>>> +       {
>>> +         family += extended_family;
>>> +         model += extended_model;
>>> +       }
>>> +      else if (family == 0x06)
>>> +       model += extended_model;
>>> +    }
>>> +
>>> +  /* Find CPU model. */
>>> +
>>> +  if (vendor == SIG_AMD)
>>> +    {
>>> +      __cpu_model.__cpu_is_amd = 1;
>>> +      get_amd_cpu (family, model);
>>> +    }
>>> +  else if (vendor == SIG_INTEL)
>>> +    {
>>> +      __cpu_model.__cpu_is_intel = 1;
>>> +      get_intel_cpu (family, model, brand_id);
>>> +    }
>>> +
>>> +  /* Find available features. */
>>> +  get_available_features (ecx, edx);
>>> +}
>>> +
>>> +#else
>>> +
>>> +void
>>> +__cpu_indicator_init(void)
>>> +{
>>> +}
>>> +
>>> +#endif /* __GNUC__ */
>>> Index: gcc/tree-pass.h
>>> ===================================================================
>>> --- gcc/tree-pass.h     (revision 177767)
>>> +++ gcc/tree-pass.h     (working copy)
>>> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>>  extern struct gimple_opt_pass pass_threadsafe_analyze;
>>>  extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
>>> +extern struct gimple_opt_pass pass_tree_fold_builtin_target;
>>>
>>>  /* IPA Passes */
>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: gcc/testsuite/gcc.dg/builtin_target.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>>> +++ gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>>> @@ -0,0 +1,49 @@
>>> +/* This test checks if the __builtin_target_* calls are recognized. */
>>> +
>>> +/* { dg-do run } */
>>> +
>>> +int
>>> +fn1 ()
>>> +{
>>> +  if (__builtin_target_supports_cmov () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_mmx () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_popcount () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_sse () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_sse2 () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_sse3 () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_ssse3 () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_sse4_1 () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_supports_sse4_2 () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_amd () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_intel () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_corei7_nehalem () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_corei7_westmere () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_corei7_sandybridge () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_amdfam10_barcelona () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_amdfam10_shanghai () < 0)
>>> +    return -1;
>>> +  if (__builtin_target_is_amdfam10_istanbul () < 0)
>>> +    return -1;
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +int main ()
>>> +{
>>> +  return fn1 ();
>>> +}
>>> Index: gcc/builtins.def
>>> ===================================================================
>>> --- gcc/builtins.def    (revision 177767)
>>> +++ gcc/builtins.def    (working copy)
>>> @@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
>>>  /* Multiversioning builtin dispatch hook. */
>>>  DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
>>>
>>> +/* Builtins to determine target type and features at run-time. */
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
>>> +
>>>  /* Exception support.  */
>>>  DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
>>>  DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
>>> Index: gcc/mversn-dispatch.c
>>> ===================================================================
>>> --- gcc/mversn-dispatch.c       (revision 177767)
>>> +++ gcc/mversn-dispatch.c       (working copy)
>>> @@ -135,6 +135,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "output.h"
>>>  #include "vecprim.h"
>>>  #include "gimple-pretty-print.h"
>>> +#include "target.h"
>>>
>>>  typedef struct cgraph_node* NODEPTR;
>>>  DEF_VEC_P (NODEPTR);
>>> @@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
>>>   TODO_update_ssa | TODO_verify_ssa
>>>  }
>>>  };
>>> +
>>> +/* Fold calls to __builtin_target_* */
>>> +
>>> +static unsigned int
>>> +do_fold_builtin_target (void)
>>> +{
>>> +  basic_block bb;
>>> +  gimple_stmt_iterator gsi;
>>> +
>>> +  /* Go through each stmt looking for __builtin_target_* calls */
>>> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
>>> +    {
>>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> +        {
>>> +         gimple stmt = gsi_stmt (gsi);
>>> +         gimple assign_stmt;
>>> +          tree call_decl;
>>> +         tree lhs_retval;
>>> +         tree folded_val;
>>> +
>>> +         tree ssa_var, tmp_var;
>>> +         gimple init_stmt;
>>> +
>>> +          if (!is_gimple_call (stmt))
>>> +            continue;
>>> +
>>> +          call_decl = gimple_call_fndecl (stmt);
>>> +
>>> +         /* Check if it is a __builtin_target_* call. */
>>> +
>>> +         if (call_decl == NULL
>>> +             || DECL_NAME (call_decl) == NULL_TREE
>>> +             || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
>>> +             || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
>>> +                         "__builtin_target") == NULL)
>>> +            continue;
>>> +
>>> +         /* If the lhs is NULL there is no need to fold the call. */
>>> +         lhs_retval = gimple_call_lhs(stmt);
>>> +         if (lhs_retval == NULL)
>>> +           continue;
>>> +
>>> +         /* Call the target hook to fold the builtin */
>>> +          folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
>>> +
>>> +         /* If the target does not support the builtin then fold it to zero. */
>>> +         if (folded_val == NULL_TREE)
>>> +           folded_val = build_zero_cst (unsigned_type_node);
>>> +
>>> +         /* Type cast unsigned value to integer */
>>> +         tmp_var = create_tmp_var (unsigned_type_node, NULL);
>>> +         init_stmt = gimple_build_assign (tmp_var, folded_val);
>>> +         ssa_var = make_ssa_name (tmp_var, init_stmt);
>>> +         gimple_assign_set_lhs (init_stmt, ssa_var);
>>> +         mark_symbols_for_renaming (init_stmt);
>>> +
>>> +         assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
>>> +         mark_symbols_for_renaming(assign_stmt);
>>> +
>>> +         gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
>>> +         gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
>>> +         /* Delete the original call. */
>>> +         gsi_remove(&gsi, true);
>>> +       }
>>> +    }
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +static bool
>>> +gate_fold_builtin_target (void)
>>> +{
>>> +  return true;
>>> +}
>>> +
>>> +/* Pass to fold __builtin_target_* functions */
>>> +
>>> +struct gimple_opt_pass pass_tree_fold_builtin_target =
>>> +{
>>> + {
>>> +  GIMPLE_PASS,
>>> +  "fold_builtin_target",               /* name */
>>> +  gate_fold_builtin_target,            /* gate */
>>> +  do_fold_builtin_target,              /* execute */
>>> +  NULL,                                        /* sub */
>>> +  NULL,                                        /* next */
>>> +  0,                                   /* static_pass_number */
>>> +  TV_FOLD_BUILTIN_TARGET,              /* tv_id */
>>> +  PROP_cfg,                            /* properties_required */
>>> +  PROP_cfg,                            /* properties_provided */
>>> +  0,                                   /* properties_destroyed */
>>> +  0,                                   /* todo_flags_start */
>>> +  TODO_dump_func |                     /* todo_flags_finish */
>>> +  TODO_cleanup_cfg |
>>> +  TODO_update_ssa |
>>> +  TODO_verify_ssa
>>> + }
>>> +};
>>> +
>>> +
>>> Index: gcc/timevar.def
>>> ===================================================================
>>> --- gcc/timevar.def     (revision 177767)
>>> +++ gcc/timevar.def     (working copy)
>>> @@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH          , "parser inl
>>>  DEFTIMEVAR (TV_TEMPLATE_INST         , "template instantiation")
>>>  DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
>>>  DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
>>> +DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET   , "fold __builtin_target calls")
>>>  DEFTIMEVAR (TV_INTEGRATION           , "integration")
>>>  DEFTIMEVAR (TV_TREE_GIMPLIFY        , "tree gimplify")
>>>  DEFTIMEVAR (TV_TREE_EH              , "tree eh")
>>> Index: gcc/passes.c
>>> ===================================================================
>>> --- gcc/passes.c        (revision 177767)
>>> +++ gcc/passes.c        (working copy)
>>> @@ -1249,6 +1249,8 @@ init_optimization_passes (void)
>>>     {
>>>       struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
>>>       NEXT_PASS (pass_tree_convert_builtin_dispatch);
>>> +      /* Fold calls to __builtin_target_*. */
>>> +      NEXT_PASS (pass_tree_fold_builtin_target);
>>>       /* Rebuilding cgraph edges is necessary as the above passes change
>>>          the call graph.  Otherwise, future optimizations use the old
>>>         call graph and make wrong decisions sometimes.*/
>>> Index: gcc/config/i386/i386.c
>>> ===================================================================
>>> --- gcc/config/i386/i386.c      (revision 177767)
>>> +++ gcc/config/i386/i386.c      (working copy)
>>> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "sched-int.h"
>>>  #include "sbitmap.h"
>>>  #include "fibheap.h"
>>> +#include "tree-flow.h"
>>> +#include "tree-pass.h"
>>>
>>>  enum upper_128bits_state
>>>  {
>>> @@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
>>>   return ret;
>>>  }
>>>
>>> +/* Returns a struct type with name NAME and number of fields equal to
>>> +   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
>>> +
>>> +static tree
>>> +build_struct_with_one_bit_fields (int num_fields, const char *name)
>>> +{
>>> +  int i;
>>> +  char field_name [10];
>>> +  tree field = NULL_TREE, field_chain = NULL_TREE;
>>> +  tree type = make_node (RECORD_TYPE);
>>> +
>>> +  strcpy (field_name, "k_field");
>>> +
>>> +  for (i = 0; i < num_fields; i++)
>>> +    {
>>> +      /* Name the fields, 0_field, 1_field, ... */
>>> +      field_name [0] = '0' + i;
>>> +      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>>> +                         get_identifier (field_name), unsigned_type_node);
>>> +      DECL_BIT_FIELD (field) = 1;
>>> +      DECL_SIZE (field) = bitsize_one_node;
>>> +      if (field_chain != NULL_TREE)
>>> +       DECL_CHAIN (field) = field_chain;
>>> +      field_chain = field;
>>> +    }
>>> +  finish_builtin_struct (type, name, field_chain, NULL_TREE);
>>> +  return type;
>>> +}
>>> +
>>> +/* Returns a VAR_DECL of type TYPE and name NAME. */
>>> +
>>> +static tree
>>> +make_var_decl (tree type, const char *name)
>>> +{
>>> +  tree new_decl;
>>> +  struct varpool_node *vnode;
>>> +
>>> +  new_decl = build_decl (UNKNOWN_LOCATION,
>>> +                        VAR_DECL,
>>> +                        get_identifier(name),
>>> +                        type);
>>> +
>>> +  DECL_EXTERNAL (new_decl) = 1;
>>> +  TREE_STATIC (new_decl) = 1;
>>> +  TREE_PUBLIC (new_decl) = 1;
>>> +  DECL_INITIAL (new_decl) = 0;
>>> +  DECL_ARTIFICIAL (new_decl) = 0;
>>> +  DECL_PRESERVE_P (new_decl) = 1;
>>> +
>>> +  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
>>> +  assemble_variable (new_decl, 0, 0, 0);
>>> +
>>> +  vnode = varpool_node (new_decl);
>>> +  gcc_assert (vnode != NULL);
>>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>>> +     lto-streamer-out.c. */
>>> +  vnode->finalized = 1;
>>> +
>>> +  return new_decl;
>>> +}
>>> +
>>> +/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
>>> +   numbered field. */
>>> +
>>> +static tree
>>> +get_field_from_struct (tree struct_type, int field_num)
>>> +{
>>> +  int i;
>>> +  tree field = TYPE_FIELDS (struct_type);
>>> +
>>> +  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
>>> +    {
>>> +      gcc_assert (field != NULL_TREE);
>>> +    }
>>> +
>>> +  return field;
>>> +}
>>> +
>>> +/* Create a new static constructor that calls __cpu_indicator_init ()
>>> +   function defined in libgcc/config/i386-cpuinfo.c which runs cpuid
>>> +   to figure out the type of the target. */
>>> +
>>> +static tree
>>> +make_constructor_to_get_target_type (const char *name)
>>> +{
>>> +  tree decl, type, t;
>>> +  gimple_seq seq;
>>> +  basic_block new_bb;
>>> +  tree old_current_function_decl;
>>> +
>>> +  tree __cpu_indicator_int_decl;
>>> +  gimple constructor_body;
>>> +
>>> +
>>> +  type = build_function_type_list (void_type_node, NULL_TREE);
>>> +
>>> +  /* Make a call stmt to __cpu_indicator_init */
>>> +  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
>>> +  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
>>> +  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
>>> +
>>> +  decl = build_fn_decl (name, type);
>>> +
>>> +  DECL_NAME (decl) = get_identifier (name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
>>> +  gcc_assert (cgraph_node (decl) != NULL);
>>> +
>>> +  TREE_USED (decl) = 1;
>>> +  DECL_ARTIFICIAL (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 0;
>>> +  TREE_PUBLIC (decl) = 0;
>>> +  DECL_UNINLINABLE (decl) = 1;
>>> +  DECL_EXTERNAL (decl) = 0;
>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 1;
>>> +  TREE_READONLY (decl) = 0;
>>> +  DECL_PURE_P (decl) = 0;
>>> +
>>> +  /* This is a comdat. */
>>> +  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Build result decl and add to function_decl. */
>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
>>> +  DECL_ARTIFICIAL (t) = 1;
>>> +  DECL_IGNORED_P (t) = 1;
>>> +  DECL_RESULT (decl) = t;
>>> +
>>> +  gimplify_function_tree (decl);
>>> +
>>> +  /* Build CFG for this function. */
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> +  current_function_decl = decl;
>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> +  cfun->curr_properties |=
>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> +     PROP_ssa);
>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> +
>>> +  /* XXX: Not sure if the edge commented below is necessary.  If I add this
>>> +     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
>>> +     " if (e->flags & EDGE_FALLTHRU)"
>>> +     during -fprofile-generate.
>>> +     Otherwise, it is fine.  Deleting this edge does not break anything.
>>> +     Commenting this so that it is clear I am intentionally not doing this.*/
>>> +  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
>>> +
>>> +  seq = gimple_seq_alloc_with_stmt (constructor_body);
>>> +
>>> +  set_bb_seq (new_bb, seq);
>>> +  gimple_set_bb (constructor_body, new_bb);
>>> +
>>> +  /* Set the lexical block of the constructor body. Fails the inliner
>>> +     other wise. */
>>> +  gimple_set_block (constructor_body, DECL_INITIAL (decl));
>>> +
>>> +  /* This call is very important if this pass runs when the IR is in
>>> +     SSA form.  It breaks things in strange ways otherwise. */
>>> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
>>> +  /* add_referenced_var (version_selector_var); */
>>> +
>>> +  cgraph_add_new_function (decl, true);
>>> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
>>> +  cgraph_mark_needed_node (cgraph_node (decl));
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +  return decl;
>>> +}
>>> +
>>> +/* FNDECL is a __builtin_target_* call that is folded into an integer defined
>>> +   in libgcc/config/i386/i386-cpuinfo.c */
>>> +
>>> +static tree
>>> +fold_builtin_target (tree fndecl)
>>> +{
>>> +  /* This is the order of bit-fields in __processor_features in
>>> +     i386-cpuinfo.c */
>>> +  enum processor_features
>>> +  {
>>> +    F_CMOV = 0,
>>> +    F_MMX,
>>> +    F_POPCNT,
>>> +    F_SSE,
>>> +    F_SSE2,
>>> +    F_SSE3,
>>> +    F_SSSE3,
>>> +    F_SSE4_1,
>>> +    F_SSE4_2,
>>> +    F_MAX
>>> +  };
>>> +
>>> +  /* This is the order of bit-fields in __processor_model in
>>> +     i386-cpuinfo.c */
>>> +  enum processor_model
>>> +  {
>>> +    M_AMD = 0,
>>> +    M_INTEL,
>>> +    M_COREI7_NEHALEM,
>>> +    M_COREI7_WESTMERE,
>>> +    M_COREI7_SANDYBRIDGE,
>>> +    M_AMDFAM10_BARCELONA,
>>> +    M_AMDFAM10_SHANGHAI,
>>> +    M_AMDFAM10_ISTANBUL,
>>> +    M_MAX
>>> +  };
>>> +
>>> +  static tree __processor_features_type = NULL_TREE;
>>> +  static tree __cpu_features_var = NULL_TREE;
>>> +  static tree __processor_model_type = NULL_TREE;
>>> +  static tree __cpu_model_var = NULL_TREE;
>>> +  static tree ctor_decl = NULL_TREE;
>>> +  static tree field;
>>> +  static tree which_struct;
>>> +
>>> +  /* Make a call to __cpu_indicatior_init in a constructor.
>>> +     Function __cpu_indicator_init is defined in i386-cpuinfo.c. */
>>> +  if (ctor_decl == NULL_TREE)
>>> +   ctor_decl = make_constructor_to_get_target_type
>>> +               ("__cpu_indicator_init_ctor");
>>> +
>>> +  if (__processor_features_type == NULL_TREE)
>>> +    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
>>> +                                 "__processor_features");
>>> +
>>> +  if (__processor_model_type == NULL_TREE)
>>> +    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
>>> +                                 "__processor_model");
>>> +
>>> +  if (__cpu_features_var == NULL_TREE)
>>> +    __cpu_features_var = make_var_decl (__processor_features_type,
>>> +                                       "__cpu_features");
>>> +
>>> +  if (__cpu_model_var == NULL_TREE)
>>> +    __cpu_model_var = make_var_decl (__processor_model_type,
>>> +                                    "__cpu_model");
>>> +
>>> +  /* Look at fndecl code to identify the field requested. */
>>> +  switch (DECL_FUNCTION_CODE (fndecl))
>>> +    {
>>> +    case BUILT_IN_TARGET_SUPPORTS_CMOV:
>>> +      field = get_field_from_struct (__processor_features_type, F_CMOV);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_MMX:
>>> +      field = get_field_from_struct (__processor_features_type, F_MMX);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
>>> +      field = get_field_from_struct (__processor_features_type, F_POPCNT);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE2:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE2);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE3:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
>>> +      which_struct = __cpu_features_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_AMD:
>>> +      field = get_field_from_struct (__processor_model_type, M_AMD);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_INTEL:
>>> +      field = get_field_from_struct (__processor_model_type, M_INTEL);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);;
>>> +      which_struct = __cpu_model_var;
>>> +      break;
>>> +    default:
>>> +      return NULL_TREE;
>>> +    }
>>> +
>>> +  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
>>> +}
>>> +
>>> +/* Folds __builtin_target_* builtins. */
>>> +
>>> +static tree
>>> +ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>>> +                   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
>>> +{
>>> +  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>>> +      && strstr(decl_name, "__builtin_target") != NULL)
>>> +    return fold_builtin_target (fndecl);
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>  /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
>>>
>>>  static void
>>> @@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>
>>> +#undef TARGET_FOLD_BUILTIN
>>> +#define TARGET_FOLD_BUILTIN ix86_fold_builtin
>>> +
>>>  #undef TARGET_ENUM_VA_LIST_P
>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/4893046
>>>
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18  9:33     ` Richard Guenther
@ 2011-08-18 14:04       ` Michael Matz
  2011-08-18 17:12         ` Xinliang David Li
  2011-08-18 21:15       ` Sriraman Tallam
  1 sibling, 1 reply; 50+ messages in thread
From: Michael Matz @ 2011-08-18 14:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Sriraman Tallam, reply, gcc-patches

Hi,

On Thu, 18 Aug 2011, Richard Guenther wrote:

> > CPUID to get target features and set global vars corresponding to the 
> > features. So, the builtin should be folded by into the appropriate 
> > variable in libgcc.
> 
> Hm, but then the variable should reside in libgcc and you'd only need an 
> extern variant in the varpool.  I'm not sure separate constructors 
> (possibly in each module ...) would be better than a single one in 
> libgcc that would get run unconditionally.

Would be my preference too.

> > determining target features and setting the appropriate globals. If
> > the new builtins are called, gcc will call __cpu_indicator_init in a
> > constructor so that it is called exactly once. Then, gcc will fold the
> > builtin to the appropriate global variable.
> 
> I see, but this sounds like premature optimization to me, no?  Considering
> you'd do this in each module and our inability to merge those constructors
> at link time.  If we put __cpu_indicator, the constructor and the assorted
> support into a separate module inside libgcc.a could we arrange it in a way
> that if __cpu_indicator is not referenced from the program that piece isn't
> linked in?  (not sure if that is possible with constructors)

If you make an .o file only exporting __cpu_indicator, then it won't be 
included in a link where no object file refers to that symbol.  If you put 
the ctor for that variable in the same .o file you win.

I also take issue with the large number of builtins, I'd have expected one 
single builtin returning the CPU type, and an enum that can be tested.  
That potentially requires an installed gcc private header, but I think 
enabling access to this cpu detection facility in libgcc to our users is 
worthwhile.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18 14:04       ` Michael Matz
@ 2011-08-18 17:12         ` Xinliang David Li
  0 siblings, 0 replies; 50+ messages in thread
From: Xinliang David Li @ 2011-08-18 17:12 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, Sriraman Tallam, reply, gcc-patches

On Thu, Aug 18, 2011 at 6:10 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 18 Aug 2011, Richard Guenther wrote:
>
>> > CPUID to get target features and set global vars corresponding to the
>> > features. So, the builtin should be folded by into the appropriate
>> > variable in libgcc.
>>
>> Hm, but then the variable should reside in libgcc and you'd only need an
>> extern variant in the varpool.  I'm not sure separate constructors
>> (possibly in each module ...) would be better than a single one in
>> libgcc that would get run unconditionally.
>
> Would be my preference too.
>
>> > determining target features and setting the appropriate globals. If
>> > the new builtins are called, gcc will call __cpu_indicator_init in a
>> > constructor so that it is called exactly once. Then, gcc will fold the
>> > builtin to the appropriate global variable.
>>
>> I see, but this sounds like premature optimization to me, no?  Considering
>> you'd do this in each module and our inability to merge those constructors
>> at link time.  If we put __cpu_indicator, the constructor and the assorted
>> support into a separate module inside libgcc.a could we arrange it in a way
>> that if __cpu_indicator is not referenced from the program that piece isn't
>> linked in?  (not sure if that is possible with constructors)
>
> If you make an .o file only exporting __cpu_indicator, then it won't be
> included in a link where no object file refers to that symbol.  If you put
> the ctor for that variable in the same .o file you win.
>
> I also take issue with the large number of builtins, I'd have expected one
> single builtin returning the CPU type, and an enum that can be tested.
> That potentially requires an installed gcc private header, but I think
> enabling access to this cpu detection facility in libgcc to our users is
> worthwhile.

The CPU type builtins can probably be combined, not the feature testing ones.

David

>
>
> Ciao,
> Michael.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18  9:33     ` Richard Guenther
  2011-08-18 14:04       ` Michael Matz
@ 2011-08-18 21:15       ` Sriraman Tallam
  2011-08-18 21:53         ` Richard Henderson
  1 sibling, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-18 21:15 UTC (permalink / raw)
  To: Richard Guenther; +Cc: reply, gcc-patches

On Thu, Aug 18, 2011 at 1:03 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 7:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Wed, Aug 17, 2011 at 12:37 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Aug 16, 2011 at 10:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Support for getting CPU type and feature information at run-time.
>>>>
>>>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>>>
>>> Please provide an overview why you need the new builtins,
>>
>> For multi-versioning,  the compiler can call the appropriate builtin
>> to dispatch the right version. The builtin call will later get folded.
>>
>> For example,
>>
>> int  __attribute__ version ("sse4_1")
>> compute ()
>> {
>>   // Do sse4_1 specific impkementation.
>> }
>>
>> int
>> compute ()
>> {
>>  // Generic implementation
>> }
>>
>> The compiler will check if the target supports the attribute and then
>> convert a call to compute ()  into  this:
>>
>> if (__builtin_target_supports_sse4_1 ())
>>  compute_sse4_1 (); // Call to the SSE4_1 implementation
>> else
>>  compute_generic (); // Call to the generic implementation
>>
>> Further, having it as builtin function allows it to be overridden by
>> the programmer. For instance, the programmer can override it to
>> identify newer CPU types not yet supported. Having these builtins
>> makes it convenient to identify platform type and features in general.
>>
>> why you need
>>> a separate pass to fold them (instead of just expanding them) and why
>>
>> I can move it into builtins.c along with where other builtins are
>> folded and remove the separate pass. My intention originally was to
>> fold them as early as possible, in this case after multi-versioning
>> but I guess this is not a requirement.
>
> Yes, they should be folded by targetm.fold_builtin instead.  The Frontend
> should simply fold the tests at the time it creates them, that's as early
> as possible (gimplification will also re-fold all builtin function calls).
>
>>> you are creating
>>> vars behind the back of GCC:
>>
>> The flow I had in mind was to have functions in libgcc which will use
>> CPUID to get target features and set global vars corresponding to the
>> features. So, the builtin should be folded by into the appropriate
>> variable in libgcc.
>
> Hm, but then the variable should reside in libgcc and you'd only need
> an extern variant in the varpool.  I'm not sure separate constructors
> (possibly in each module ...) would be better than a single one in
> libgcc that would get run unconditionally.
>
>>>
>>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>>> +     lto-streamer-out.c. */
>>> +  vnode->finalized = 1;
>>>
>>> where I think you miss a varpool_finalize_node call somewhere.  Why
>>> isn't this all done at target init time
>>
>> I wanted to do this on demand. If none of the new builtins are called
>> in the program, I do not need to to do this at all. In summary, libgcc
>> has a function called __cpu_indicator_init which does the work of
>> determining target features and setting the appropriate globals. If
>> the new builtins are called, gcc will call __cpu_indicator_init in a
>> constructor so that it is called exactly once. Then, gcc will fold the
>> builtin to the appropriate global variable.
>
> I see, but this sounds like premature optimization to me, no?  Considering
> you'd do this in each module and our inability to merge those constructors
> at link time.  If we put __cpu_indicator, the constructor and the assorted
> support into a separate module inside libgcc.a could we arrange it in a way
> that if __cpu_indicator is not referenced from the program that piece isn't
> linked in?  (not sure if that is possible with constructors)

Ok, so two things. I create the constructor as a comdat. So, it is
created by gcc in every module but at link time only one copy will be
kept. So, it is going to be called only once and that is not a
problem. The other thing is that I can eliminate all of this code gen
in gcc for and mark this as a constructor in libgcc which means it
will always be linked in and always be called once at run-time. There
is no easy way right now to garbage collect unreferenced ctors at
run-time. I do not have a strong opinion on this and I can do the
latter.

>
> Richard.
>
>>
>> ?  If you don't mark the
>>> variable as to be preserved
>>> like you do cgraph will optimize it all away if it isn't needed.
>>
>>>
>>> Richard.
>>>
>>>>        * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>>>        * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>>>        (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>>>        (BUILT_IN_TARGET_IS_AMD): New builtin.
>>>>        (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>>>        (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>>>        (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>>>        (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>>>>        (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>>>        (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>>>        (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>>>>        * mversn-dispatch.c (do_fold_builtin_target): New function.
>>>>        (gate_fold_builtin_target): New function.
>>>>        (pass_tree_fold_builtin_target): New pass.
>>>>        * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>>>>        * passes.c (init_optimization_passes): Add new pass to pass list.
>>>>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>>>        (make_var_decl): New function.
>>>>        (get_field_from_struct): New function.
>>>>        (make_constructor_to_get_target_type): New function.
>>>>        (fold_builtin_target): New function.
>>>>        (ix86_fold_builtin): New function.
>>>>        (TARGET_FOLD_BUILTIN): New macro.
>>>>
>>>>        * gcc.dg/builtin_target.c: New test.
>>>>
>>>>        * config/i386/i386-cpuinfo.c: New file.
>>>>        * config/i386/t-cpuinfo: New file.
>>>>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>>>
>>>> Index: libgcc/config.host
>>>> ===================================================================
>>>> --- libgcc/config.host  (revision 177767)
>>>> +++ libgcc/config.host  (working copy)
>>>> @@ -609,7 +609,7 @@ case ${host} in
>>>>  i[34567]86-*-linux* | x86_64-*-linux* | \
>>>>   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
>>>>   i[34567]86-*-gnu*)
>>>> -       tmake_file="${tmake_file} t-tls"
>>>> +       tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
>>>>        if test "$libgcc_cv_cfi" = "yes"; then
>>>>                tmake_file="${tmake_file} t-stack i386/t-stack-i386"
>>>>        fi
>>>> Index: libgcc/config/i386/t-cpuinfo
>>>> ===================================================================
>>>> --- libgcc/config/i386/t-cpuinfo        (revision 0)
>>>> +++ libgcc/config/i386/t-cpuinfo        (revision 0)
>>>> @@ -0,0 +1,2 @@
>>>> +# This is an endfile
>>>> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
>>>> Index: libgcc/config/i386/i386-cpuinfo.c
>>>> ===================================================================
>>>> --- libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>>>> +++ libgcc/config/i386/i386-cpuinfo.c   (revision 0)
>>>> @@ -0,0 +1,275 @@
>>>> +/* Copyright (C) 2011 Free Software Foundation, Inc.
>>>> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
>>>> + *
>>>> + * This file is free software; you can redistribute it and/or modify it
>>>> + * under the terms of the GNU General Public License as published by the
>>>> + * Free Software Foundation; either version 3, or (at your option) any
>>>> + * later version.
>>>> + *
>>>> + * This file is distributed in the hope that it will be useful, but
>>>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> + * General Public License for more details.
>>>> + *
>>>> + * Under Section 7 of GPL version 3, you are granted additional
>>>> + * permissions described in the GCC Runtime Library Exception, version
>>>> + * 3.1, as published by the Free Software Foundation.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License and
>>>> + * a copy of the GCC Runtime Library Exception along with this program;
>>>> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> + * <http://www.gnu.org/licenses/>.
>>>> + *
>>>> + *
>>>> + * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
>>>> + * instruction is used to figure out the cpu type and supported features.
>>>> + * GCC runs __cpu_indicator_init from a constructor which sets the members
>>>> + * of __cpu_model and __cpu_features.
>>>> + */
>>>> +
>>>> +#include <string.h>
>>>> +
>>>> +#ifdef __GNUC__
>>>> +#include "cpuid.h"
>>>> +
>>>> +enum processor_type
>>>> +{
>>>> +  PROCESSOR_PENTIUM = 0,
>>>> +  PROCESSOR_CORE2,
>>>> +  PROCESSOR_COREI7_NEHALEM,
>>>> +  PROCESSOR_COREI7_WESTMERE,
>>>> +  PROCESSOR_COREI7_SANDYBRIDGE,
>>>> +  PROCESSOR_INTEL_GENERIC,
>>>> +  PROCESSOR_AMDFAM10_BARCELONA,
>>>> +  PROCESSOR_AMDFAM10_SHANGHAI,
>>>> +  PROCESSOR_AMDFAM10_ISTANBUL,
>>>> +  PROCESSOR_AMDFAM10_GENERIC,
>>>> +  PROCESSOR_AMD_GENERIC,
>>>> +  PROCESSOR_GENERIC,
>>>> +  PROCESSOR_max
>>>> +};
>>>> +
>>>> +enum vendor_signatures
>>>> +{
>>>> +  SIG_INTEL =  0x756e6547 /* Genu */,
>>>> +  SIG_AMD =    0x68747541 /* Auth */
>>>> +};
>>>> +
>>>> +
>>>> +/* Features supported. */
>>>> +
>>>> +struct __processor_features
>>>> +{
>>>> +  unsigned int __cpu_cmov : 1;
>>>> +  unsigned int __cpu_mmx : 1;
>>>> +  unsigned int __cpu_popcnt : 1;
>>>> +  unsigned int __cpu_sse : 1;
>>>> +  unsigned int __cpu_sse2 : 1;
>>>> +  unsigned int __cpu_sse3 : 1;
>>>> +  unsigned int __cpu_ssse3 : 1;
>>>> +  unsigned int __cpu_sse4_1 : 1;
>>>> +  unsigned int __cpu_sse4_2 : 1;
>>>> +};
>>>> +
>>>> +/* Flags exported. */
>>>> +
>>>> +struct __processor_model
>>>> +{
>>>> +  unsigned int __cpu_is_amd : 1;
>>>> +  unsigned int __cpu_is_intel : 1;
>>>> +  unsigned int __cpu_is_corei7_nehalem : 1;
>>>> +  unsigned int __cpu_is_corei7_westmere : 1;
>>>> +  unsigned int __cpu_is_corei7_sandybridge : 1;
>>>> +  unsigned int __cpu_is_amdfam10_barcelona : 1;
>>>> +  unsigned int __cpu_is_amdfam10_shanghai : 1;
>>>> +  unsigned int __cpu_is_amdfam10_istanbul : 1;
>>>> +};
>>>> +
>>>> +enum processor_type __cpu_type = PROCESSOR_GENERIC;
>>>> +struct __processor_features __cpu_features;
>>>> +struct __processor_model __cpu_model;
>>>> +
>>>> +static void
>>>> +get_amd_cpu (unsigned int family, unsigned int model)
>>>> +{
>>>> +  switch (family)
>>>> +    {
>>>> +    case 0x10:
>>>> +      switch (model)
>>>> +       {
>>>> +       case 0x2:
>>>> +         __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
>>>> +         __cpu_model.__cpu_is_amdfam10_barcelona = 1;
>>>> +         break;
>>>> +       case 0x4:
>>>> +         __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
>>>> +         __cpu_model.__cpu_is_amdfam10_shanghai = 1;
>>>> +         break;
>>>> +       case 0x8:
>>>> +         __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
>>>> +         __cpu_model.__cpu_is_amdfam10_istanbul = 1;
>>>> +         break;
>>>> +       default:
>>>> +         __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
>>>> +         break;
>>>> +       }
>>>> +      break;
>>>> +    default:
>>>> +      __cpu_type = PROCESSOR_AMD_GENERIC;
>>>> +    }
>>>> +}
>>>> +
>>>> +static void
>>>> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
>>>> +{
>>>> +  /* Parse family and model only if brand ID is 0. */
>>>> +  if (brand_id == 0)
>>>> +    {
>>>> +      switch (family)
>>>> +       {
>>>> +       case 0x5:
>>>> +         __cpu_type = PROCESSOR_PENTIUM;
>>>> +         break;
>>>> +       case 0x6:
>>>> +         switch (model)
>>>> +           {
>>>> +           case 0x1a:
>>>> +           case 0x1e:
>>>> +           case 0x1f:
>>>> +           case 0x2e:
>>>> +             /* Nehalem.  */
>>>> +             __cpu_type = PROCESSOR_COREI7_NEHALEM;
>>>> +             __cpu_model.__cpu_is_corei7_nehalem = 1;
>>>> +             break;
>>>> +           case 0x25:
>>>> +           case 0x2c:
>>>> +           case 0x2f:
>>>> +             /* Westmere.  */
>>>> +             __cpu_type = PROCESSOR_COREI7_WESTMERE;
>>>> +             __cpu_model.__cpu_is_corei7_westmere = 1;
>>>> +             break;
>>>> +           case 0x2a:
>>>> +             /* Sandy Bridge.  */
>>>> +             __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
>>>> +             __cpu_model.__cpu_is_corei7_sandybridge = 1;
>>>> +             break;
>>>> +           case 0x17:
>>>> +           case 0x1d:
>>>> +             /* Penryn.  */
>>>> +           case 0x0f:
>>>> +             /* Merom.  */
>>>> +             __cpu_type = PROCESSOR_CORE2;
>>>> +             break;
>>>> +           default:
>>>> +             __cpu_type = PROCESSOR_INTEL_GENERIC;
>>>> +             break;
>>>> +           }
>>>> +         break;
>>>> +       default:
>>>> +         /* We have no idea.  */
>>>> +         __cpu_type = PROCESSOR_INTEL_GENERIC;
>>>> +         break;
>>>> +       }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void
>>>> +get_available_features (unsigned int ecx, unsigned int edx)
>>>> +{
>>>> +  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
>>>> +  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
>>>> +  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
>>>> +  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
>>>> +  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
>>>> +  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
>>>> +  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
>>>> +  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
>>>> +  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
>>>> +}
>>>> +
>>>> +/* A noinline function calling __get_cpuid. Having many calls to
>>>> +   cpuid in one function in 32-bit mode causes GCC to complain:
>>>> +   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
>>>> +   related to PR rtl-optimization 44174. */
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +__get_cpuid_output (unsigned int __level,
>>>> +                   unsigned int *__eax, unsigned int *__ebx,
>>>> +                   unsigned int *__ecx, unsigned int *__edx)
>>>> +{
>>>> +  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
>>>> +}
>>>> +
>>>> +/* This function will be linked in to binaries that need to look up
>>>> +   CPU information.  */
>>>> +
>>>> +void
>>>> +__cpu_indicator_init(void)
>>>> +{
>>>> +  unsigned int eax, ebx, ecx, edx;
>>>> +
>>>> +  int max_level = 5;
>>>> +  unsigned int vendor;
>>>> +  unsigned int model, family, brand_id;
>>>> +
>>>> +  memset (&__cpu_features, 0, sizeof (struct __processor_features));
>>>> +  memset (&__cpu_model, 0, sizeof (struct __processor_model));
>>>> +
>>>> +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
>>>> +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
>>>> +    return;
>>>> +
>>>> +  vendor = ebx;
>>>> +  max_level = eax;
>>>> +
>>>> +  if (max_level < 1)
>>>> +    return;
>>>> +
>>>> +  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
>>>> +    return;
>>>> +
>>>> +  model = (eax >> 4) & 0x0f;
>>>> +  family = (eax >> 8) & 0x0f;
>>>> +  brand_id = ebx & 0xff;
>>>> +
>>>> +  /* Adjust model and family for Intel CPUS. */
>>>> +  if (vendor == SIG_INTEL)
>>>> +    {
>>>> +      unsigned int extended_model, extended_family;
>>>> +
>>>> +      extended_model = (eax >> 12) & 0xf0;
>>>> +      extended_family = (eax >> 20) & 0xff;
>>>> +      if (family == 0x0f)
>>>> +       {
>>>> +         family += extended_family;
>>>> +         model += extended_model;
>>>> +       }
>>>> +      else if (family == 0x06)
>>>> +       model += extended_model;
>>>> +    }
>>>> +
>>>> +  /* Find CPU model. */
>>>> +
>>>> +  if (vendor == SIG_AMD)
>>>> +    {
>>>> +      __cpu_model.__cpu_is_amd = 1;
>>>> +      get_amd_cpu (family, model);
>>>> +    }
>>>> +  else if (vendor == SIG_INTEL)
>>>> +    {
>>>> +      __cpu_model.__cpu_is_intel = 1;
>>>> +      get_intel_cpu (family, model, brand_id);
>>>> +    }
>>>> +
>>>> +  /* Find available features. */
>>>> +  get_available_features (ecx, edx);
>>>> +}
>>>> +
>>>> +#else
>>>> +
>>>> +void
>>>> +__cpu_indicator_init(void)
>>>> +{
>>>> +}
>>>> +
>>>> +#endif /* __GNUC__ */
>>>> Index: gcc/tree-pass.h
>>>> ===================================================================
>>>> --- gcc/tree-pass.h     (revision 177767)
>>>> +++ gcc/tree-pass.h     (working copy)
>>>> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
>>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>>>  extern struct gimple_opt_pass pass_threadsafe_analyze;
>>>>  extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
>>>> +extern struct gimple_opt_pass pass_tree_fold_builtin_target;
>>>>
>>>>  /* IPA Passes */
>>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>>> Index: gcc/testsuite/gcc.dg/builtin_target.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>>>> +++ gcc/testsuite/gcc.dg/builtin_target.c       (revision 0)
>>>> @@ -0,0 +1,49 @@
>>>> +/* This test checks if the __builtin_target_* calls are recognized. */
>>>> +
>>>> +/* { dg-do run } */
>>>> +
>>>> +int
>>>> +fn1 ()
>>>> +{
>>>> +  if (__builtin_target_supports_cmov () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_mmx () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_popcount () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_sse () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_sse2 () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_sse3 () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_ssse3 () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_sse4_1 () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_supports_sse4_2 () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_amd () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_intel () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_corei7_nehalem () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_corei7_westmere () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_corei7_sandybridge () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_amdfam10_barcelona () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_amdfam10_shanghai () < 0)
>>>> +    return -1;
>>>> +  if (__builtin_target_is_amdfam10_istanbul () < 0)
>>>> +    return -1;
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int main ()
>>>> +{
>>>> +  return fn1 ();
>>>> +}
>>>> Index: gcc/builtins.def
>>>> ===================================================================
>>>> --- gcc/builtins.def    (revision 177767)
>>>> +++ gcc/builtins.def    (working copy)
>>>> @@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
>>>>  /* Multiversioning builtin dispatch hook. */
>>>>  DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
>>>>
>>>> +/* Builtins to determine target type and features at run-time. */
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
>>>> +
>>>>  /* Exception support.  */
>>>>  DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
>>>>  DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
>>>> Index: gcc/mversn-dispatch.c
>>>> ===================================================================
>>>> --- gcc/mversn-dispatch.c       (revision 177767)
>>>> +++ gcc/mversn-dispatch.c       (working copy)
>>>> @@ -135,6 +135,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "output.h"
>>>>  #include "vecprim.h"
>>>>  #include "gimple-pretty-print.h"
>>>> +#include "target.h"
>>>>
>>>>  typedef struct cgraph_node* NODEPTR;
>>>>  DEF_VEC_P (NODEPTR);
>>>> @@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
>>>>   TODO_update_ssa | TODO_verify_ssa
>>>>  }
>>>>  };
>>>> +
>>>> +/* Fold calls to __builtin_target_* */
>>>> +
>>>> +static unsigned int
>>>> +do_fold_builtin_target (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  gimple_stmt_iterator gsi;
>>>> +
>>>> +  /* Go through each stmt looking for __builtin_target_* calls */
>>>> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
>>>> +    {
>>>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +        {
>>>> +         gimple stmt = gsi_stmt (gsi);
>>>> +         gimple assign_stmt;
>>>> +          tree call_decl;
>>>> +         tree lhs_retval;
>>>> +         tree folded_val;
>>>> +
>>>> +         tree ssa_var, tmp_var;
>>>> +         gimple init_stmt;
>>>> +
>>>> +          if (!is_gimple_call (stmt))
>>>> +            continue;
>>>> +
>>>> +          call_decl = gimple_call_fndecl (stmt);
>>>> +
>>>> +         /* Check if it is a __builtin_target_* call. */
>>>> +
>>>> +         if (call_decl == NULL
>>>> +             || DECL_NAME (call_decl) == NULL_TREE
>>>> +             || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
>>>> +             || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
>>>> +                         "__builtin_target") == NULL)
>>>> +            continue;
>>>> +
>>>> +         /* If the lhs is NULL there is no need to fold the call. */
>>>> +         lhs_retval = gimple_call_lhs(stmt);
>>>> +         if (lhs_retval == NULL)
>>>> +           continue;
>>>> +
>>>> +         /* Call the target hook to fold the builtin */
>>>> +          folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
>>>> +
>>>> +         /* If the target does not support the builtin then fold it to zero. */
>>>> +         if (folded_val == NULL_TREE)
>>>> +           folded_val = build_zero_cst (unsigned_type_node);
>>>> +
>>>> +         /* Type cast unsigned value to integer */
>>>> +         tmp_var = create_tmp_var (unsigned_type_node, NULL);
>>>> +         init_stmt = gimple_build_assign (tmp_var, folded_val);
>>>> +         ssa_var = make_ssa_name (tmp_var, init_stmt);
>>>> +         gimple_assign_set_lhs (init_stmt, ssa_var);
>>>> +         mark_symbols_for_renaming (init_stmt);
>>>> +
>>>> +         assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
>>>> +         mark_symbols_for_renaming(assign_stmt);
>>>> +
>>>> +         gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
>>>> +         gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
>>>> +         /* Delete the original call. */
>>>> +         gsi_remove(&gsi, true);
>>>> +       }
>>>> +    }
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +static bool
>>>> +gate_fold_builtin_target (void)
>>>> +{
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Pass to fold __builtin_target_* functions */
>>>> +
>>>> +struct gimple_opt_pass pass_tree_fold_builtin_target =
>>>> +{
>>>> + {
>>>> +  GIMPLE_PASS,
>>>> +  "fold_builtin_target",               /* name */
>>>> +  gate_fold_builtin_target,            /* gate */
>>>> +  do_fold_builtin_target,              /* execute */
>>>> +  NULL,                                        /* sub */
>>>> +  NULL,                                        /* next */
>>>> +  0,                                   /* static_pass_number */
>>>> +  TV_FOLD_BUILTIN_TARGET,              /* tv_id */
>>>> +  PROP_cfg,                            /* properties_required */
>>>> +  PROP_cfg,                            /* properties_provided */
>>>> +  0,                                   /* properties_destroyed */
>>>> +  0,                                   /* todo_flags_start */
>>>> +  TODO_dump_func |                     /* todo_flags_finish */
>>>> +  TODO_cleanup_cfg |
>>>> +  TODO_update_ssa |
>>>> +  TODO_verify_ssa
>>>> + }
>>>> +};
>>>> +
>>>> +
>>>> Index: gcc/timevar.def
>>>> ===================================================================
>>>> --- gcc/timevar.def     (revision 177767)
>>>> +++ gcc/timevar.def     (working copy)
>>>> @@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH          , "parser inl
>>>>  DEFTIMEVAR (TV_TEMPLATE_INST         , "template instantiation")
>>>>  DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
>>>>  DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
>>>> +DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET   , "fold __builtin_target calls")
>>>>  DEFTIMEVAR (TV_INTEGRATION           , "integration")
>>>>  DEFTIMEVAR (TV_TREE_GIMPLIFY        , "tree gimplify")
>>>>  DEFTIMEVAR (TV_TREE_EH              , "tree eh")
>>>> Index: gcc/passes.c
>>>> ===================================================================
>>>> --- gcc/passes.c        (revision 177767)
>>>> +++ gcc/passes.c        (working copy)
>>>> @@ -1249,6 +1249,8 @@ init_optimization_passes (void)
>>>>     {
>>>>       struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
>>>>       NEXT_PASS (pass_tree_convert_builtin_dispatch);
>>>> +      /* Fold calls to __builtin_target_*. */
>>>> +      NEXT_PASS (pass_tree_fold_builtin_target);
>>>>       /* Rebuilding cgraph edges is necessary as the above passes change
>>>>          the call graph.  Otherwise, future optimizations use the old
>>>>         call graph and make wrong decisions sometimes.*/
>>>> Index: gcc/config/i386/i386.c
>>>> ===================================================================
>>>> --- gcc/config/i386/i386.c      (revision 177767)
>>>> +++ gcc/config/i386/i386.c      (working copy)
>>>> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "sched-int.h"
>>>>  #include "sbitmap.h"
>>>>  #include "fibheap.h"
>>>> +#include "tree-flow.h"
>>>> +#include "tree-pass.h"
>>>>
>>>>  enum upper_128bits_state
>>>>  {
>>>> @@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
>>>>   return ret;
>>>>  }
>>>>
>>>> +/* Returns a struct type with name NAME and number of fields equal to
>>>> +   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
>>>> +
>>>> +static tree
>>>> +build_struct_with_one_bit_fields (int num_fields, const char *name)
>>>> +{
>>>> +  int i;
>>>> +  char field_name [10];
>>>> +  tree field = NULL_TREE, field_chain = NULL_TREE;
>>>> +  tree type = make_node (RECORD_TYPE);
>>>> +
>>>> +  strcpy (field_name, "k_field");
>>>> +
>>>> +  for (i = 0; i < num_fields; i++)
>>>> +    {
>>>> +      /* Name the fields, 0_field, 1_field, ... */
>>>> +      field_name [0] = '0' + i;
>>>> +      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>>>> +                         get_identifier (field_name), unsigned_type_node);
>>>> +      DECL_BIT_FIELD (field) = 1;
>>>> +      DECL_SIZE (field) = bitsize_one_node;
>>>> +      if (field_chain != NULL_TREE)
>>>> +       DECL_CHAIN (field) = field_chain;
>>>> +      field_chain = field;
>>>> +    }
>>>> +  finish_builtin_struct (type, name, field_chain, NULL_TREE);
>>>> +  return type;
>>>> +}
>>>> +
>>>> +/* Returns a VAR_DECL of type TYPE and name NAME. */
>>>> +
>>>> +static tree
>>>> +make_var_decl (tree type, const char *name)
>>>> +{
>>>> +  tree new_decl;
>>>> +  struct varpool_node *vnode;
>>>> +
>>>> +  new_decl = build_decl (UNKNOWN_LOCATION,
>>>> +                        VAR_DECL,
>>>> +                        get_identifier(name),
>>>> +                        type);
>>>> +
>>>> +  DECL_EXTERNAL (new_decl) = 1;
>>>> +  TREE_STATIC (new_decl) = 1;
>>>> +  TREE_PUBLIC (new_decl) = 1;
>>>> +  DECL_INITIAL (new_decl) = 0;
>>>> +  DECL_ARTIFICIAL (new_decl) = 0;
>>>> +  DECL_PRESERVE_P (new_decl) = 1;
>>>> +
>>>> +  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
>>>> +  assemble_variable (new_decl, 0, 0, 0);
>>>> +
>>>> +  vnode = varpool_node (new_decl);
>>>> +  gcc_assert (vnode != NULL);
>>>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>>>> +     lto-streamer-out.c. */
>>>> +  vnode->finalized = 1;
>>>> +
>>>> +  return new_decl;
>>>> +}
>>>> +
>>>> +/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
>>>> +   numbered field. */
>>>> +
>>>> +static tree
>>>> +get_field_from_struct (tree struct_type, int field_num)
>>>> +{
>>>> +  int i;
>>>> +  tree field = TYPE_FIELDS (struct_type);
>>>> +
>>>> +  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
>>>> +    {
>>>> +      gcc_assert (field != NULL_TREE);
>>>> +    }
>>>> +
>>>> +  return field;
>>>> +}
>>>> +
>>>> +/* Create a new static constructor that calls __cpu_indicator_init ()
>>>> +   function defined in libgcc/config/i386-cpuinfo.c which runs cpuid
>>>> +   to figure out the type of the target. */
>>>> +
>>>> +static tree
>>>> +make_constructor_to_get_target_type (const char *name)
>>>> +{
>>>> +  tree decl, type, t;
>>>> +  gimple_seq seq;
>>>> +  basic_block new_bb;
>>>> +  tree old_current_function_decl;
>>>> +
>>>> +  tree __cpu_indicator_int_decl;
>>>> +  gimple constructor_body;
>>>> +
>>>> +
>>>> +  type = build_function_type_list (void_type_node, NULL_TREE);
>>>> +
>>>> +  /* Make a call stmt to __cpu_indicator_init */
>>>> +  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
>>>> +  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
>>>> +  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
>>>> +
>>>> +  decl = build_fn_decl (name, type);
>>>> +
>>>> +  DECL_NAME (decl) = get_identifier (name);
>>>> +  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
>>>> +  gcc_assert (cgraph_node (decl) != NULL);
>>>> +
>>>> +  TREE_USED (decl) = 1;
>>>> +  DECL_ARTIFICIAL (decl) = 1;
>>>> +  DECL_IGNORED_P (decl) = 0;
>>>> +  TREE_PUBLIC (decl) = 0;
>>>> +  DECL_UNINLINABLE (decl) = 1;
>>>> +  DECL_EXTERNAL (decl) = 0;
>>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 1;
>>>> +  TREE_READONLY (decl) = 0;
>>>> +  DECL_PURE_P (decl) = 0;
>>>> +
>>>> +  /* This is a comdat. */
>>>> +  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
>>>> +
>>>> +  /* Build result decl and add to function_decl. */
>>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
>>>> +  DECL_ARTIFICIAL (t) = 1;
>>>> +  DECL_IGNORED_P (t) = 1;
>>>> +  DECL_RESULT (decl) = t;
>>>> +
>>>> +  gimplify_function_tree (decl);
>>>> +
>>>> +  /* Build CFG for this function. */
>>>> +
>>>> +  old_current_function_decl = current_function_decl;
>>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>>> +  current_function_decl = decl;
>>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>>> +  cfun->curr_properties |=
>>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>>> +     PROP_ssa);
>>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>>> +
>>>> +  /* XXX: Not sure if the edge commented below is necessary.  If I add this
>>>> +     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
>>>> +     " if (e->flags & EDGE_FALLTHRU)"
>>>> +     during -fprofile-generate.
>>>> +     Otherwise, it is fine.  Deleting this edge does not break anything.
>>>> +     Commenting this so that it is clear I am intentionally not doing this.*/
>>>> +  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
>>>> +
>>>> +  seq = gimple_seq_alloc_with_stmt (constructor_body);
>>>> +
>>>> +  set_bb_seq (new_bb, seq);
>>>> +  gimple_set_bb (constructor_body, new_bb);
>>>> +
>>>> +  /* Set the lexical block of the constructor body. Fails the inliner
>>>> +     other wise. */
>>>> +  gimple_set_block (constructor_body, DECL_INITIAL (decl));
>>>> +
>>>> +  /* This call is very important if this pass runs when the IR is in
>>>> +     SSA form.  It breaks things in strange ways otherwise. */
>>>> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
>>>> +  /* add_referenced_var (version_selector_var); */
>>>> +
>>>> +  cgraph_add_new_function (decl, true);
>>>> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
>>>> +  cgraph_mark_needed_node (cgraph_node (decl));
>>>> +
>>>> +  pop_cfun ();
>>>> +  current_function_decl = old_current_function_decl;
>>>> +  return decl;
>>>> +}
>>>> +
>>>> +/* FNDECL is a __builtin_target_* call that is folded into an integer defined
>>>> +   in libgcc/config/i386/i386-cpuinfo.c */
>>>> +
>>>> +static tree
>>>> +fold_builtin_target (tree fndecl)
>>>> +{
>>>> +  /* This is the order of bit-fields in __processor_features in
>>>> +     i386-cpuinfo.c */
>>>> +  enum processor_features
>>>> +  {
>>>> +    F_CMOV = 0,
>>>> +    F_MMX,
>>>> +    F_POPCNT,
>>>> +    F_SSE,
>>>> +    F_SSE2,
>>>> +    F_SSE3,
>>>> +    F_SSSE3,
>>>> +    F_SSE4_1,
>>>> +    F_SSE4_2,
>>>> +    F_MAX
>>>> +  };
>>>> +
>>>> +  /* This is the order of bit-fields in __processor_model in
>>>> +     i386-cpuinfo.c */
>>>> +  enum processor_model
>>>> +  {
>>>> +    M_AMD = 0,
>>>> +    M_INTEL,
>>>> +    M_COREI7_NEHALEM,
>>>> +    M_COREI7_WESTMERE,
>>>> +    M_COREI7_SANDYBRIDGE,
>>>> +    M_AMDFAM10_BARCELONA,
>>>> +    M_AMDFAM10_SHANGHAI,
>>>> +    M_AMDFAM10_ISTANBUL,
>>>> +    M_MAX
>>>> +  };
>>>> +
>>>> +  static tree __processor_features_type = NULL_TREE;
>>>> +  static tree __cpu_features_var = NULL_TREE;
>>>> +  static tree __processor_model_type = NULL_TREE;
>>>> +  static tree __cpu_model_var = NULL_TREE;
>>>> +  static tree ctor_decl = NULL_TREE;
>>>> +  static tree field;
>>>> +  static tree which_struct;
>>>> +
>>>> +  /* Make a call to __cpu_indicatior_init in a constructor.
>>>> +     Function __cpu_indicator_init is defined in i386-cpuinfo.c. */
>>>> +  if (ctor_decl == NULL_TREE)
>>>> +   ctor_decl = make_constructor_to_get_target_type
>>>> +               ("__cpu_indicator_init_ctor");
>>>> +
>>>> +  if (__processor_features_type == NULL_TREE)
>>>> +    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
>>>> +                                 "__processor_features");
>>>> +
>>>> +  if (__processor_model_type == NULL_TREE)
>>>> +    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
>>>> +                                 "__processor_model");
>>>> +
>>>> +  if (__cpu_features_var == NULL_TREE)
>>>> +    __cpu_features_var = make_var_decl (__processor_features_type,
>>>> +                                       "__cpu_features");
>>>> +
>>>> +  if (__cpu_model_var == NULL_TREE)
>>>> +    __cpu_model_var = make_var_decl (__processor_model_type,
>>>> +                                    "__cpu_model");
>>>> +
>>>> +  /* Look at fndecl code to identify the field requested. */
>>>> +  switch (DECL_FUNCTION_CODE (fndecl))
>>>> +    {
>>>> +    case BUILT_IN_TARGET_SUPPORTS_CMOV:
>>>> +      field = get_field_from_struct (__processor_features_type, F_CMOV);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_MMX:
>>>> +      field = get_field_from_struct (__processor_features_type, F_MMX);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
>>>> +      field = get_field_from_struct (__processor_features_type, F_POPCNT);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE2:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE2);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE3:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMD:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMD);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_INTEL:
>>>> +      field = get_field_from_struct (__processor_model_type, M_INTEL);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);;
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    default:
>>>> +      return NULL_TREE;
>>>> +    }
>>>> +
>>>> +  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
>>>> +}
>>>> +
>>>> +/* Folds __builtin_target_* builtins. */
>>>> +
>>>> +static tree
>>>> +ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>>>> +                   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
>>>> +{
>>>> +  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>>> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>>>> +      && strstr(decl_name, "__builtin_target") != NULL)
>>>> +    return fold_builtin_target (fndecl);
>>>> +
>>>> +  return NULL_TREE;
>>>> +}
>>>> +
>>>>  /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
>>>>
>>>>  static void
>>>> @@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
>>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>>
>>>> +#undef TARGET_FOLD_BUILTIN
>>>> +#define TARGET_FOLD_BUILTIN ix86_fold_builtin
>>>> +
>>>>  #undef TARGET_ENUM_VA_LIST_P
>>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>>
>>>>
>>>> --
>>>> This patch is available for review at http://codereview.appspot.com/4893046
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18 21:15       ` Sriraman Tallam
@ 2011-08-18 21:53         ` Richard Henderson
  2011-08-18 22:49           ` Sriraman Tallam
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2011-08-18 21:53 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: Richard Guenther, reply, gcc-patches

On 08/18/2011 10:25 AM, Sriraman Tallam wrote:
> Ok, so two things. I create the constructor as a comdat. So, it is
> created by gcc in every module but at link time only one copy will be
> kept. So, it is going to be called only once and that is not a
> problem.

Err, no.  You'll wind up with one copy of the constructor
which will be called N times.

The comdat applies to the function body, not the data in
the .ctors section.


r~

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18 21:53         ` Richard Henderson
@ 2011-08-18 22:49           ` Sriraman Tallam
  2011-08-19  0:30             ` Richard Henderson
  0 siblings, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-18 22:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Richard Guenther, reply, gcc-patches

On Thu, Aug 18, 2011 at 2:15 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/18/2011 10:25 AM, Sriraman Tallam wrote:
>> Ok, so two things. I create the constructor as a comdat. So, it is
>> created by gcc in every module but at link time only one copy will be
>> kept. So, it is going to be called only once and that is not a
>> problem.
>
> Err, no.  You'll wind up with one copy of the constructor
> which will be called N times.
>
> The comdat applies to the function body, not the data in
> the .ctors section.

Oh!, right, sorry. So, the only available option now is to mark it as
a constructor in libgcc.

Thanks.
-Sri.

>
>
> r~
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-18 22:49           ` Sriraman Tallam
@ 2011-08-19  0:30             ` Richard Henderson
  2011-08-19 11:55               ` Richard Guenther
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2011-08-19  0:30 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: Richard Guenther, reply, gcc-patches

On 08/18/2011 02:51 PM, Sriraman Tallam wrote:
> Oh!, right, sorry. So, the only available option now is to mark it as
> a constructor in libgcc.

Or call it explicitly from the out-of-line tests.

The thing is, if you intend to use this from ifunc tests, I believe
that these can run *extremely* early.  E.g. LD_BIND_NOW=1 will run
these while relocating the entire application, and therefore before
any of DT_INIT (aka .ctors), DT_INIT_ARRAY, or DT_PREINIT_ARRAY.


r~

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-19  0:30             ` Richard Henderson
@ 2011-08-19 11:55               ` Richard Guenther
  2011-08-19 12:11                 ` Jakub Jelinek
  2011-08-20 21:48                 ` Richard Henderson
  0 siblings, 2 replies; 50+ messages in thread
From: Richard Guenther @ 2011-08-19 11:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Sriraman Tallam, reply, gcc-patches

On Fri, Aug 19, 2011 at 12:08 AM, Richard Henderson <rth@redhat.com> wrote:
> On 08/18/2011 02:51 PM, Sriraman Tallam wrote:
>> Oh!, right, sorry. So, the only available option now is to mark it as
>> a constructor in libgcc.
>
> Or call it explicitly from the out-of-line tests.
>
> The thing is, if you intend to use this from ifunc tests, I believe
> that these can run *extremely* early.  E.g. LD_BIND_NOW=1 will run
> these while relocating the entire application, and therefore before
> any of DT_INIT (aka .ctors), DT_INIT_ARRAY, or DT_PREINIT_ARRAY.

So make sure that __cpu_indicator initially has a conservative correct
value?  I'd still prefer the constructor-in-libgcc option - if only because
then the compiler-side is much simplified.

Richard.

>
> r~
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-19 11:55               ` Richard Guenther
@ 2011-08-19 12:11                 ` Jakub Jelinek
  2011-08-20 21:48                 ` Richard Henderson
  1 sibling, 0 replies; 50+ messages in thread
From: Jakub Jelinek @ 2011-08-19 12:11 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Fri, Aug 19, 2011 at 11:04:11AM +0200, Richard Guenther wrote:
> On Fri, Aug 19, 2011 at 12:08 AM, Richard Henderson <rth@redhat.com> wrote:
> > On 08/18/2011 02:51 PM, Sriraman Tallam wrote:
> >> Oh!, right, sorry. So, the only available option now is to mark it as
> >> a constructor in libgcc.
> >
> > Or call it explicitly from the out-of-line tests.
> >
> > The thing is, if you intend to use this from ifunc tests, I believe
> > that these can run *extremely* early.  E.g. LD_BIND_NOW=1 will run
> > these while relocating the entire application, and therefore before
> > any of DT_INIT (aka .ctors), DT_INIT_ARRAY, or DT_PREINIT_ARRAY.
> 
> So make sure that __cpu_indicator initially has a conservative correct
> value?  I'd still prefer the constructor-in-libgcc option - if only because
> then the compiler-side is much simplified.

Note that exporting data from shared libraries and using those in binaries
often leads to copy relocations (which are possibly still not applied when
calling IFUNC functions with LD_BIND_NOW=1).  Similarly calling a function
in a different shared library might be a problem from IFUNC handler.

	Jakub

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-19 11:55               ` Richard Guenther
  2011-08-19 12:11                 ` Jakub Jelinek
@ 2011-08-20 21:48                 ` Richard Henderson
  2011-08-20 22:02                   ` H.J. Lu
  2011-08-21 11:05                   ` Richard Guenther
  1 sibling, 2 replies; 50+ messages in thread
From: Richard Henderson @ 2011-08-20 21:48 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Sriraman Tallam, reply, gcc-patches

On 08/19/2011 02:04 AM, Richard Guenther wrote:
> So make sure that __cpu_indicator initially has a conservative correct
> value?  I'd still prefer the constructor-in-libgcc option - if only because
> then the compiler-side is much simplified.
> 

Err, I thought __cpu_indicator was a function, not data.

I think we need to discuss this more...


r~

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-20 21:48                 ` Richard Henderson
@ 2011-08-20 22:02                   ` H.J. Lu
  2011-08-21 11:05                   ` Richard Guenther
  1 sibling, 0 replies; 50+ messages in thread
From: H.J. Lu @ 2011-08-20 22:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Richard Guenther, Sriraman Tallam, reply, gcc-patches

On Sat, Aug 20, 2011 at 2:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/19/2011 02:04 AM, Richard Guenther wrote:
>> So make sure that __cpu_indicator initially has a conservative correct
>> value?  I'd still prefer the constructor-in-libgcc option - if only because
>> then the compiler-side is much simplified.
>>
>
> Err, I thought __cpu_indicator was a function, not data.
>
> I think we need to discuss this more...
>

In glibc, we export function __get_cpu_features as a private interface
used for IFUNC.  We can do something similar with libgcc very carefully.


-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-20 21:48                 ` Richard Henderson
  2011-08-20 22:02                   ` H.J. Lu
@ 2011-08-21 11:05                   ` Richard Guenther
  2011-08-22 14:27                     ` Michael Matz
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Guenther @ 2011-08-21 11:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Sriraman Tallam, reply, gcc-patches

On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/19/2011 02:04 AM, Richard Guenther wrote:
>> So make sure that __cpu_indicator initially has a conservative correct
>> value?  I'd still prefer the constructor-in-libgcc option - if only because
>> then the compiler-side is much simplified.
>>
>
> Err, I thought __cpu_indicator was a function, not data.
>
> I think we need to discuss this more...

Oh, I thought it was data initialized by the constructor ...

>
> r~
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-21 11:05                   ` Richard Guenther
@ 2011-08-22 14:27                     ` Michael Matz
  2011-08-22 14:33                       ` H.J. Lu
  0 siblings, 1 reply; 50+ messages in thread
From: Michael Matz @ 2011-08-22 14:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, Sriraman Tallam, reply, gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1541 bytes --]

Hi,

On Sun, 21 Aug 2011, Richard Guenther wrote:

> On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson <rth@redhat.com> wrote:
> > On 08/19/2011 02:04 AM, Richard Guenther wrote:
> >> So make sure that __cpu_indicator initially has a conservative correct
> >> value?  I'd still prefer the constructor-in-libgcc option - if only because
> >> then the compiler-side is much simplified.
> >>
> >
> > Err, I thought __cpu_indicator was a function, not data.
> >
> > I think we need to discuss this more...
> 
> Oh, I thought it was data initialized by the constructor ...

Sriramans patch right now has a function __cpu_indicator_init which is 
called from (adhoc constructed) ctors and that initializes variables
__cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)

I think the whole initializer function and the associated data blobs have 
to sit in static libgcc and be hidden.  By that all shared modules 
will have their own copies of the model and features (and the initializer 
function) so there won't be issues with copy relocs, or cross shared lib 
calls while relocating the modules.  Dynamically they will contain the 
same data always, but it's not many bytes, and only modules making use of 
this facility will pay it.

The initializer function has to be callable from pre-.init contexts, e.g.  
ifunc dispatchers.  And to make life easier there should be one ctor 
function calling this initializer function too, so that normal code can 
rely on it being already called saving one check.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 14:27                     ` Michael Matz
@ 2011-08-22 14:33                       ` H.J. Lu
  2011-08-22 17:11                         ` Michael Matz
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-22 14:33 UTC (permalink / raw)
  To: Michael Matz
  Cc: Richard Guenther, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 7:07 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Sun, 21 Aug 2011, Richard Guenther wrote:
>
>> On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson <rth@redhat.com> wrote:
>> > On 08/19/2011 02:04 AM, Richard Guenther wrote:
>> >> So make sure that __cpu_indicator initially has a conservative correct
>> >> value?  I'd still prefer the constructor-in-libgcc option - if only because
>> >> then the compiler-side is much simplified.
>> >>
>> >
>> > Err, I thought __cpu_indicator was a function, not data.
>> >
>> > I think we need to discuss this more...
>>
>> Oh, I thought it was data initialized by the constructor ...
>
> Sriramans patch right now has a function __cpu_indicator_init which is
> called from (adhoc constructed) ctors and that initializes variables
> __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>
> I think the whole initializer function and the associated data blobs have
> to sit in static libgcc and be hidden.  By that all shared modules
> will have their own copies of the model and features (and the initializer
> function) so there won't be issues with copy relocs, or cross shared lib
> calls while relocating the modules.  Dynamically they will contain the
> same data always, but it's not many bytes, and only modules making use of
> this facility will pay it.
>
> The initializer function has to be callable from pre-.init contexts, e.g.
> ifunc dispatchers.  And to make life easier there should be one ctor
> function calling this initializer function too, so that normal code can
> rely on it being already called saving one check.
>

It sounds more complicated than necessary.  Why not just do it
on demand like glibc does?

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 14:33                       ` H.J. Lu
@ 2011-08-22 17:11                         ` Michael Matz
  2011-08-22 17:18                           ` H.J. Lu
  0 siblings, 1 reply; 50+ messages in thread
From: Michael Matz @ 2011-08-22 17:11 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Richard Henderson, Sriraman Tallam, reply, gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1437 bytes --]

Hi,

On Mon, 22 Aug 2011, H.J. Lu wrote:

> >> Oh, I thought it was data initialized by the constructor ...
> >
> > Sriramans patch right now has a function __cpu_indicator_init which is 
> > called from (adhoc constructed) ctors and that initializes variables
> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
> >
> > I think the whole initializer function and the associated data blobs have
> > to sit in static libgcc and be hidden.  By that all shared modules
> > will have their own copies of the model and features (and the initializer
> > function) so there won't be issues with copy relocs, or cross shared lib
> > calls while relocating the modules.  Dynamically they will contain the
> > same data always, but it's not many bytes, and only modules making use of
> > this facility will pay it.
> >
> > The initializer function has to be callable from pre-.init contexts, e.g.
> > ifunc dispatchers.  And to make life easier there should be one ctor
> > function calling this initializer function too, so that normal code can
> > rely on it being already called saving one check.
> >
> 
> It sounds more complicated than necessary.  Why not just do it
> on demand like glibc does?

Ehm, the only difference would be to not have a ctor in libgcc that looks 
like so:

void __attribute__((constructor)) bla(void)
{
  __cpu_indicator_init ();
}

I don't see any complication.?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 17:11                         ` Michael Matz
@ 2011-08-22 17:18                           ` H.J. Lu
  2011-08-22 19:02                             ` Sriraman Tallam
                                               ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: H.J. Lu @ 2011-08-22 17:18 UTC (permalink / raw)
  To: Michael Matz
  Cc: Richard Guenther, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 22 Aug 2011, H.J. Lu wrote:
>
>> >> Oh, I thought it was data initialized by the constructor ...
>> >
>> > Sriramans patch right now has a function __cpu_indicator_init which is
>> > called from (adhoc constructed) ctors and that initializes variables
>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>> >
>> > I think the whole initializer function and the associated data blobs have
>> > to sit in static libgcc and be hidden.  By that all shared modules
>> > will have their own copies of the model and features (and the initializer
>> > function) so there won't be issues with copy relocs, or cross shared lib
>> > calls while relocating the modules.  Dynamically they will contain the
>> > same data always, but it's not many bytes, and only modules making use of
>> > this facility will pay it.
>> >
>> > The initializer function has to be callable from pre-.init contexts, e.g.
>> > ifunc dispatchers.  And to make life easier there should be one ctor
>> > function calling this initializer function too, so that normal code can
>> > rely on it being already called saving one check.
>> >
>>
>> It sounds more complicated than necessary.  Why not just do it
>> on demand like glibc does?
>
> Ehm, the only difference would be to not have a ctor in libgcc that looks
> like so:
>
> void __attribute__((constructor)) bla(void)
> {
>  __cpu_indicator_init ();
> }
>
> I don't see any complication.?
>

Order of constructors.  A constructor may call functions
which use __cpu_indicator.

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 17:18                           ` H.J. Lu
@ 2011-08-22 19:02                             ` Sriraman Tallam
  2011-08-22 19:26                               ` H.J. Lu
  2011-08-22 20:49                             ` Richard Guenther
  2011-08-23 12:33                             ` Michael Matz
  2 siblings, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-22 19:02 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, gcc-patches

On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>
>>> >> Oh, I thought it was data initialized by the constructor ...
>>> >
>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>> > called from (adhoc constructed) ctors and that initializes variables
>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>> >
>>> > I think the whole initializer function and the associated data blobs have
>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>> > will have their own copies of the model and features (and the initializer
>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>> > calls while relocating the modules.  Dynamically they will contain the
>>> > same data always, but it's not many bytes, and only modules making use of
>>> > this facility will pay it.
>>> >
>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>> > function calling this initializer function too, so that normal code can
>>> > rely on it being already called saving one check.
>>> >
>>>
>>> It sounds more complicated than necessary.  Why not just do it
>>> on demand like glibc does?
>>
>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>> like so:
>>
>> void __attribute__((constructor)) bla(void)
>> {
>>  __cpu_indicator_init ();
>> }
>>
>> I don't see any complication.?
>>
>
> Order of constructors.  A constructor may call functions
> which use __cpu_indicator.

I have a suggestion that is a hybrid of the proposed solutions here:

1) Make a constructor in every module that calls
"__cpu_indicator_init" and make it to be the first constructor to run.
 Will this solve the ordering problem?
2) Change __cpu_indicator_init to run only once by using a variable to
check if it has been run before.

So, each module's constructor will call __cpu_indicator_init but the
CPUID insns are only done once. I also avoid the extra overhead of
having to check if "__cpu_indicator_init" is called from within the
binary. Will this work?

Thanks,
-Sri.

>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 19:02                             ` Sriraman Tallam
@ 2011-08-22 19:26                               ` H.J. Lu
  2011-08-22 19:44                                 ` Sriraman Tallam
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-22 19:26 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, gcc-patches

On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>
>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>> >
>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>> >
>>>> > I think the whole initializer function and the associated data blobs have
>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>> > will have their own copies of the model and features (and the initializer
>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>> > same data always, but it's not many bytes, and only modules making use of
>>>> > this facility will pay it.
>>>> >
>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>> > function calling this initializer function too, so that normal code can
>>>> > rely on it being already called saving one check.
>>>> >
>>>>
>>>> It sounds more complicated than necessary.  Why not just do it
>>>> on demand like glibc does?
>>>
>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>> like so:
>>>
>>> void __attribute__((constructor)) bla(void)
>>> {
>>>  __cpu_indicator_init ();
>>> }
>>>
>>> I don't see any complication.?
>>>
>>
>> Order of constructors.  A constructor may call functions
>> which use __cpu_indicator.
>
> I have a suggestion that is a hybrid of the proposed solutions here:
>
> 1) Make a constructor in every module that calls
> "__cpu_indicator_init" and make it to be the first constructor to run.
>  Will this solve the ordering problem?
> 2) Change __cpu_indicator_init to run only once by using a variable to
> check if it has been run before.
>
> So, each module's constructor will call __cpu_indicator_init but the
> CPUID insns are only done once. I also avoid the extra overhead of
> having to check if "__cpu_indicator_init" is called from within the
> binary. Will this work?
>

Please make it simple like

if __cpu_indicator is not initialized then
    call __cpu_indicator_init
fi

use  __cpu_indicator


-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 19:26                               ` H.J. Lu
@ 2011-08-22 19:44                                 ` Sriraman Tallam
  0 siblings, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-22 19:44 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, gcc-patches

On Mon, Aug 22, 2011 at 11:58 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>>
>>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>>> >
>>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>>> >
>>>>> > I think the whole initializer function and the associated data blobs have
>>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>>> > will have their own copies of the model and features (and the initializer
>>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>>> > same data always, but it's not many bytes, and only modules making use of
>>>>> > this facility will pay it.
>>>>> >
>>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>>> > function calling this initializer function too, so that normal code can
>>>>> > rely on it being already called saving one check.
>>>>> >
>>>>>
>>>>> It sounds more complicated than necessary.  Why not just do it
>>>>> on demand like glibc does?
>>>>
>>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>>> like so:
>>>>
>>>> void __attribute__((constructor)) bla(void)
>>>> {
>>>>  __cpu_indicator_init ();
>>>> }
>>>>
>>>> I don't see any complication.?
>>>>
>>>
>>> Order of constructors.  A constructor may call functions
>>> which use __cpu_indicator.
>>
>> I have a suggestion that is a hybrid of the proposed solutions here:
>>
>> 1) Make a constructor in every module that calls
>> "__cpu_indicator_init" and make it to be the first constructor to run.
>>  Will this solve the ordering problem?
>> 2) Change __cpu_indicator_init to run only once by using a variable to
>> check if it has been run before.
>>
>> So, each module's constructor will call __cpu_indicator_init but the
>> CPUID insns are only done once. I also avoid the extra overhead of
>> having to check if "__cpu_indicator_init" is called from within the
>> binary. Will this work?
>>
>
> Please make it simple like
>
> if __cpu_indicator is not initialized then
>    call __cpu_indicator_init
> fi
>
> use  __cpu_indicator
>

Will do, thanks.

-Sri.

>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 17:18                           ` H.J. Lu
  2011-08-22 19:02                             ` Sriraman Tallam
@ 2011-08-22 20:49                             ` Richard Guenther
  2011-08-22 20:55                               ` H.J. Lu
  2011-08-23 12:33                             ` Michael Matz
  2 siblings, 1 reply; 50+ messages in thread
From: Richard Guenther @ 2011-08-22 20:49 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>
>>> >> Oh, I thought it was data initialized by the constructor ...
>>> >
>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>> > called from (adhoc constructed) ctors and that initializes variables
>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>> >
>>> > I think the whole initializer function and the associated data blobs have
>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>> > will have their own copies of the model and features (and the initializer
>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>> > calls while relocating the modules.  Dynamically they will contain the
>>> > same data always, but it's not many bytes, and only modules making use of
>>> > this facility will pay it.
>>> >
>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>> > function calling this initializer function too, so that normal code can
>>> > rely on it being already called saving one check.
>>> >
>>>
>>> It sounds more complicated than necessary.  Why not just do it
>>> on demand like glibc does?
>>
>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>> like so:
>>
>> void __attribute__((constructor)) bla(void)
>> {
>>  __cpu_indicator_init ();
>> }
>>
>> I don't see any complication.?
>>
>
> Order of constructors.  A constructor may call functions
> which use __cpu_indicator.

As I said - make __cpu_indicator have a conservative
default value (zero).  It is irrelevant if constructors that
run before initializing __cpu_indicator run with the
default CPU capabilities.

Richard.

> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 20:49                             ` Richard Guenther
@ 2011-08-22 20:55                               ` H.J. Lu
  2011-08-22 21:22                                 ` Richard Guenther
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-22 20:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Michael Matz, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 1:34 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>
>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>> >
>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>> >
>>>> > I think the whole initializer function and the associated data blobs have
>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>> > will have their own copies of the model and features (and the initializer
>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>> > same data always, but it's not many bytes, and only modules making use of
>>>> > this facility will pay it.
>>>> >
>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>> > function calling this initializer function too, so that normal code can
>>>> > rely on it being already called saving one check.
>>>> >
>>>>
>>>> It sounds more complicated than necessary.  Why not just do it
>>>> on demand like glibc does?
>>>
>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>> like so:
>>>
>>> void __attribute__((constructor)) bla(void)
>>> {
>>>  __cpu_indicator_init ();
>>> }
>>>
>>> I don't see any complication.?
>>>
>>
>> Order of constructors.  A constructor may call functions
>> which use __cpu_indicator.
>
> As I said - make __cpu_indicator have a conservative
> default value (zero).  It is irrelevant if constructors that
> run before initializing __cpu_indicator run with the
> default CPU capabilities.
>

If  IFUNC is used, this just disables IFUNC for those functions
called with the conservative default value since they are only
resolved once.


-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 20:55                               ` H.J. Lu
@ 2011-08-22 21:22                                 ` Richard Guenther
  2011-08-22 21:42                                   ` H.J. Lu
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Guenther @ 2011-08-22 21:22 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 10:39 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 1:34 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>>
>>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>>> >
>>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>>> >
>>>>> > I think the whole initializer function and the associated data blobs have
>>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>>> > will have their own copies of the model and features (and the initializer
>>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>>> > same data always, but it's not many bytes, and only modules making use of
>>>>> > this facility will pay it.
>>>>> >
>>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>>> > function calling this initializer function too, so that normal code can
>>>>> > rely on it being already called saving one check.
>>>>> >
>>>>>
>>>>> It sounds more complicated than necessary.  Why not just do it
>>>>> on demand like glibc does?
>>>>
>>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>>> like so:
>>>>
>>>> void __attribute__((constructor)) bla(void)
>>>> {
>>>>  __cpu_indicator_init ();
>>>> }
>>>>
>>>> I don't see any complication.?
>>>>
>>>
>>> Order of constructors.  A constructor may call functions
>>> which use __cpu_indicator.
>>
>> As I said - make __cpu_indicator have a conservative
>> default value (zero).  It is irrelevant if constructors that
>> run before initializing __cpu_indicator run with the
>> default CPU capabilities.
>>
>
> If  IFUNC is used, this just disables IFUNC for those functions
> called with the conservative default value since they are only
> resolved once.

Huh, well.  So what happens if you use __cpu_indicator from the
IFUNC selector function!?  Honestly, if we care about these
corner-cases why not make __cpu_indicator a hidden function
instead.

IMHO IFUNC selectors should simply do

if (!__cpu_indicator)
  __cpu_indicator_init ();

Richard.

>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 21:22                                 ` Richard Guenther
@ 2011-08-22 21:42                                   ` H.J. Lu
  2011-08-22 22:26                                     ` Richard Guenther
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-22 21:42 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Michael Matz, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 1:46 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 10:39 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 1:34 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>>>
>>>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>>>> >
>>>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>>>> >
>>>>>> > I think the whole initializer function and the associated data blobs have
>>>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>>>> > will have their own copies of the model and features (and the initializer
>>>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>>>> > same data always, but it's not many bytes, and only modules making use of
>>>>>> > this facility will pay it.
>>>>>> >
>>>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>>>> > function calling this initializer function too, so that normal code can
>>>>>> > rely on it being already called saving one check.
>>>>>> >
>>>>>>
>>>>>> It sounds more complicated than necessary.  Why not just do it
>>>>>> on demand like glibc does?
>>>>>
>>>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>>>> like so:
>>>>>
>>>>> void __attribute__((constructor)) bla(void)
>>>>> {
>>>>>  __cpu_indicator_init ();
>>>>> }
>>>>>
>>>>> I don't see any complication.?
>>>>>
>>>>
>>>> Order of constructors.  A constructor may call functions
>>>> which use __cpu_indicator.
>>>
>>> As I said - make __cpu_indicator have a conservative
>>> default value (zero).  It is irrelevant if constructors that
>>> run before initializing __cpu_indicator run with the
>>> default CPU capabilities.
>>>
>>
>> If  IFUNC is used, this just disables IFUNC for those functions
>> called with the conservative default value since they are only
>> resolved once.
>
> Huh, well.  So what happens if you use __cpu_indicator from the
> IFUNC selector function!?  Honestly, if we care about these
> corner-cases why not make __cpu_indicator a hidden function
> instead.
>
> IMHO IFUNC selectors should simply do
>
> if (!__cpu_indicator)
>  __cpu_indicator_init ();
>

Isn't it what I said before?

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 21:42                                   ` H.J. Lu
@ 2011-08-22 22:26                                     ` Richard Guenther
  0 siblings, 0 replies; 50+ messages in thread
From: Richard Guenther @ 2011-08-22 22:26 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Henderson, Sriraman Tallam, reply, gcc-patches

On Mon, Aug 22, 2011 at 10:48 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 1:46 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 10:39 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 1:34 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz <matz@suse.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>>>>
>>>>>>> >> Oh, I thought it was data initialized by the constructor ...
>>>>>>> >
>>>>>>> > Sriramans patch right now has a function __cpu_indicator_init which is
>>>>>>> > called from (adhoc constructed) ctors and that initializes variables
>>>>>>> > __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
>>>>>>> >
>>>>>>> > I think the whole initializer function and the associated data blobs have
>>>>>>> > to sit in static libgcc and be hidden.  By that all shared modules
>>>>>>> > will have their own copies of the model and features (and the initializer
>>>>>>> > function) so there won't be issues with copy relocs, or cross shared lib
>>>>>>> > calls while relocating the modules.  Dynamically they will contain the
>>>>>>> > same data always, but it's not many bytes, and only modules making use of
>>>>>>> > this facility will pay it.
>>>>>>> >
>>>>>>> > The initializer function has to be callable from pre-.init contexts, e.g.
>>>>>>> > ifunc dispatchers.  And to make life easier there should be one ctor
>>>>>>> > function calling this initializer function too, so that normal code can
>>>>>>> > rely on it being already called saving one check.
>>>>>>> >
>>>>>>>
>>>>>>> It sounds more complicated than necessary.  Why not just do it
>>>>>>> on demand like glibc does?
>>>>>>
>>>>>> Ehm, the only difference would be to not have a ctor in libgcc that looks
>>>>>> like so:
>>>>>>
>>>>>> void __attribute__((constructor)) bla(void)
>>>>>> {
>>>>>>  __cpu_indicator_init ();
>>>>>> }
>>>>>>
>>>>>> I don't see any complication.?
>>>>>>
>>>>>
>>>>> Order of constructors.  A constructor may call functions
>>>>> which use __cpu_indicator.
>>>>
>>>> As I said - make __cpu_indicator have a conservative
>>>> default value (zero).  It is irrelevant if constructors that
>>>> run before initializing __cpu_indicator run with the
>>>> default CPU capabilities.
>>>>
>>>
>>> If  IFUNC is used, this just disables IFUNC for those functions
>>> called with the conservative default value since they are only
>>> resolved once.
>>
>> Huh, well.  So what happens if you use __cpu_indicator from the
>> IFUNC selector function!?  Honestly, if we care about these
>> corner-cases why not make __cpu_indicator a hidden function
>> instead.
>>
>> IMHO IFUNC selectors should simply do
>>
>> if (!__cpu_indicator)
>>  __cpu_indicator_init ();
>>
>
> Isn't it what I said before?

Not in the quoted parts.  What I don't want is a constructor in each module.
Keep a single one in libgcc and document the __cpu_indicator usage
restrictions.

Richard.

> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-22 17:18                           ` H.J. Lu
  2011-08-22 19:02                             ` Sriraman Tallam
  2011-08-22 20:49                             ` Richard Guenther
@ 2011-08-23 12:33                             ` Michael Matz
  2011-08-26  7:24                               ` Sriraman Tallam
  2 siblings, 1 reply; 50+ messages in thread
From: Michael Matz @ 2011-08-23 12:33 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Richard Henderson, Sriraman Tallam, reply, gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 663 bytes --]

Hi,

On Mon, 22 Aug 2011, H.J. Lu wrote:

> > void __attribute__((constructor)) bla(void)
> > {
> >  __cpu_indicator_init ();
> > }
> >
> > I don't see any complication.?
> >
> 
> Order of constructors.  A constructor may call functions
> which use __cpu_indicator.

That's why I wrote also:

> The initializer function has to be callable from pre-.init contexts, e.g.
> ifunc dispatchers.

It obviously has to be guarded against multiple calls.  The ctor in libgcc 
would be mere convenience because then non-ctor code can rely on the data 
being initialized, and only (potential) ctor code has to check and call 
the init function on demand.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-23 12:33                             ` Michael Matz
@ 2011-08-26  7:24                               ` Sriraman Tallam
  2011-08-26  7:33                                 ` H.J. Lu
  2011-08-29  8:33                                 ` Xinliang David Li
  0 siblings, 2 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-26  7:24 UTC (permalink / raw)
  To: Michael Matz
  Cc: H.J. Lu, Richard Guenther, Richard Henderson, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2956 bytes --]

Hi,

  Thanks for all the comments. I am attaching a new patch
incorporating all of the changes mentioned, mainly :

1) Make __cpu_indicator_init a constructor in libgcc and guard to call
it only once.
2) Add symbol versions.
3) Move all builtins to the i386 port.
4) Add check for atom processor.
5) No separate passes to fold the builtins.

Please let me know what you think.
Thanks,
-Sri.

	* config/i386/i386.c (build_struct_with_one_bit_fields): New function.
	(make_var_decl): New function.
	(get_field_from_struct): New function.
	(fold_builtin_target): New function.
	(ix86_fold_builtin): New function.
	(ix86_expand_builtin): Expand new builtins by folding them.
	(TARGET_FOLD_BUILTIN): New macro.
	(IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value.
	(IX86_BUILTIN_CPU_IS_AMD): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value.
	* config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol.
	(__cpu_model): Export symbol.
	(__cpu_features): Export symbol.
	* config/i386/i386-builtin-types.def: New function type.

	* config/i386/i386-cpuinfo.c: New file.
	* config/i386/t-cpuinfo: New file.
	* config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc

	* gcc.dg/builtin_target.c: New test.




On Tue, Aug 23, 2011 at 4:35 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 22 Aug 2011, H.J. Lu wrote:
>
>> > void __attribute__((constructor)) bla(void)
>> > {
>> >  __cpu_indicator_init ();
>> > }
>> >
>> > I don't see any complication.?
>> >
>>
>> Order of constructors.  A constructor may call functions
>> which use __cpu_indicator.
>
> That's why I wrote also:
>
>> The initializer function has to be callable from pre-.init contexts, e.g.
>> ifunc dispatchers.
>
> It obviously has to be guarded against multiple calls.  The ctor in libgcc
> would be mere convenience because then non-ctor code can rely on the data
> being initialized, and only (potential) ctor code has to check and call
> the init function on demand.
>
>
> Ciao,
> Michael.

[-- Attachment #2: CPU_Runtime_patch.txt --]
[-- Type: text/plain, Size: 24191 bytes --]

Index: libgcc/config.host
===================================================================
--- libgcc/config.host	(revision 177767)
+++ libgcc/config.host	(working copy)
@@ -609,7 +609,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
   i[34567]86-*-gnu*)
-	tmake_file="${tmake_file} t-tls"
+	tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
 	if test "$libgcc_cv_cfi" = "yes"; then
 		tmake_file="${tmake_file} t-stack i386/t-stack-i386"
 	fi
Index: libgcc/config/i386/t-cpuinfo
===================================================================
--- libgcc/config/i386/t-cpuinfo	(revision 0)
+++ libgcc/config/i386/t-cpuinfo	(revision 0)
@@ -0,0 +1 @@
+LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
Index: libgcc/config/i386/i386-cpuinfo.c
===================================================================
--- libgcc/config/i386/i386-cpuinfo.c	(revision 0)
+++ libgcc/config/i386/i386-cpuinfo.c	(revision 0)
@@ -0,0 +1,245 @@
+/* Get CPU type and Features for x86 processors.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "cpuid.h"
+
+void __cpu_indicator_init (void) __attribute__ ((constructor));
+
+enum vendor_signatures
+{
+  SIG_INTEL =	0x756e6547 /* Genu */,
+  SIG_AMD =	0x68747541 /* Auth */
+};
+
+
+/* Features supported. */
+
+struct __processor_features
+{
+  unsigned int __cpu_cmov : 1;
+  unsigned int __cpu_mmx : 1;
+  unsigned int __cpu_popcnt : 1;
+  unsigned int __cpu_sse : 1;
+  unsigned int __cpu_sse2 : 1;
+  unsigned int __cpu_sse3 : 1;
+  unsigned int __cpu_ssse3 : 1;
+  unsigned int __cpu_sse4_1 : 1;
+  unsigned int __cpu_sse4_2 : 1;
+} __cpu_features = {0};
+
+/* Processor Model. */
+
+struct __processor_model
+{
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  unsigned int __cpu_is_intel_atom : 1;
+  unsigned int __cpu_is_intel_core2 : 1;
+  unsigned int __cpu_is_intel_corei7_nehalem : 1;
+  unsigned int __cpu_is_intel_corei7_westmere : 1;
+  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+} __cpu_model = {0};
+
+/* Get the specific type of AMD CPU.  */
+
+static void
+get_amd_cpu (unsigned int family, unsigned int model)
+{
+  switch (family)
+    {
+    case 0x10:
+      switch (model)
+	{
+	case 0x2:
+	  __cpu_model.__cpu_is_amdfam10_barcelona = 1;
+	  break;
+	case 0x4:
+	  __cpu_model.__cpu_is_amdfam10_shanghai = 1;
+	  break;
+	case 0x8:
+	  __cpu_model.__cpu_is_amdfam10_istanbul = 1;
+	  break;
+	default:
+	  break;
+	}
+      break;
+    default:
+      break;
+    }
+}
+
+/* Get the specific type of Intel CPU.  */
+
+static void
+get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
+{
+  /* Parse family and model only if brand ID is 0. */
+  if (brand_id == 0)
+    {
+      switch (family)
+	{
+	case 0x5:
+	  /* Pentium.  */
+	  break;
+	case 0x6:
+	  switch (model)
+	    {
+	    case 0x1c:
+	    case 0x26:
+	      /* Atom.  */
+	      __cpu_model.__cpu_is_intel_atom = 1;
+	      break;
+	    case 0x1a:
+	    case 0x1e:
+	    case 0x1f:
+	    case 0x2e:
+	      /* Nehalem.  */
+	      __cpu_model.__cpu_is_intel_corei7_nehalem = 1;
+	      break;
+	    case 0x25:
+	    case 0x2c:
+	    case 0x2f:
+	      /* Westmere.  */
+	      __cpu_model.__cpu_is_intel_corei7_westmere = 1;
+	      break;
+	    case 0x2a:
+	      /* Sandy Bridge.  */
+	      __cpu_model.__cpu_is_intel_corei7_sandybridge = 1;
+	      break;
+	    case 0x17:
+	    case 0x1d:
+	      /* Penryn.  */
+	    case 0x0f:
+	      /* Merom.  */
+	      __cpu_model.__cpu_is_intel_core2 = 1;
+	      break;
+	    default:
+	      break;
+	    }
+	  break;
+	default:
+	  /* We have no idea.  */
+	  break;
+	}
+    }
+}	             	
+
+static void
+get_available_features (unsigned int ecx, unsigned int edx)
+{
+  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
+  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
+  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
+  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
+  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
+  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
+  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
+  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
+  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
+}
+
+/* A noinline function calling __get_cpuid. Having many calls to
+   cpuid in one function in 32-bit mode causes GCC to complain:
+   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
+   related to PR rtl-optimization 44174. */
+
+static int __attribute__ ((noinline))
+__get_cpuid_output (unsigned int __level,
+		    unsigned int *__eax, unsigned int *__ebx,
+		    unsigned int *__ecx, unsigned int *__edx)
+{
+  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
+}
+
+
+/* A constructor function that is sets __cpu_model and __cpu_features with
+   the right values.  This needs to run only once.
+   If another constructor needs to use these values, explicitly call this
+   function from the other constructor.  Otherwise, the ordering of
+   constructors could make this constructor run later.  */
+
+void __attribute__ ((constructor))
+__cpu_indicator_init (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  int max_level = 5;
+  unsigned int vendor;
+  unsigned int model, family, brand_id;
+  static int called = 0;
+
+  /* This function needs to run just once.  */
+  if (called)
+    return;
+  else
+    called = 1;
+
+  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
+  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+    return;
+
+  vendor = ebx;
+  max_level = eax;
+
+  if (max_level < 1)
+    return;
+
+  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
+    return;
+
+  model = (eax >> 4) & 0x0f;
+  family = (eax >> 8) & 0x0f;
+  brand_id = ebx & 0xff;
+
+  /* Adjust model and family for Intel CPUS. */
+  if (vendor == SIG_INTEL)
+    {
+      unsigned int extended_model, extended_family;
+
+      extended_model = (eax >> 12) & 0xf0;
+      extended_family = (eax >> 20) & 0xff;
+      if (family == 0x0f)
+	{
+	  family += extended_family;
+	  model += extended_model;
+	}
+      else if (family == 0x06)
+	model += extended_model;
+    }
+
+  /* Find CPU model. */
+
+  if (vendor == SIG_AMD)
+    {
+      __cpu_model.__cpu_is_amd = 1;
+      get_amd_cpu (family, model);
+    }
+  else if (vendor == SIG_INTEL)
+    {
+      __cpu_model.__cpu_is_intel = 1;
+      get_intel_cpu (family, model, brand_id);
+    }
+
+  /* Find available features. */
+  get_available_features (ecx, edx);
+}
Index: gcc/testsuite/gcc.dg/builtin_target.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
+++ gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
@@ -0,0 +1,53 @@
+/* This test checks if the __builtin_cpu_* calls are recognized. */
+
+/* { dg-do run } */
+
+int
+fn1 ()
+{
+  if (__builtin_cpu_supports_cmov () < 0)
+    return -1;
+  if (__builtin_cpu_supports_mmx () < 0)
+    return -1;
+  if (__builtin_cpu_supports_popcount () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse2 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse3 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_ssse3 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse4_1 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse4_2 () < 0)
+    return -1;
+  if (__builtin_cpu_is_amd () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_atom () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_core2 () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_nehalem () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_westmere () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_sandybridge () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_barcelona () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_shanghai () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_istanbul () < 0)
+    return -1;
+
+  return 0;
+}
+
+int main ()
+{
+  return fn1 ();
+}
Index: gcc/config/i386/i386-builtin-types.def
===================================================================
--- gcc/config/i386/i386-builtin-types.def	(revision 177767)
+++ gcc/config/i386/i386-builtin-types.def	(working copy)
@@ -131,6 +131,7 @@ DEF_FUNCTION_TYPE (UINT64)
 DEF_FUNCTION_TYPE (UNSIGNED)
 DEF_FUNCTION_TYPE (VOID)
 DEF_FUNCTION_TYPE (PVOID)
+DEF_FUNCTION_TYPE (INT)
 
 DEF_FUNCTION_TYPE (FLOAT, FLOAT)
 DEF_FUNCTION_TYPE (FLOAT128, FLOAT128)
Index: gcc/config/i386/libgcc-glibc.ver
===================================================================
--- gcc/config/i386/libgcc-glibc.ver	(revision 177767)
+++ gcc/config/i386/libgcc-glibc.ver	(working copy)
@@ -147,6 +147,12 @@ GCC_4.3.0 {
   __trunctfxf2
   __unordtf2
 }
+
+GCC_4.6.0 {
+  __cpu_indicator_init
+  __cpu_model
+  __cpu_features
+}
 %else
 GCC_4.4.0 {
   __addtf3
@@ -183,4 +189,10 @@ GCC_4.4.0 {
 GCC_4.5.0 {
   __extendxftf2
 }
+
+GCC_4.6.0 {
+  __cpu_indicator_init
+  __cpu_model
+  __cpu_features
+}
 %endif
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177767)
+++ gcc/config/i386/i386.c	(working copy)
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "sbitmap.h"
 #include "fibheap.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
 
 enum upper_128bits_state
 {
@@ -24443,6 +24445,28 @@ enum ix86_builtins
   /* CFString built-in for darwin */
   IX86_BUILTIN_CFSTRING,
 
+  /* Builtins to get CPU features. */
+  IX86_BUILTIN_CPU_SUPPORTS_CMOV,
+  IX86_BUILTIN_CPU_SUPPORTS_MMX,
+  IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE2,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE3,
+  IX86_BUILTIN_CPU_SUPPORTS_SSSE3,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE4_1,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE4_2,
+  /* Builtins to get CPU type. */
+  IX86_BUILTIN_CPU_IS_AMD,
+  IX86_BUILTIN_CPU_IS_INTEL,
+  IX86_BUILTIN_CPU_IS_INTEL_ATOM,
+  IX86_BUILTIN_CPU_IS_INTEL_CORE2,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL,
+
   IX86_BUILTIN_MAX
 };
 
@@ -25809,6 +25833,316 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+/* Returns a struct type with name NAME and number of fields equal to
+   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
+
+static tree
+build_struct_with_one_bit_fields (int num_fields, const char *name)
+{
+  int i;
+  char field_name [10];
+  tree field = NULL_TREE, field_chain = NULL_TREE;
+  tree type = make_node (RECORD_TYPE);
+
+  strcpy (field_name, "k_field");
+
+  for (i = 0; i < num_fields; i++)
+    {
+      /* Name the fields, 0_field, 1_field, ... */
+      field_name [0] = '0' + i;
+      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			  get_identifier (field_name), unsigned_type_node);
+      DECL_BIT_FIELD (field) = 1;
+      DECL_SIZE (field) = bitsize_one_node;
+      if (field_chain != NULL_TREE)
+	DECL_CHAIN (field) = field_chain;
+      field_chain = field;
+    }
+  finish_builtin_struct (type, name, field_chain, NULL_TREE);
+  return type;
+}
+
+/* Returns a extern, comdat VAR_DECL of type TYPE and name NAME. */
+
+static tree
+make_var_decl (tree type, const char *name)
+{
+  tree new_decl;
+  struct varpool_node *vnode;
+
+  new_decl = build_decl (UNKNOWN_LOCATION,
+	                 VAR_DECL,
+	  	         get_identifier(name),
+		         type);
+
+  DECL_EXTERNAL (new_decl) = 1;
+  TREE_STATIC (new_decl) = 1;
+  TREE_PUBLIC (new_decl) = 1;
+  DECL_INITIAL (new_decl) = 0;
+  DECL_ARTIFICIAL (new_decl) = 0;
+  DECL_PRESERVE_P (new_decl) = 1;
+
+  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
+  assemble_variable (new_decl, 0, 0, 0);
+
+  vnode = varpool_node (new_decl);
+  gcc_assert (vnode != NULL);
+  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+     lto-streamer-out.c. */
+  vnode->finalized = 1;
+
+  return new_decl;
+}
+
+/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
+   numbered field. */
+
+static tree
+get_field_from_struct (tree struct_type, int field_num)
+{
+  int i;
+  tree field = TYPE_FIELDS (struct_type);
+
+  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
+    {
+      gcc_assert (field != NULL_TREE);
+    }
+
+  return field;
+}
+
+/* FNDECL is a __builtin_cpu_* call that is folded into an integer defined
+   in libgcc/config/i386/i386-cpuinfo.c */
+
+static tree 
+fold_builtin_cpu (enum ix86_builtins fn_code)
+{
+  /* This is the order of bit-fields in __processor_features in
+     i386-cpuinfo.c */
+  enum processor_features
+  {
+    F_CMOV = 0,
+    F_MMX,
+    F_POPCNT,
+    F_SSE,
+    F_SSE2,
+    F_SSE3,
+    F_SSSE3,
+    F_SSE4_1,
+    F_SSE4_2,
+    F_MAX
+  };
+
+  /* This is the order of bit-fields in __processor_model in
+     i386-cpuinfo.c */
+  enum processor_model
+  {
+    M_AMD = 0,
+    M_INTEL,
+    M_INTEL_ATOM,
+    M_INTEL_CORE2,
+    M_INTEL_COREI7_NEHALEM,
+    M_INTEL_COREI7_WESTMERE,
+    M_INTEL_COREI7_SANDYBRIDGE,
+    M_AMDFAM10_BARCELONA,
+    M_AMDFAM10_SHANGHAI,
+    M_AMDFAM10_ISTANBUL,
+    M_MAX
+  };
+
+  static tree __processor_features_type = NULL_TREE;
+  static tree __cpu_features_var = NULL_TREE;
+  static tree __processor_model_type = NULL_TREE;
+  static tree __cpu_model_var = NULL_TREE;
+  static tree field;
+  static tree which_struct;
+
+  if (__processor_features_type == NULL_TREE)
+    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
+ 			          "__processor_features");
+
+  if (__processor_model_type == NULL_TREE)
+    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
+ 			          "__processor_model");
+
+  if (__cpu_features_var == NULL_TREE)
+    __cpu_features_var = make_var_decl (__processor_features_type,
+					"__cpu_features");
+
+  if (__cpu_model_var == NULL_TREE)
+    __cpu_model_var = make_var_decl (__processor_model_type,
+				     "__cpu_model");
+
+  /* Look at the code to identify the field requested. */ 
+  switch (fn_code)
+    {
+    case IX86_BUILTIN_CPU_SUPPORTS_CMOV:
+      field = get_field_from_struct (__processor_features_type, F_CMOV);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_MMX:
+      field = get_field_from_struct (__processor_features_type, F_MMX);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT:
+      field = get_field_from_struct (__processor_features_type, F_POPCNT);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE:
+      field = get_field_from_struct (__processor_features_type, F_SSE);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE2:
+      field = get_field_from_struct (__processor_features_type, F_SSE2);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_1:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_2:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMD:
+      field = get_field_from_struct (__processor_model_type, M_AMD);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL:
+      field = get_field_from_struct (__processor_model_type, M_INTEL);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_ATOM:
+      field = get_field_from_struct (__processor_model_type, M_INTEL_ATOM);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_CORE2:
+      field = get_field_from_struct (__processor_model_type, M_INTEL_CORE2);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_NEHALEM);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_WESTMERE);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_SANDYBRIDGE);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_BARCELONA);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_SHANGHAI);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_ISTANBUL);
+      which_struct = __cpu_model_var;
+      break;
+    default:
+      return NULL_TREE;
+    }
+
+  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
+}
+
+static tree
+ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
+		   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
+{
+  const char* decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD
+      && strstr(decl_name, "__builtin_cpu") != NULL)
+    {
+      enum ix86_builtins code = (enum ix86_builtins)
+				DECL_FUNCTION_CODE (fndecl);
+      return fold_builtin_cpu (code);
+    }
+  return NULL_TREE;
+}
+
+/* A builtin to return the cpu type or feature.  Returns an integer
+   and is a const. */
+
+static void
+make_platform_builtin (const char* name, int code)
+{
+  tree decl;
+  tree type;
+
+  type = ix86_get_builtin_func_type (INT_FTYPE_VOID);
+  decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+			       NULL, NULL_TREE);
+  gcc_assert (decl != NULL_TREE);
+  ix86_builtins[(int) code] = decl;
+  /* Mark this as a const function. */
+  TREE_READONLY (decl) = 1;
+} 
+
+/* Builtins to get CPU type and features supported. */
+
+static void
+ix86_init_platform_type_builtins (void)
+{
+  make_platform_builtin ("__builtin_cpu_supports_cmov",
+			    IX86_BUILTIN_CPU_SUPPORTS_CMOV);
+  make_platform_builtin ("__builtin_cpu_supports_mmx",
+			    IX86_BUILTIN_CPU_SUPPORTS_MMX);
+  make_platform_builtin ("__builtin_cpu_supports_popcount",
+			    IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT);
+  make_platform_builtin ("__builtin_cpu_supports_sse",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSE);
+  make_platform_builtin ("__builtin_cpu_supports_sse2",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSE2);
+  make_platform_builtin ("__builtin_cpu_supports_sse3",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSE3);
+  make_platform_builtin ("__builtin_cpu_supports_ssse3",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSSE3);
+  make_platform_builtin ("__builtin_cpu_supports_sse4_1",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSE4_1);
+  make_platform_builtin ("__builtin_cpu_supports_sse4_2",
+			    IX86_BUILTIN_CPU_SUPPORTS_SSE4_2);
+  make_platform_builtin ("__builtin_cpu_is_amd",
+			    IX86_BUILTIN_CPU_IS_AMD);
+  make_platform_builtin ("__builtin_cpu_is_intel_atom",
+			    IX86_BUILTIN_CPU_IS_INTEL_ATOM);
+  make_platform_builtin ("__builtin_cpu_is_intel_core2",
+			    IX86_BUILTIN_CPU_IS_INTEL_CORE2);
+  make_platform_builtin ("__builtin_cpu_is_intel",
+			    IX86_BUILTIN_CPU_IS_INTEL);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_nehalem",
+			    IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_westmere",
+			    IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_sandybridge",
+			    IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_barcelona",
+			    IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_shanghai",
+			    IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_istanbul",
+			    IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL);
+}
+
 /* Internal method for ix86_init_builtins.  */
 
 static void
@@ -25892,6 +26226,9 @@ ix86_init_builtins (void)
 
   ix86_init_builtin_types ();
 
+  /* Builtins to get CPU type and features. */
+  ix86_init_platform_type_builtins ();
+
   /* TFmode support builtins.  */
   def_builtin_const (0, "__builtin_infq",
 		     FLOAT128_FTYPE_VOID, IX86_BUILTIN_INFQ);
@@ -27351,6 +27688,35 @@ ix86_expand_builtin (tree exp, rtx target, rtx sub
   enum machine_mode mode0, mode1, mode2;
   unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
 
+  /* For CPU builtins that can be folded, fold first and expand the fold.  */
+  switch (fcode)
+    {
+    case IX86_BUILTIN_CPU_SUPPORTS_CMOV:
+    case IX86_BUILTIN_CPU_SUPPORTS_MMX:
+    case IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE2:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE3:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSSE3:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_1:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_2:
+    case IX86_BUILTIN_CPU_IS_AMD:
+    case IX86_BUILTIN_CPU_IS_INTEL:
+    case IX86_BUILTIN_CPU_IS_INTEL_ATOM:
+    case IX86_BUILTIN_CPU_IS_INTEL_CORE2:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL:
+      {
+        tree fold_expr = fold_builtin_cpu ((enum ix86_builtins) fcode);
+	gcc_assert (fold_expr != NULL_TREE);
+        return expand_expr (fold_expr, target, mode, EXPAND_NORMAL);
+      }
+    }
+
   /* Determine whether the builtin function is available under the current ISA.
      Originally the builtin was not created if it wasn't applicable to the
      current ISA based on the command line switches.  With function specific
@@ -35097,6 +35463,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_BUILD_BUILTIN_VA_LIST
 #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
 
+#undef TARGET_FOLD_BUILTIN
+#define TARGET_FOLD_BUILTIN ix86_fold_builtin
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26  7:24                               ` Sriraman Tallam
@ 2011-08-26  7:33                                 ` H.J. Lu
  2011-08-26 17:59                                   ` Sriraman Tallam
  2011-08-28 20:27                                   ` Mike Stump
  2011-08-29  8:33                                 ` Xinliang David Li
  1 sibling, 2 replies; 50+ messages in thread
From: H.J. Lu @ 2011-08-26  7:33 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>  Thanks for all the comments. I am attaching a new patch
> incorporating all of the changes mentioned, mainly :
>
> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
> it only once.

This is unreliable and you don't need 3 symbols from libgcc. You can use

static struct cpu_indicator
{
  feature
  model
  status
} cpu_indicator;

struct cpu_indicator *
__get_cpu_indicator ()
{
   if cpu_indicator is uninitialized; then
      initialize cpu_indicator;
  return &cpu_indicator;
}

You can simply call __get_cpu_indicator to
get a pointer to cpu_indicator;

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26  7:33                                 ` H.J. Lu
@ 2011-08-26 17:59                                   ` Sriraman Tallam
  2011-08-26 18:21                                     ` H.J. Lu
  2011-08-28 20:27                                   ` Mike Stump
  1 sibling, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-26 17:59 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>  Thanks for all the comments. I am attaching a new patch
>> incorporating all of the changes mentioned, mainly :
>>
>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>> it only once.
>
> This is unreliable and you don't need 3 symbols from libgcc. You can use

Do you mean it is unreliable because of the constructor ordering problem?

>
> static struct cpu_indicator
> {
>  feature
>  model
>  status
> } cpu_indicator;
>
> struct cpu_indicator *
> __get_cpu_indicator ()
> {
>   if cpu_indicator is uninitialized; then
>      initialize cpu_indicator;
>  return &cpu_indicator;
> }
>
> You can simply call __get_cpu_indicator to
> get a pointer to cpu_indicator;
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 17:59                                   ` Sriraman Tallam
@ 2011-08-26 18:21                                     ` H.J. Lu
  2011-08-26 19:03                                       ` Sriraman Tallam
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-26 18:21 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>  Thanks for all the comments. I am attaching a new patch
>>> incorporating all of the changes mentioned, mainly :
>>>
>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>> it only once.
>>
>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>
> Do you mean it is unreliable because of the constructor ordering problem?
>

You do not have total control when __cpu_indicator_init is called.

Also you shouldn't use bitfield in

struct __processor_model
+{
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  unsigned int __cpu_is_intel_atom : 1;
+  unsigned int __cpu_is_intel_core2 : 1;
+  unsigned int __cpu_is_intel_corei7_nehalem : 1;
+  unsigned int __cpu_is_intel_corei7_westmere : 1;
+  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+} __cpu_model = {0};
+

A processor can't be both Atom and Core 2.

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 18:21                                     ` H.J. Lu
@ 2011-08-26 19:03                                       ` Sriraman Tallam
  2011-08-26 19:41                                         ` H.J. Lu
  0 siblings, 1 reply; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-26 19:03 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>  Thanks for all the comments. I am attaching a new patch
>>>> incorporating all of the changes mentioned, mainly :
>>>>
>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>> it only once.
>>>
>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>
>> Do you mean it is unreliable because of the constructor ordering problem?
>>
>
> You do not have total control when __cpu_indicator_init is called.

Like  discussed before, for non-ctor functions, which in my opinion is
the common use case, it works out great because __cpu_indicator_init
is guaranteed to be called and I save doing an extra check. It is only
for other ctors where this is a problem. So other ctors call this
explicitly.  What did I miss?

Thanks,
-Sri.

>
> Also you shouldn't use bitfield in
>
> struct __processor_model
> +{
> +  unsigned int __cpu_is_amd : 1;
> +  unsigned int __cpu_is_intel : 1;
> +  unsigned int __cpu_is_intel_atom : 1;
> +  unsigned int __cpu_is_intel_core2 : 1;
> +  unsigned int __cpu_is_intel_corei7_nehalem : 1;
> +  unsigned int __cpu_is_intel_corei7_westmere : 1;
> +  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
> +  unsigned int __cpu_is_amdfam10_barcelona : 1;
> +  unsigned int __cpu_is_amdfam10_shanghai : 1;
> +  unsigned int __cpu_is_amdfam10_istanbul : 1;
> +} __cpu_model = {0};
> +
>
> A processor can't be both Atom and Core 2.
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 19:03                                       ` Sriraman Tallam
@ 2011-08-26 19:41                                         ` H.J. Lu
  2011-08-26 20:45                                           ` Sriraman Tallam
  0 siblings, 1 reply; 50+ messages in thread
From: H.J. Lu @ 2011-08-26 19:41 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>> incorporating all of the changes mentioned, mainly :
>>>>>
>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>> it only once.
>>>>
>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>
>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>
>>
>> You do not have total control when __cpu_indicator_init is called.
>
> Like  discussed before, for non-ctor functions, which in my opinion is
> the common use case, it works out great because __cpu_indicator_init
> is guaranteed to be called and I save doing an extra check. It is only
>> for other ctors where this is a problem. So other ctors call this
> explicitly.  What did I miss?
>

I have

static void foo ( void ) __attribute__((constructor));

static void foo ( void )
{
   ...
   call bar ();
   ...
}

in my application. bar () uses those cpu specific functions.
foo () is called before __cpu_indicator_init.  Since IFUNC
returns the cpu specific function address only for the
first call, the proper cpu specific functions will never be used.


-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 19:41                                         ` H.J. Lu
@ 2011-08-26 20:45                                           ` Sriraman Tallam
  2011-08-26 20:52                                             ` H.J. Lu
  2011-08-27  7:54                                             ` Xinliang David Li
  0 siblings, 2 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-26 20:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>>> incorporating all of the changes mentioned, mainly :
>>>>>>
>>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>>> it only once.
>>>>>
>>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>>
>>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>>
>>>
>>> You do not have total control when __cpu_indicator_init is called.
>>
>> Like  discussed before, for non-ctor functions, which in my opinion is
>> the common use case, it works out great because __cpu_indicator_init
>> is guaranteed to be called and I save doing an extra check. It is only
>>> for other ctors where this is a problem. So other ctors call this
>> explicitly.  What did I miss?
>>
>
> I have
>
> static void foo ( void ) __attribute__((constructor));
>
> static void foo ( void )
> {
>   ...
>   call bar ();
>   ...
> }
>
> in my application. bar () uses those cpu specific functions.
> foo () is called before __cpu_indicator_init.  Since IFUNC
> returns the cpu specific function address only for the
> first call, the proper cpu specific functions will never be used.

Please correct me if I am wrong since I did not follow the IFUNC part
you mentioned.  However, it looks like this could be solved with
adding an explicit call to __cpu_indicator_init from within the ctor
foo. To me, it seems like the pain of adding this call explicitly in
other ctors is worth it because it works cleanly for non-ctors.

static void foo ( void ) __attribute__((constructor));

static void foo ( void )
{
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
}

Will this work?

Thanks,
-Sri.

>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 20:45                                           ` Sriraman Tallam
@ 2011-08-26 20:52                                             ` H.J. Lu
  2011-08-26 20:56                                               ` Xinliang David Li
  2011-08-26 22:12                                               ` Sriraman Tallam
  2011-08-27  7:54                                             ` Xinliang David Li
  1 sibling, 2 replies; 50+ messages in thread
From: H.J. Lu @ 2011-08-26 20:52 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply, GCC Patches

On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>>>> incorporating all of the changes mentioned, mainly :
>>>>>>>
>>>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>>>> it only once.
>>>>>>
>>>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>>>
>>>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>>>
>>>>
>>>> You do not have total control when __cpu_indicator_init is called.
>>>
>>> Like  discussed before, for non-ctor functions, which in my opinion is
>>> the common use case, it works out great because __cpu_indicator_init
>>> is guaranteed to be called and I save doing an extra check. It is only
>>>> for other ctors where this is a problem. So other ctors call this
>>> explicitly.  What did I miss?
>>>
>>
>> I have
>>
>> static void foo ( void ) __attribute__((constructor));
>>
>> static void foo ( void )
>> {
>>   ...
>>   call bar ();
>>   ...
>> }
>>
>> in my application. bar () uses those cpu specific functions.
>> foo () is called before __cpu_indicator_init.  Since IFUNC
>> returns the cpu specific function address only for the
>> first call, the proper cpu specific functions will never be used.
>
> Please correct me if I am wrong since I did not follow the IFUNC part
> you mentioned.  However, it looks like this could be solved with
> adding an explicit call to __cpu_indicator_init from within the ctor
> foo. To me, it seems like the pain of adding this call explicitly in
> other ctors is worth it because it works cleanly for non-ctors.
>
> static void foo ( void ) __attribute__((constructor));
>
> static void foo ( void )
> {
>  ...
>  __cpu_indicator_init ();
>  call bar ();
>  ...
> }
>
> Will this work?
>
>

Do I have to do that in every constructor, including
C++ global constructors?  It is ridiculous.

-- 
H.J.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 20:52                                             ` H.J. Lu
@ 2011-08-26 20:56                                               ` Xinliang David Li
  2011-08-28 20:36                                                 ` Mike Stump
  2011-08-26 22:12                                               ` Sriraman Tallam
  1 sibling, 1 reply; 50+ messages in thread
From: Xinliang David Li @ 2011-08-26 20:56 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Sriraman Tallam, Michael Matz, Richard Guenther,
	Richard Henderson, reply, GCC Patches, Cary Coutant

Is there a standard way to force this init function to be called
before all ctors?  Adding a ctor in one crtx.o ?

David

On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>>>>> incorporating all of the changes mentioned, mainly :
>>>>>>>>
>>>>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>>>>> it only once.
>>>>>>>
>>>>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>>>>
>>>>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>>>>
>>>>>
>>>>> You do not have total control when __cpu_indicator_init is called.
>>>>
>>>> Like  discussed before, for non-ctor functions, which in my opinion is
>>>> the common use case, it works out great because __cpu_indicator_init
>>>> is guaranteed to be called and I save doing an extra check. It is only
>>>>> for other ctors where this is a problem. So other ctors call this
>>>> explicitly.  What did I miss?
>>>>
>>>
>>> I have
>>>
>>> static void foo ( void ) __attribute__((constructor));
>>>
>>> static void foo ( void )
>>> {
>>>   ...
>>>   call bar ();
>>>   ...
>>> }
>>>
>>> in my application. bar () uses those cpu specific functions.
>>> foo () is called before __cpu_indicator_init.  Since IFUNC
>>> returns the cpu specific function address only for the
>>> first call, the proper cpu specific functions will never be used.
>>
>> Please correct me if I am wrong since I did not follow the IFUNC part
>> you mentioned.  However, it looks like this could be solved with
>> adding an explicit call to __cpu_indicator_init from within the ctor
>> foo. To me, it seems like the pain of adding this call explicitly in
>> other ctors is worth it because it works cleanly for non-ctors.
>>
>> static void foo ( void ) __attribute__((constructor));
>>
>> static void foo ( void )
>> {
>>  ...
>>  __cpu_indicator_init ();
>>  call bar ();
>>  ...
>> }
>>
>> Will this work?
>>
>>
>
> Do I have to do that in every constructor, including
> C++ global constructors?  It is ridiculous.
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 20:52                                             ` H.J. Lu
  2011-08-26 20:56                                               ` Xinliang David Li
@ 2011-08-26 22:12                                               ` Sriraman Tallam
  1 sibling, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-08-26 22:12 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Michael Matz, Richard Guenther, Richard Henderson, reply,
	GCC Patches, Paul Pluzhnikov

On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>>>>> incorporating all of the changes mentioned, mainly :
>>>>>>>>
>>>>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>>>>> it only once.
>>>>>>>
>>>>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>>>>
>>>>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>>>>
>>>>>
>>>>> You do not have total control when __cpu_indicator_init is called.
>>>>
>>>> Like  discussed before, for non-ctor functions, which in my opinion is
>>>> the common use case, it works out great because __cpu_indicator_init
>>>> is guaranteed to be called and I save doing an extra check. It is only
>>>>> for other ctors where this is a problem. So other ctors call this
>>>> explicitly.  What did I miss?
>>>>
>>>
>>> I have
>>>
>>> static void foo ( void ) __attribute__((constructor));
>>>
>>> static void foo ( void )
>>> {
>>>   ...
>>>   call bar ();
>>>   ...
>>> }
>>>
>>> in my application. bar () uses those cpu specific functions.
>>> foo () is called before __cpu_indicator_init.  Since IFUNC
>>> returns the cpu specific function address only for the
>>> first call, the proper cpu specific functions will never be used.
>>
>> Please correct me if I am wrong since I did not follow the IFUNC part
>> you mentioned.  However, it looks like this could be solved with
>> adding an explicit call to __cpu_indicator_init from within the ctor
>> foo. To me, it seems like the pain of adding this call explicitly in
>> other ctors is worth it because it works cleanly for non-ctors.
>>
>> static void foo ( void ) __attribute__((constructor));
>>
>> static void foo ( void )
>> {
>>  ...
>>  __cpu_indicator_init ();
>>  call bar ();
>>  ...
>> }
>>
>> Will this work?
>>
>>
>
> Do I have to do that in every constructor, including
> C++ global constructors?  It is ridiculous.

It seems like libgcc is on the link line after user code in the
command-line and so __cpu_indicator_init should fire first, both when
statically and dynamically linked.
Example:

foo.cc:
int  __attribute__ ((constructor))
foo ()
{
  return 0;
}


However, with something like this :

g++ -Wl,--u,__cpu_indicator_init  -lgcc foo.cc

foo gets called ahead of __cpu_indicator_init. For these abnormal link
usages, call it explicitly. So, can you please give me a common use
case where __cpu_inidicator_init will get called after a constructor.

Thanks,
-Sri.

>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 20:45                                           ` Sriraman Tallam
  2011-08-26 20:52                                             ` H.J. Lu
@ 2011-08-27  7:54                                             ` Xinliang David Li
  1 sibling, 0 replies; 50+ messages in thread
From: Xinliang David Li @ 2011-08-27  7:54 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: H.J. Lu, Michael Matz, Richard Guenther, Richard Henderson,
	reply, GCC Patches

IFUNC selector will need to call get_cpu_indicator (as proposed by HJ
or something similar), while in other contexts, the implementation
should find a way to make sure the indicator is already initialized
such that the builtins accessing the features can be directly used
(See also Michael and Richard's previous comments).  The runtime
penalty is much smaller.

david

On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>  Thanks for all the comments. I am attaching a new patch
>>>>>>> incorporating all of the changes mentioned, mainly :
>>>>>>>
>>>>>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>>>>>> it only once.
>>>>>>
>>>>>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>>>>>
>>>>> Do you mean it is unreliable because of the constructor ordering problem?
>>>>>
>>>>
>>>> You do not have total control when __cpu_indicator_init is called.
>>>
>>> Like  discussed before, for non-ctor functions, which in my opinion is
>>> the common use case, it works out great because __cpu_indicator_init
>>> is guaranteed to be called and I save doing an extra check. It is only
>>>> for other ctors where this is a problem. So other ctors call this
>>> explicitly.  What did I miss?
>>>
>>
>> I have
>>
>> static void foo ( void ) __attribute__((constructor));
>>
>> static void foo ( void )
>> {
>>   ...
>>   call bar ();
>>   ...
>> }
>>
>> in my application. bar () uses those cpu specific functions.
>> foo () is called before __cpu_indicator_init.  Since IFUNC
>> returns the cpu specific function address only for the
>> first call, the proper cpu specific functions will never be used.
>
> Please correct me if I am wrong since I did not follow the IFUNC part
> you mentioned.  However, it looks like this could be solved with
> adding an explicit call to __cpu_indicator_init from within the ctor
> foo. To me, it seems like the pain of adding this call explicitly in
> other ctors is worth it because it works cleanly for non-ctors.
>
> static void foo ( void ) __attribute__((constructor));
>
> static void foo ( void )
> {
>  ...
>  __cpu_indicator_init ();
>  call bar ();
>  ...
> }
>
> Will this work?
>
> Thanks,
> -Sri.
>
>>
>>
>> --
>> H.J.
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26  7:33                                 ` H.J. Lu
  2011-08-26 17:59                                   ` Sriraman Tallam
@ 2011-08-28 20:27                                   ` Mike Stump
  1 sibling, 0 replies; 50+ messages in thread
From: Mike Stump @ 2011-08-28 20:27 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Sriraman Tallam, Michael Matz, Richard Guenther,
	Richard Henderson, reply, GCC Patches

On Aug 25, 2011, at 6:02 PM, H.J. Lu wrote:
> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> 
>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>> it only once.
> 
> This is unreliable and you don't need 3 symbols from libgcc.

I'll add that once you start adding ctors, it is hard to impossible to ever get rid of them, and some people actually care about start up time and have static warnings for any code that has _any_ global constructors.  By having it hidden behind an api, at least only that that want it, pay the price for it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26 20:56                                               ` Xinliang David Li
@ 2011-08-28 20:36                                                 ` Mike Stump
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Stump @ 2011-08-28 20:36 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: H.J. Lu, Sriraman Tallam, Michael Matz, Richard Guenther,
	Richard Henderson, reply, GCC Patches, Cary Coutant

On Aug 26, 2011, at 10:58 AM, Xinliang David Li wrote:
> Is there a standard way to force this init function to be called
> before all ctors?  Adding a ctor in one crtx.o ?

Including the ctors that want to run before all other ctors?  Think about that carefully.  :-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-26  7:24                               ` Sriraman Tallam
  2011-08-26  7:33                                 ` H.J. Lu
@ 2011-08-29  8:33                                 ` Xinliang David Li
  2011-09-01 18:56                                   ` Sriraman Tallam
  1 sibling, 1 reply; 50+ messages in thread
From: Xinliang David Li @ 2011-08-29  8:33 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Michael Matz, H.J. Lu, Richard Guenther, Richard Henderson,
	reply, GCC Patches

Sri, please add a new api to do cpu_indicator initialization on demand
to be used in IFUNC context. Perhaps also add some debug check to make
sure no conflicting cpu model is set.

Ok for google branches for now while the discussion continues.

thanks,

David

On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>  Thanks for all the comments. I am attaching a new patch
> incorporating all of the changes mentioned, mainly :
>
> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
> it only once.
> 2) Add symbol versions.
> 3) Move all builtins to the i386 port.
> 4) Add check for atom processor.
> 5) No separate passes to fold the builtins.
>
> Please let me know what you think.
> Thanks,
> -Sri.
>
>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>        (make_var_decl): New function.
>        (get_field_from_struct): New function.
>        (fold_builtin_target): New function.
>        (ix86_fold_builtin): New function.
>        (ix86_expand_builtin): Expand new builtins by folding them.
>        (TARGET_FOLD_BUILTIN): New macro.
>        (IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value.
>        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value.
>        (IX86_BUILTIN_CPU_IS_AMD): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value.
>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value.
>        (IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value.
>        (IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value.
>        (IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value.
>        * config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol.
>        (__cpu_model): Export symbol.
>        (__cpu_features): Export symbol.
>        * config/i386/i386-builtin-types.def: New function type.
>
>        * config/i386/i386-cpuinfo.c: New file.
>        * config/i386/t-cpuinfo: New file.
>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>
>        * gcc.dg/builtin_target.c: New test.
>
>
>
>
> On Tue, Aug 23, 2011 at 4:35 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>
>>> > void __attribute__((constructor)) bla(void)
>>> > {
>>> >  __cpu_indicator_init ();
>>> > }
>>> >
>>> > I don't see any complication.?
>>> >
>>>
>>> Order of constructors.  A constructor may call functions
>>> which use __cpu_indicator.
>>
>> That's why I wrote also:
>>
>>> The initializer function has to be callable from pre-.init contexts, e.g.
>>> ifunc dispatchers.
>>
>> It obviously has to be guarded against multiple calls.  The ctor in libgcc
>> would be mere convenience because then non-ctor code can rely on the data
>> being initialized, and only (potential) ctor code has to check and call
>> the init function on demand.
>>
>>
>> Ciao,
>> Michael.
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
  2011-08-29  8:33                                 ` Xinliang David Li
@ 2011-09-01 18:56                                   ` Sriraman Tallam
  0 siblings, 0 replies; 50+ messages in thread
From: Sriraman Tallam @ 2011-09-01 18:56 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Michael Matz, H.J. Lu, Richard Guenther, Richard Henderson,
	reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5712 bytes --]

On Sun, Aug 28, 2011 at 5:31 PM, Xinliang David Li <davidxl@google.com> wrote:
> Sri, please add a new api to do cpu_indicator initialization on demand
> to be used in IFUNC context. Perhaps also add some debug check to make
> sure no conflicting cpu model is set.
>
> Ok for google branches for now while the discussion continues.

I made the changes and committed to the google branch for now.

	* config/i386/i386-cpuinfo.c: New file.
	* config/i386/t-cpuinfo: New file.
	* config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc

	* config/i386/i386.c (build_struct_with_one_bit_fields): New function.
	(make_var_decl): New function.
	(get_field_from_struct): New function.
	(fold_builtin_target): New function.
	(ix86_fold_builtin): New function.
	(ix86_expand_builtin): Expand new builtins by folding them.
	(TARGET_FOLD_BUILTIN): New macro.
	(IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value.
	(IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value.
	(IX86_BUILTIN_CPU_INIT): New enum value.
	(IX86_BUILTIN_CPU_IS_AMD): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value.
	(IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value.
	(IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value.
	* config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol.
	(__cpu_model): Export symbol.
	(__cpu_features): Export symbol.
	* config/i386/i386-builtin-types.def: New function type.

	* gcc.dg/builtin_target.c: New test.


Thanks,
-Sri.

>
> thanks,
>
> David
>
> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>  Thanks for all the comments. I am attaching a new patch
>> incorporating all of the changes mentioned, mainly :
>>
>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>> it only once.
>> 2) Add symbol versions.
>> 3) Move all builtins to the i386 port.
>> 4) Add check for atom processor.
>> 5) No separate passes to fold the builtins.
>>
>> Please let me know what you think.
>> Thanks,
>> -Sri.
>>
>>        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>        (make_var_decl): New function.
>>        (get_field_from_struct): New function.
>>        (fold_builtin_target): New function.
>>        (ix86_fold_builtin): New function.
>>        (ix86_expand_builtin): Expand new builtins by folding them.
>>        (TARGET_FOLD_BUILTIN): New macro.
>>        (IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value.
>>        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value.
>>        (IX86_BUILTIN_CPU_IS_AMD): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value.
>>        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value.
>>        (IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value.
>>        (IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value.
>>        (IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value.
>>        * config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol.
>>        (__cpu_model): Export symbol.
>>        (__cpu_features): Export symbol.
>>        * config/i386/i386-builtin-types.def: New function type.
>>
>>        * config/i386/i386-cpuinfo.c: New file.
>>        * config/i386/t-cpuinfo: New file.
>>        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>
>>        * gcc.dg/builtin_target.c: New test.
>>
>>
>>
>>
>> On Tue, Aug 23, 2011 at 4:35 AM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Mon, 22 Aug 2011, H.J. Lu wrote:
>>>
>>>> > void __attribute__((constructor)) bla(void)
>>>> > {
>>>> >  __cpu_indicator_init ();
>>>> > }
>>>> >
>>>> > I don't see any complication.?
>>>> >
>>>>
>>>> Order of constructors.  A constructor may call functions
>>>> which use __cpu_indicator.
>>>
>>> That's why I wrote also:
>>>
>>>> The initializer function has to be callable from pre-.init contexts, e.g.
>>>> ifunc dispatchers.
>>>
>>> It obviously has to be guarded against multiple calls.  The ctor in libgcc
>>> would be mere convenience because then non-ctor code can rely on the data
>>> being initialized, and only (potential) ctor code has to check and call
>>> the init function on demand.
>>>
>>>
>>> Ciao,
>>> Michael.
>>
>

[-- Attachment #2: CPU_Runtime_patch.txt --]
[-- Type: text/plain, Size: 25552 bytes --]

Index: libgcc/config.host
===================================================================
--- libgcc/config.host	(revision 178425)
+++ libgcc/config.host	(working copy)
@@ -609,7 +609,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
   i[34567]86-*-gnu*)
-	tmake_file="${tmake_file} t-tls"
+	tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
 	if test "$libgcc_cv_cfi" = "yes"; then
 		tmake_file="${tmake_file} t-stack i386/t-stack-i386"
 	fi
Index: libgcc/config/i386/t-cpuinfo
===================================================================
--- libgcc/config/i386/t-cpuinfo	(revision 0)
+++ libgcc/config/i386/t-cpuinfo	(revision 0)
@@ -0,0 +1 @@
+LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
Index: libgcc/config/i386/i386-cpuinfo.c
===================================================================
--- libgcc/config/i386/i386-cpuinfo.c	(revision 0)
+++ libgcc/config/i386/i386-cpuinfo.c	(revision 0)
@@ -0,0 +1,278 @@
+/* Get CPU type and Features for x86 processors.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "cpuid.h"
+#include "tsystem.h"
+
+int __cpu_indicator_init (void) __attribute__ ((constructor));
+
+enum vendor_signatures
+{
+  SIG_INTEL =	0x756e6547 /* Genu */,
+  SIG_AMD =	0x68747541 /* Auth */
+};
+
+
+/* Features supported. */
+
+struct __processor_features
+{
+  unsigned int __cpu_cmov : 1;
+  unsigned int __cpu_mmx : 1;
+  unsigned int __cpu_popcnt : 1;
+  unsigned int __cpu_sse : 1;
+  unsigned int __cpu_sse2 : 1;
+  unsigned int __cpu_sse3 : 1;
+  unsigned int __cpu_ssse3 : 1;
+  unsigned int __cpu_sse4_1 : 1;
+  unsigned int __cpu_sse4_2 : 1;
+} __cpu_features;
+
+/* Processor Model. */
+
+struct __processor_model
+{
+  /* Vendor. */
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  /* CPU type. */
+  unsigned int __cpu_is_intel_atom : 1;
+  unsigned int __cpu_is_intel_core2 : 1;
+  unsigned int __cpu_is_intel_corei7_nehalem : 1;
+  unsigned int __cpu_is_intel_corei7_westmere : 1;
+  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+} __cpu_model;
+
+/* Get the specific type of AMD CPU.  */
+
+static void
+get_amd_cpu (unsigned int family, unsigned int model)
+{
+  switch (family)
+    {
+    case 0x10:
+      switch (model)
+	{
+	case 0x2:
+	  __cpu_model.__cpu_is_amdfam10_barcelona = 1;
+	  break;
+	case 0x4:
+	  __cpu_model.__cpu_is_amdfam10_shanghai = 1;
+	  break;
+	case 0x8:
+	  __cpu_model.__cpu_is_amdfam10_istanbul = 1;
+	  break;
+	default:
+	  break;
+	}
+      break;
+    default:
+      break;
+    }
+}
+
+/* Get the specific type of Intel CPU.  */
+
+static void
+get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
+{
+  /* Parse family and model only if brand ID is 0. */
+  if (brand_id == 0)
+    {
+      switch (family)
+	{
+	case 0x5:
+	  /* Pentium.  */
+	  break;
+	case 0x6:
+	  switch (model)
+	    {
+	    case 0x1c:
+	    case 0x26:
+	      /* Atom.  */
+	      __cpu_model.__cpu_is_intel_atom = 1;
+	      break;
+	    case 0x1a:
+	    case 0x1e:
+	    case 0x1f:
+	    case 0x2e:
+	      /* Nehalem.  */
+	      __cpu_model.__cpu_is_intel_corei7_nehalem = 1;
+	      break;
+	    case 0x25:
+	    case 0x2c:
+	    case 0x2f:
+	      /* Westmere.  */
+	      __cpu_model.__cpu_is_intel_corei7_westmere = 1;
+	      break;
+	    case 0x2a:
+	      /* Sandy Bridge.  */
+	      __cpu_model.__cpu_is_intel_corei7_sandybridge = 1;
+	      break;
+	    case 0x17:
+	    case 0x1d:
+	      /* Penryn.  */
+	    case 0x0f:
+	      /* Merom.  */
+	      __cpu_model.__cpu_is_intel_core2 = 1;
+	      break;
+	    default:
+	      break;
+	    }
+	  break;
+	default:
+	  /* We have no idea.  */
+	  break;
+	}
+    }
+}	             	
+
+static void
+get_available_features (unsigned int ecx, unsigned int edx)
+{
+  __cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
+  __cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
+  __cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
+  __cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
+  __cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
+  __cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
+  __cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
+  __cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
+  __cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
+}
+
+
+/* Sanity check for the vendor and cpu type flags.  */
+
+static int
+sanity_check (void)
+{
+  unsigned int one_type = 0;
+
+  /* Vendor cannot be Intel and AMD. */
+  gcc_assert((__cpu_model.__cpu_is_intel == 0)
+             || (__cpu_model.__cpu_is_amd == 0));
+ 
+  /* Only one CPU type can be set. */
+  one_type = (__cpu_model.__cpu_is_intel_atom
+	      + __cpu_model.__cpu_is_intel_core2
+	      + __cpu_model.__cpu_is_intel_corei7_nehalem
+	      + __cpu_model.__cpu_is_intel_corei7_westmere
+	      + __cpu_model.__cpu_is_intel_corei7_sandybridge
+	      + __cpu_model.__cpu_is_amdfam10_barcelona
+	      + __cpu_model.__cpu_is_amdfam10_shanghai
+	      + __cpu_model.__cpu_is_amdfam10_istanbul);
+
+  gcc_assert (one_type <= 1);
+  return 0;
+}
+
+/* A noinline function calling __get_cpuid. Having many calls to
+   cpuid in one function in 32-bit mode causes GCC to complain:
+   "can’t find a register in class ‘CLOBBERED_REGS’".  This is
+   related to PR rtl-optimization 44174. */
+
+static int __attribute__ ((noinline))
+__get_cpuid_output (unsigned int __level,
+		    unsigned int *__eax, unsigned int *__ebx,
+		    unsigned int *__ecx, unsigned int *__edx)
+{
+  return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
+}
+
+
+/* A constructor function that is sets __cpu_model and __cpu_features with
+   the right values.  This needs to run only once.
+   If another constructor needs to use these values, explicitly call this
+   function from the other constructor.  Otherwise, the ordering of
+   constructors could make this constructor run later.  */
+
+int __attribute__ ((constructor))
+__cpu_indicator_init (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  int max_level = 5;
+  unsigned int vendor;
+  unsigned int model, family, brand_id;
+  static int called = 0;
+
+  /* This function needs to run just once.  */
+  if (called)
+    return 0;
+  else
+    called = 1;
+
+  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
+  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+    return -1;
+
+  vendor = ebx;
+  max_level = eax;
+
+  if (max_level < 1)
+    return -1;
+
+  if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
+    return -1;
+
+  model = (eax >> 4) & 0x0f;
+  family = (eax >> 8) & 0x0f;
+  brand_id = ebx & 0xff;
+
+  /* Adjust model and family for Intel CPUS. */
+  if (vendor == SIG_INTEL)
+    {
+      unsigned int extended_model, extended_family;
+
+      extended_model = (eax >> 12) & 0xf0;
+      extended_family = (eax >> 20) & 0xff;
+      if (family == 0x0f)
+	{
+	  family += extended_family;
+	  model += extended_model;
+	}
+      else if (family == 0x06)
+	model += extended_model;
+    }
+
+  /* Find CPU model. */
+
+  if (vendor == SIG_AMD)
+    {
+      __cpu_model.__cpu_is_amd = 1;
+      get_amd_cpu (family, model);
+    }
+  else if (vendor == SIG_INTEL)
+    {
+      __cpu_model.__cpu_is_intel = 1;
+      get_intel_cpu (family, model, brand_id);
+    }
+
+  /* Find available features. */
+  get_available_features (ecx, edx);
+
+  sanity_check ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/builtin_target.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
+++ gcc/testsuite/gcc.dg/builtin_target.c	(revision 0)
@@ -0,0 +1,53 @@
+/* This test checks if the __builtin_cpu_* calls are recognized. */
+
+/* { dg-do run } */
+
+int
+fn1 ()
+{
+  if (__builtin_cpu_supports_cmov () < 0)
+    return -1;
+  if (__builtin_cpu_supports_mmx () < 0)
+    return -1;
+  if (__builtin_cpu_supports_popcount () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse2 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse3 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_ssse3 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse4_1 () < 0)
+    return -1;
+  if (__builtin_cpu_supports_sse4_2 () < 0)
+    return -1;
+  if (__builtin_cpu_is_amd () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_atom () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_core2 () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_nehalem () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_westmere () < 0)
+    return -1;
+  if (__builtin_cpu_is_intel_corei7_sandybridge () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_barcelona () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_shanghai () < 0)
+    return -1;
+  if (__builtin_cpu_is_amdfam10_istanbul () < 0)
+    return -1;
+
+  return 0;
+}
+
+int main ()
+{
+  return fn1 ();
+}
Index: gcc/config/i386/i386-builtin-types.def
===================================================================
--- gcc/config/i386/i386-builtin-types.def	(revision 178425)
+++ gcc/config/i386/i386-builtin-types.def	(working copy)
@@ -131,6 +131,7 @@ DEF_FUNCTION_TYPE (UINT64)
 DEF_FUNCTION_TYPE (UNSIGNED)
 DEF_FUNCTION_TYPE (VOID)
 DEF_FUNCTION_TYPE (PVOID)
+DEF_FUNCTION_TYPE (INT)
 
 DEF_FUNCTION_TYPE (FLOAT, FLOAT)
 DEF_FUNCTION_TYPE (FLOAT128, FLOAT128)
Index: gcc/config/i386/libgcc-glibc.ver
===================================================================
--- gcc/config/i386/libgcc-glibc.ver	(revision 178425)
+++ gcc/config/i386/libgcc-glibc.ver	(working copy)
@@ -147,6 +147,12 @@ GCC_4.3.0 {
   __trunctfxf2
   __unordtf2
 }
+
+GCC_4.6.0 {
+  __cpu_indicator_init
+  __cpu_model
+  __cpu_features
+}
 %else
 GCC_4.4.0 {
   __addtf3
@@ -183,4 +189,10 @@ GCC_4.4.0 {
 GCC_4.5.0 {
   __extendxftf2
 }
+
+GCC_4.6.0 {
+  __cpu_indicator_init
+  __cpu_model
+  __cpu_features
+}
 %endif
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178425)
+++ gcc/config/i386/i386.c	(working copy)
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "sbitmap.h"
 #include "fibheap.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
 
 enum upper_128bits_state
 {
@@ -24443,6 +24445,29 @@ enum ix86_builtins
   /* CFString built-in for darwin */
   IX86_BUILTIN_CFSTRING,
 
+  /* Builtins to get CPU features. */
+  IX86_BUILTIN_CPU_SUPPORTS_CMOV,
+  IX86_BUILTIN_CPU_SUPPORTS_MMX,
+  IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE2,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE3,
+  IX86_BUILTIN_CPU_SUPPORTS_SSSE3,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE4_1,
+  IX86_BUILTIN_CPU_SUPPORTS_SSE4_2,
+  /* Builtins to get CPU type. */
+  IX86_BUILTIN_CPU_INIT,
+  IX86_BUILTIN_CPU_IS_AMD,
+  IX86_BUILTIN_CPU_IS_INTEL,
+  IX86_BUILTIN_CPU_IS_INTEL_ATOM,
+  IX86_BUILTIN_CPU_IS_INTEL_CORE2,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE,
+  IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI,
+  IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL,
+
   IX86_BUILTIN_MAX
 };
 
@@ -25809,6 +25834,318 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+/* Returns a struct type with name NAME and number of fields equal to
+   NUM_FIELDS.  Each field is a unsigned int bit field of length 1 bit. */
+
+static tree
+build_struct_with_one_bit_fields (int num_fields, const char *name)
+{
+  int i;
+  char field_name [10];
+  tree field = NULL_TREE, field_chain = NULL_TREE;
+  tree type = make_node (RECORD_TYPE);
+
+  strcpy (field_name, "k_field");
+
+  for (i = 0; i < num_fields; i++)
+    {
+      /* Name the fields, 0_field, 1_field, ... */
+      field_name [0] = '0' + i;
+      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			  get_identifier (field_name), unsigned_type_node);
+      DECL_BIT_FIELD (field) = 1;
+      DECL_SIZE (field) = bitsize_one_node;
+      if (field_chain != NULL_TREE)
+	DECL_CHAIN (field) = field_chain;
+      field_chain = field;
+    }
+  finish_builtin_struct (type, name, field_chain, NULL_TREE);
+  return type;
+}
+
+/* Returns a extern, comdat VAR_DECL of type TYPE and name NAME. */
+
+static tree
+make_var_decl (tree type, const char *name)
+{
+  tree new_decl;
+  struct varpool_node *vnode;
+
+  new_decl = build_decl (UNKNOWN_LOCATION,
+	                 VAR_DECL,
+	  	         get_identifier(name),
+		         type);
+
+  DECL_EXTERNAL (new_decl) = 1;
+  TREE_STATIC (new_decl) = 1;
+  TREE_PUBLIC (new_decl) = 1;
+  DECL_INITIAL (new_decl) = 0;
+  DECL_ARTIFICIAL (new_decl) = 0;
+  DECL_PRESERVE_P (new_decl) = 1;
+
+  make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
+  assemble_variable (new_decl, 0, 0, 0);
+
+  vnode = varpool_node (new_decl);
+  gcc_assert (vnode != NULL);
+  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+     lto-streamer-out.c. */
+  vnode->finalized = 1;
+
+  return new_decl;
+}
+
+/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
+   numbered field. */
+
+static tree
+get_field_from_struct (tree struct_type, int field_num)
+{
+  int i;
+  tree field = TYPE_FIELDS (struct_type);
+
+  for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
+    {
+      gcc_assert (field != NULL_TREE);
+    }
+
+  return field;
+}
+
+/* FNDECL is a __builtin_cpu_* call that is folded into an integer defined
+   in libgcc/config/i386/i386-cpuinfo.c */
+
+static tree 
+fold_builtin_cpu (enum ix86_builtins fn_code)
+{
+  /* This is the order of bit-fields in __processor_features in
+     i386-cpuinfo.c */
+  enum processor_features
+  {
+    F_CMOV = 0,
+    F_MMX,
+    F_POPCNT,
+    F_SSE,
+    F_SSE2,
+    F_SSE3,
+    F_SSSE3,
+    F_SSE4_1,
+    F_SSE4_2,
+    F_MAX
+  };
+
+  /* This is the order of bit-fields in __processor_model in
+     i386-cpuinfo.c */
+  enum processor_model
+  {
+    M_AMD = 0,
+    M_INTEL,
+    M_INTEL_ATOM,
+    M_INTEL_CORE2,
+    M_INTEL_COREI7_NEHALEM,
+    M_INTEL_COREI7_WESTMERE,
+    M_INTEL_COREI7_SANDYBRIDGE,
+    M_AMDFAM10_BARCELONA,
+    M_AMDFAM10_SHANGHAI,
+    M_AMDFAM10_ISTANBUL,
+    M_MAX
+  };
+
+  static tree __processor_features_type = NULL_TREE;
+  static tree __cpu_features_var = NULL_TREE;
+  static tree __processor_model_type = NULL_TREE;
+  static tree __cpu_model_var = NULL_TREE;
+  static tree field;
+  static tree which_struct;
+
+  if (__processor_features_type == NULL_TREE)
+    __processor_features_type = build_struct_with_one_bit_fields (F_MAX,
+ 			          "__processor_features");
+
+  if (__processor_model_type == NULL_TREE)
+    __processor_model_type = build_struct_with_one_bit_fields (M_MAX,
+ 			          "__processor_model");
+
+  if (__cpu_features_var == NULL_TREE)
+    __cpu_features_var = make_var_decl (__processor_features_type,
+					"__cpu_features");
+
+  if (__cpu_model_var == NULL_TREE)
+    __cpu_model_var = make_var_decl (__processor_model_type,
+				     "__cpu_model");
+
+  /* Look at the code to identify the field requested. */ 
+  switch (fn_code)
+    {
+    case IX86_BUILTIN_CPU_SUPPORTS_CMOV:
+      field = get_field_from_struct (__processor_features_type, F_CMOV);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_MMX:
+      field = get_field_from_struct (__processor_features_type, F_MMX);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT:
+      field = get_field_from_struct (__processor_features_type, F_POPCNT);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE:
+      field = get_field_from_struct (__processor_features_type, F_SSE);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE2:
+      field = get_field_from_struct (__processor_features_type, F_SSE2);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSSE3:
+      field = get_field_from_struct (__processor_features_type, F_SSSE3);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_1:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_2:
+      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
+      which_struct = __cpu_features_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMD:
+      field = get_field_from_struct (__processor_model_type, M_AMD);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL:
+      field = get_field_from_struct (__processor_model_type, M_INTEL);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_ATOM:
+      field = get_field_from_struct (__processor_model_type, M_INTEL_ATOM);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_CORE2:
+      field = get_field_from_struct (__processor_model_type, M_INTEL_CORE2);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_NEHALEM);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_WESTMERE);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE:
+      field = get_field_from_struct (__processor_model_type,
+				     M_INTEL_COREI7_SANDYBRIDGE);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_BARCELONA);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_SHANGHAI);
+      which_struct = __cpu_model_var;
+      break;
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL:
+      field = get_field_from_struct (__processor_model_type,
+				     M_AMDFAM10_ISTANBUL);
+      which_struct = __cpu_model_var;
+      break;
+    default:
+      return NULL_TREE;
+    }
+
+  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
+}
+
+static tree
+ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
+		   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
+{
+  const char* decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD
+      && strstr(decl_name, "__builtin_cpu") != NULL)
+    {
+      enum ix86_builtins code = (enum ix86_builtins)
+				DECL_FUNCTION_CODE (fndecl);
+      return fold_builtin_cpu (code);
+    }
+  return NULL_TREE;
+}
+
+/* A builtin to init/return the cpu type or feature.  Returns an
+   integer and the type is a const if IS_CONST is set. */
+
+static void
+make_platform_builtin (const char* name, int code, int is_const)
+{
+  tree decl;
+  tree type;
+
+  type = ix86_get_builtin_func_type (INT_FTYPE_VOID);
+  decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+			       NULL, NULL_TREE);
+  gcc_assert (decl != NULL_TREE);
+  ix86_builtins[(int) code] = decl;
+  if (is_const)
+    TREE_READONLY (decl) = 1;
+} 
+
+/* Builtins to get CPU type and features supported. */
+
+static void
+ix86_init_platform_type_builtins (void)
+{
+  make_platform_builtin ("__builtin_cpu_init",
+			 IX86_BUILTIN_CPU_INIT, 0);
+  make_platform_builtin ("__builtin_cpu_supports_cmov",
+			 IX86_BUILTIN_CPU_SUPPORTS_CMOV, 1);
+  make_platform_builtin ("__builtin_cpu_supports_mmx",
+			 IX86_BUILTIN_CPU_SUPPORTS_MMX, 1);
+  make_platform_builtin ("__builtin_cpu_supports_popcount",
+			 IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT, 1);
+  make_platform_builtin ("__builtin_cpu_supports_sse",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSE, 1);
+  make_platform_builtin ("__builtin_cpu_supports_sse2",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSE2, 1);
+  make_platform_builtin ("__builtin_cpu_supports_sse3",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSE3, 1);
+  make_platform_builtin ("__builtin_cpu_supports_ssse3",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSSE3, 1);
+  make_platform_builtin ("__builtin_cpu_supports_sse4_1",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSE4_1, 1);
+  make_platform_builtin ("__builtin_cpu_supports_sse4_2",
+			 IX86_BUILTIN_CPU_SUPPORTS_SSE4_2, 1);
+  make_platform_builtin ("__builtin_cpu_is_amd",
+			 IX86_BUILTIN_CPU_IS_AMD, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel_atom",
+			 IX86_BUILTIN_CPU_IS_INTEL_ATOM, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel_core2",
+			 IX86_BUILTIN_CPU_IS_INTEL_CORE2, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel",
+			 IX86_BUILTIN_CPU_IS_INTEL, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_nehalem",
+			 IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_westmere",
+			 IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE, 1);
+  make_platform_builtin ("__builtin_cpu_is_intel_corei7_sandybridge",
+			 IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE, 1);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_barcelona",
+			 IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA, 1);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_shanghai",
+			 IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI, 1);
+  make_platform_builtin ("__builtin_cpu_is_amdfam10_istanbul",
+			 IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL, 1);
+}
+
 /* Internal method for ix86_init_builtins.  */
 
 static void
@@ -25892,6 +26229,9 @@ ix86_init_builtins (void)
 
   ix86_init_builtin_types ();
 
+  /* Builtins to get CPU type and features. */
+  ix86_init_platform_type_builtins ();
+
   /* TFmode support builtins.  */
   def_builtin_const (0, "__builtin_infq",
 		     FLOAT128_FTYPE_VOID, IX86_BUILTIN_INFQ);
@@ -27351,6 +27691,44 @@ ix86_expand_builtin (tree exp, rtx target, rtx sub
   enum machine_mode mode0, mode1, mode2;
   unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
 
+  /* For CPU builtins that can be folded, fold first and expand the fold.  */
+  switch (fcode)
+    {
+    case IX86_BUILTIN_CPU_SUPPORTS_CMOV:
+    case IX86_BUILTIN_CPU_SUPPORTS_MMX:
+    case IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE2:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE3:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSSE3:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_1:
+    case IX86_BUILTIN_CPU_SUPPORTS_SSE4_2:
+    case IX86_BUILTIN_CPU_IS_AMD:
+    case IX86_BUILTIN_CPU_IS_INTEL:
+    case IX86_BUILTIN_CPU_IS_INTEL_ATOM:
+    case IX86_BUILTIN_CPU_IS_INTEL_CORE2:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE:
+    case IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI:
+    case IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL:
+      {
+        tree fold_expr = fold_builtin_cpu ((enum ix86_builtins) fcode);
+	gcc_assert (fold_expr != NULL_TREE);
+        return expand_expr (fold_expr, target, mode, EXPAND_NORMAL);
+      }
+    case IX86_BUILTIN_CPU_INIT:
+      {
+	/* Make it call __cpu_indicator_init in libgcc. */
+	tree call_expr, fndecl, type;
+        type = build_function_type_list (integer_type_node, NULL_TREE); 
+	fndecl = build_fn_decl ("__cpu_indicator_init", type);
+	call_expr = build_call_expr (fndecl, 0); 
+	return expand_expr (call_expr, target, mode, EXPAND_NORMAL);
+      }
+    }
+
   /* Determine whether the builtin function is available under the current ISA.
      Originally the builtin was not created if it wasn't applicable to the
      current ISA based on the command line switches.  With function specific
@@ -35097,6 +35475,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_BUILD_BUILTIN_VA_LIST
 #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
 
+#undef TARGET_FOLD_BUILTIN
+#define TARGET_FOLD_BUILTIN ix86_fold_builtin
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2011-09-01 18:56 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-16 21:27 [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046) Sriraman Tallam
2011-08-16 21:27 ` H.J. Lu
2011-08-16 21:52   ` Sriraman Tallam
2011-08-16 23:22 ` Andi Kleen
2011-08-17  6:55 ` Joseph S. Myers
2011-08-17  8:28   ` Sriraman Tallam
2011-08-17  9:38 ` Richard Guenther
2011-08-17 20:04   ` Sriraman Tallam
2011-08-18  9:33     ` Richard Guenther
2011-08-18 14:04       ` Michael Matz
2011-08-18 17:12         ` Xinliang David Li
2011-08-18 21:15       ` Sriraman Tallam
2011-08-18 21:53         ` Richard Henderson
2011-08-18 22:49           ` Sriraman Tallam
2011-08-19  0:30             ` Richard Henderson
2011-08-19 11:55               ` Richard Guenther
2011-08-19 12:11                 ` Jakub Jelinek
2011-08-20 21:48                 ` Richard Henderson
2011-08-20 22:02                   ` H.J. Lu
2011-08-21 11:05                   ` Richard Guenther
2011-08-22 14:27                     ` Michael Matz
2011-08-22 14:33                       ` H.J. Lu
2011-08-22 17:11                         ` Michael Matz
2011-08-22 17:18                           ` H.J. Lu
2011-08-22 19:02                             ` Sriraman Tallam
2011-08-22 19:26                               ` H.J. Lu
2011-08-22 19:44                                 ` Sriraman Tallam
2011-08-22 20:49                             ` Richard Guenther
2011-08-22 20:55                               ` H.J. Lu
2011-08-22 21:22                                 ` Richard Guenther
2011-08-22 21:42                                   ` H.J. Lu
2011-08-22 22:26                                     ` Richard Guenther
2011-08-23 12:33                             ` Michael Matz
2011-08-26  7:24                               ` Sriraman Tallam
2011-08-26  7:33                                 ` H.J. Lu
2011-08-26 17:59                                   ` Sriraman Tallam
2011-08-26 18:21                                     ` H.J. Lu
2011-08-26 19:03                                       ` Sriraman Tallam
2011-08-26 19:41                                         ` H.J. Lu
2011-08-26 20:45                                           ` Sriraman Tallam
2011-08-26 20:52                                             ` H.J. Lu
2011-08-26 20:56                                               ` Xinliang David Li
2011-08-28 20:36                                                 ` Mike Stump
2011-08-26 22:12                                               ` Sriraman Tallam
2011-08-27  7:54                                             ` Xinliang David Li
2011-08-28 20:27                                   ` Mike Stump
2011-08-29  8:33                                 ` Xinliang David Li
2011-09-01 18:56                                   ` Sriraman Tallam
2011-08-18  1:56 ` Hans-Peter Nilsson
2011-08-18  2:06   ` Sriraman Tallam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).