public inbox for gcc-patches@gcc.gnu.org
* Re: [RFC, x86] Changes for AVX and AVX2 processors
@ 2012-12-28 13:36 Uros Bizjak
  2012-12-29  5:26 ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2012-12-28 13:36 UTC
  To: gcc-patches; +Cc: Vladimir Yakovlev

Hello!

> New processors core-avx and core-avx2 are added. It was done to have
> possibilities to turn new features on for these processors. Please review.

I don't think this is a good approach: you are mixing an architecture
with an ISA extension in the name. We already have
processor_alias_table, where the processor architecture and its features
(extensions) can be activated, depending on the name.

Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-28 13:36 [RFC, x86] Changes for AVX and AVX2 processors Uros Bizjak
@ 2012-12-29  5:26 ` Vladimir Yakovlev
  2012-12-29 10:07   ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2012-12-29  5:26 UTC
  To: Uros Bizjak; +Cc: gcc-patches

Hello,

processor_alias_table uses the same processor type for all of
"corei7", "corei7-avx", "core-avx-i" and "core-avx2". One consequence
is that a check such as x86_avx256_split_unaligned_load &
ix86_tune_mask gives the same result for all of these processors.
Moreover, we cannot turn new features on for AVX/AVX2 only by using
initial_ix86_tune_features.

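
To make the point about the shared processor type concrete, here is a
minimal sketch of how a mask test of this kind behaves. It is not the
actual i386.c code: the reduced enum, the four-entry alias_table and the
helper split_unaligned_load_p are hypothetical stand-ins, and
m_GENERIC64 replaces the real m_GENERIC for brevity. Only the names
processor_alias_table, ix86_tune_mask and
x86_avx256_split_unaligned_load come from the sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the real GCC definitions.  */
enum processor_type { PROCESSOR_CORE2, PROCESSOR_COREI7, PROCESSOR_GENERIC64 };

#define m_COREI7    (1u << PROCESSOR_COREI7)
#define m_GENERIC64 (1u << PROCESSOR_GENERIC64)

/* One bit per processor type, as in the tuning masks of i386.c.  */
static const unsigned int x86_avx256_split_unaligned_load
  = m_COREI7 | m_GENERIC64;

struct alias { const char *name; enum processor_type processor; };

/* Hypothetical, reduced alias table: all four names share one type,
   which is the situation before the patch.  */
static const struct alias alias_table[] = {
  { "corei7",     PROCESSOR_COREI7 },
  { "corei7-avx", PROCESSOR_COREI7 },
  { "core-avx-i", PROCESSOR_COREI7 },
  { "core-avx2",  PROCESSOR_COREI7 },
};

/* Return 1 if the tuning bit is set for NAME, 0 if it is clear,
   and -1 if NAME is not in the table.  */
static int
split_unaligned_load_p (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
    if (strcmp (alias_table[i].name, name) == 0)
      {
        /* Plays the role of ix86_tune_mask.  */
        unsigned int tune_mask = 1u << alias_table[i].processor;
        return (x86_avx256_split_unaligned_load & tune_mask) != 0;
      }
  return -1;
}
```

Because the mask has only one bit per processor type, every name that
maps to PROCESSOR_COREI7 necessarily gets the same answer, whatever ISA
flags the alias carries.
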
2012/12/28 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
>> New processors core-avx and core-avx2 are added. It was done to have
>> possibilities to turn new features on for these processors. Please review.
>
> I don't think this is a good approach, you are mixing an architecture
> with an ISA extension in the name. We already have
> processor_alias_table, where processor architecture and features
> (extensions) can be activated, depending on the name.
>
> Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-29  5:26 ` Vladimir Yakovlev
@ 2012-12-29 10:07   ` Uros Bizjak
       [not found]     ` <CAK1BsWpUdUg+ivi7pFdbUr8R45YjhbBCNhmN=98sMmW99dy-tg@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2012-12-29 10:07 UTC
  To: Vladimir Yakovlev; +Cc: gcc-patches

On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:

> processor_alias_table contains the same processor type for all
> "corei7", "corei7-avx", "core-avx-i" and "core-avx2". At least, it has
> consequence on checking x86_avx256_split_unaligned_load &
> ix86_tune_mask: for all these processors it results the same. Moreover
> we cannot turn new features on for AVX/AVX2 using
> initial_ix86_tune_features.

corei7, corei7-avx and core-avx-i are all based on the sandybridge (=
PROCESSOR_COREI7) architecture. The only problematic entry is
core-avx2, which should be based on a new architecture. I propose
PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.
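
A rough sketch of the proposed layout, under stated assumptions: the
enum, the three PTA_* bits, the alias_table contents and the tuning knob
example_tune_haswell_only are all simplified or hypothetical and do not
reproduce the real i386.c tables. The idea illustrated is that the alias
name selects both a processor type (used for tuning) and a set of ISA
flags, and that only core-avx2 needs a processor type of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; the real enum and PTA_* set are much larger.  */
enum processor_type { PROCESSOR_NOCONA, PROCESSOR_CORE2, PROCESSOR_COREI7,
                      PROCESSOR_HASWELL, PROCESSOR_max };

#define PTA_SSE4_2 (1u << 0)
#define PTA_AVX    (1u << 1)
#define PTA_AVX2   (1u << 2)

struct pta
{
  const char *name;
  enum processor_type processor;  /* Architecture, drives tuning.  */
  unsigned int flags;             /* ISA extensions.  */
};

static const struct pta alias_table[] = {
  { "corei7",     PROCESSOR_COREI7,  PTA_SSE4_2 },
  { "corei7-avx", PROCESSOR_COREI7,  PTA_SSE4_2 | PTA_AVX },
  { "core-avx-i", PROCESSOR_COREI7,  PTA_SSE4_2 | PTA_AVX },
  { "core-avx2",  PROCESSOR_HASWELL, PTA_SSE4_2 | PTA_AVX | PTA_AVX2 },
};

/* Return the table entry for NAME, or NULL if there is none.  */
static const struct pta *
lookup_alias (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
    if (strcmp (alias_table[i].name, name) == 0)
      return &alias_table[i];
  return NULL;
}

#define m_HASWELL (1u << PROCESSOR_HASWELL)

/* Hypothetical tuning knob that is enabled for Haswell only.  */
static const unsigned int example_tune_haswell_only = m_HASWELL;

/* Return 1 if the hypothetical knob is enabled for NAME, else 0.  */
static int
example_tune_enabled_p (const char *name)
{
  const struct pta *p = lookup_alias (name);

  return p != NULL
         && (example_tune_haswell_only & (1u << p->processor)) != 0;
}
```

With this split, a tuning decision can single out core-avx2 through
m_HASWELL, while corei7-avx and core-avx-i keep sharing the corei7
tuning and differ from it only in their ISA flags.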

Uros.


* [RFC, x86] Changes for AVX and AVX2 processors
       [not found]     ` <CAK1BsWpUdUg+ivi7pFdbUr8R45YjhbBCNhmN=98sMmW99dy-tg@mail.gmail.com>
@ 2012-12-29 16:57       ` Vladimir Yakovlev
  2012-12-30 13:21         ` Uros Bizjak
  2012-12-30 11:59       ` Uros Bizjak
  1 sibling, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2012-12-29 16:57 UTC
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]

I made the changes. Please take a look.

2012/12/29, Uros Bizjak <ubizjak@gmail.com>:
> On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev <vbyakovl23@gmail.com>
> wrote:
>
>> processor_alias_table contains the same processor type for all
>> "corei7", "corei7-avx", "core-avx-i" and "core-avx2". At least, it has
>> consequence on checking x86_avx256_split_unaligned_load &
>> ix86_tune_mask: for all these processors it results the same. Moreover
>> we cannot turn new features on for AVX/AVX2 using
>> initial_ix86_tune_features.
>
> corei7, corei7-avx and core-avx-i are all based on sandybridge (=
> PROCESSOR_COREI7) architecture. The only problematic entry is
> core-avx2, which should be based on new architecture. I propose
> PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.
>
> Uros.
>

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 6135 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 08e1afe..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,11 +142,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__core_avx");
-      def_or_undef (parse_in, "__core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__core_avx2");
       def_or_undef (parse_in, "__core_avx2__");
       break;
@@ -240,10 +236,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
-    case PROCESSOR_CORE_AVX:
-      def_or_undef (parse_in, "__tune_core_avx__");
-      break;
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__tune_core_avx2__");
       break;
     case PROCESSOR_ATOM:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 10411da..4adbef6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,9 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE_AVX (1<<PROCESSOR_CORE_AVX)
-#define m_CORE_AVX2 (1<<PROCESSOR_CORE_AVX2)
-#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_CORE_AVX | m_CORE_AVX2)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -2438,8 +2437,6 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
-  /* Core avx  */
-  {&core_cost, 16, 10, 16, 10, 16},
   /* Core avx2  */
   {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
@@ -2469,7 +2466,6 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
-  "coreavx",
   "coreavx2",
   "atom",
   "geode",
@@ -2912,17 +2908,17 @@ ix86_option_override_internal (bool main_args_p)
       {"corei7", PROCESSOR_COREI7, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_FXSR},
-      {"corei7-avx", PROCESSOR_CORE_AVX, CPU_COREI7,
+      {"corei7-avx", PROCESSOR_COREI7, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL
 	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx-i", PROCESSOR_CORE_AVX, CPU_COREI7,
+      {"core-avx-i", PROCESSOR_COREI7, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_CORE_AVX2, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24069,8 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
-    case PROCESSOR_CORE_AVX:
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24327,8 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
-    case PROCESSOR_CORE_AVX:
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24873,8 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
-    case PROCESSOR_CORE_AVX:
-    case PROCESSOR_CORE_AVX2:
+    case PROCESSOR_HASWELL:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
@@ -28719,11 +28712,7 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "corei7";
 	      priority = P_PROC_SSE4_2;
 	      break;
-	    case PROCESSOR_CORE_AVX:
-	      arg_str = "core_avx";
-	      priority = P_PROC_SSE4_2;
-	      break;
-	    case PROCESSOR_CORE_AVX2:
+	    case PROCESSOR_HASWELL:
 	      arg_str = "core_avx2";
 	      priority = P_PROC_SSE4_2;
 	      break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d3ee8b0..ee21c47 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,8 +248,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
-#define TARGET_CORE_AVX (ix86_tune == PROCESSOR_CORE_AVX)
-#define TARGET_CORE_AVX2 (ix86_tune == PROCESSOR_CORE_AVX2)
+#define TARGET_HASWELL (ix86_tune == PROCESSOR_HASWELL)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -605,8 +604,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
-  TARGET_CPU_DEFAULT_core_avx,
-  TARGET_CPU_DEFAULT_core_avx2,
+  TARGET_CPU_DEFAULT_haswell,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2099,8 +2097,7 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
-  PROCESSOR_CORE_AVX,
-  PROCESSOR_CORE_AVX2,
+  PROCESSOR_HASWELL,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,


* Re: [RFC, x86] Changes for AVX and AVX2 processors
       [not found]     ` <CAK1BsWpUdUg+ivi7pFdbUr8R45YjhbBCNhmN=98sMmW99dy-tg@mail.gmail.com>
  2012-12-29 16:57       ` Vladimir Yakovlev
@ 2012-12-30 11:59       ` Uros Bizjak
  1 sibling, 0 replies; 18+ messages in thread
From: Uros Bizjak @ 2012-12-30 11:59 UTC
  To: Vladimir Yakovlev; +Cc: gcc-patches

On Sat, Dec 29, 2012 at 5:50 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> I did changes. Please take a look.

Please attach the patch, relative to mainline, not an incremental
patch vs. your previous version.

Thanks,
Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-29 16:57       ` Vladimir Yakovlev
@ 2012-12-30 13:21         ` Uros Bizjak
  2012-12-30 16:05           ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2012-12-30 13:21 UTC
  To: Vladimir Yakovlev; +Cc: gcc-patches

On Sat, Dec 29, 2012 at 5:57 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> I did changes. Please take a look.
>
> 2012/12/29, Uros Bizjak <ubizjak@gmail.com>:
>> On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev <vbyakovl23@gmail.com>
>> wrote:
>>
>>> processor_alias_table contains the same processor type for all
>>> "corei7", "corei7-avx", "core-avx-i" and "core-avx2". At least, it has
>>> consequence on checking x86_avx256_split_unaligned_load &
>>> ix86_tune_mask: for all these processors it results the same. Moreover
>>> we cannot turn new features on for AVX/AVX2 using
>>> initial_ix86_tune_features.
>>
>> corei7, corei7-avx and core-avx-i are all based on sandybridge (=
>> PROCESSOR_COREI7) architecture. The only problematic entry is
>> core-avx2, which should be based on new architecture. I propose
>> PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.

@@ -2467,6 +2470,7 @@
   "nocona",
   "core2",
   "corei7",
+  "coreavx2",
   "atom",
   "geode",
   "k6",

This string should match the processor_alias_table name, so it should
be "core-avx2".
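
A small sketch of why the spelling matters, assuming (this is my reading,
not something stated in the thread) that the default CPU string is later
looked up by name in the same way as a -march=/-mtune= argument. The
array alias_names and the helper alias_index are hypothetical and stand
in for the names stored in processor_alias_table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of the names known to the alias table.  */
static const char *const alias_names[] = {
  "nocona", "core2", "corei7", "core-avx2", "atom",
};

/* Return the index of NAME in alias_names, or -1 if it is not found.  */
static int
alias_index (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof alias_names / sizeof alias_names[0]; i++)
    if (strcmp (alias_names[i], name) == 0)
      return (int) i;
  return -1;
}
```

Under that assumption, alias_index ("coreavx2") finds nothing, while
alias_index ("core-avx2") resolves to the intended entry.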

@@ -28709,6 +28716,10 @@
              arg_str = "corei7";
              priority = P_PROC_SSE4_2;
              break;
+           case PROCESSOR_HASWELL:
+             arg_str = "core_avx2";
+             priority = P_PROC_SSE4_2;
+             break;
            case PROCESSOR_ATOM:
              arg_str = "atom";
              priority = P_PROC_SSSE3;

This is part of the processor dispatcher functionality. To support this
functionality, some more changes are needed, so it is IMO best to
leave this part out for now. I would also like the author of the
processor dispatcher to review changes in this area.

On a related note, it looks to me that corei7 should declare
P_PROC_AVX here (this change should be part of another patch).

Other than that, the patch looks OK, but please repost the final version
with a correct ChangeLog.

Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-30 13:21         ` Uros Bizjak
@ 2012-12-30 16:05           ` Vladimir Yakovlev
  2012-12-30 18:05             ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2012-12-30 16:05 UTC
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2941 bytes --]

I fixed the typos and added a ChangeLog.

2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	* config/i386/i386-c.c (ix86_target_macros_internal): New case.
	(ix86_target_macros_internal): Likewise.

	* config/i386/i386.c (m_CORE2I7): Removed.
	(m_HASWELL): New macro.
	(m_CORE_ALL): Likewise.
	(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
	(initial_ix86_arch_features): Likewise.
	(processor_target_table): Initializations for Core avx2.
	(cpu_names): New name "core-avx2".
	(ix86_option_override_internal): Replaced PROCESSOR_COREI7 with
	PROCESSOR_HASWELL for "core-avx2".
	(ix86_issue_rate): New case.
	(ia32_multipass_dfa_lookahead): Likewise.
	(ix86_sched_init_global): Likewise.
	(get_builtin_code_for_version): Likewise.

	* config/i386/i386.h (TARGET_HASWELL): New macro.
	(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
	(processor_type): New PROCESSOR_HASWELL.


2012/12/30 Uros Bizjak <ubizjak@gmail.com>:
> On Sat, Dec 29, 2012 at 5:57 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> I did changes. Please take a look.
>>
>> 2012/12/29, Uros Bizjak <ubizjak@gmail.com>:
>>> On Sat, Dec 29, 2012 at 6:26 AM, Vladimir Yakovlev <vbyakovl23@gmail.com>
>>> wrote:
>>>
>>>> processor_alias_table contains the same processor type for all
>>>> "corei7", "corei7-avx", "core-avx-i" and "core-avx2". At least, it has
>>>> consequence on checking x86_avx256_split_unaligned_load &
>>>> ix86_tune_mask: for all these processors it results the same. Moreover
>>>> we cannot turn new features on for AVX/AVX2 using
>>>> initial_ix86_tune_features.
>>>
>>> corei7, corei7-avx and core-avx-i are all based on sandybridge (=
>>> PROCESSOR_COREI7) architecture. The only problematic entry is
>>> core-avx2, which should be based on new architecture. I propose
>>> PROCESSOR_HASWELL, in the same way as we have PROCESSOR_NOCONA.
>
> @@ -2467,6 +2470,7 @@
>    "nocona",
>    "core2",
>    "corei7",
> +  "coreavx2",
>    "atom",
>    "geode",
>    "k6",
>
> This string should match processor_alias_table name, so "core-avx2".
>
> @@ -28709,6 +28716,10 @@
>               arg_str = "corei7";
>               priority = P_PROC_SSE4_2;
>               break;
> +           case PROCESSOR_HASWELL:
> +             arg_str = "core_avx2";
> +             priority = P_PROC_SSE4_2;
> +             break;
>             case PROCESSOR_ATOM:
>               arg_str = "atom";
>               priority = P_PROC_SSSE3;
>
> This is part of a processor dispatcher functionality. To support this
> functionality, some more changes are needed, so it is IMO best to
> leave this part out for now. I would also like the author of processor
> dispatcher to review changes in this area.
>
> On a related note, it looks to me that corei7 should declare
> P_PROC_AVX here (this change should be part of another patch).
>
> Other than that, the patch looks OK, but please repost final version
> with a correct ChangeLog.
>
> Uros.

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 13095 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 22e5e9b..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,6 +142,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__core_avx2");
+      def_or_undef (parse_in, "__core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
@@ -232,6 +236,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__tune_core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__tune_atom__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 69f44aa..cdabff6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,7 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE2I7 (m_CORE2 | m_COREI7)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -1768,16 +1769,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      negatively, so enabling for Generic64 seems like good code size
      tradeoff.  We can't enable it for 32bit generic because it does not
      work well with PPro base chips.  */
-  m_386 | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
+  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
 
   /* X86_TUNE_PUSH_MEMORY */
-  m_386 | m_P4_NOCONA | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_ZERO_EXTEND_WITH_AND */
   m_486 | m_PENT,
 
   /* X86_TUNE_UNROLL_STRLEN */
-  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE2I7 | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
+  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
      on simulation result. But after P4 was made, no performance benefit
@@ -1789,11 +1790,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
 
   /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
      register stalls on Generic32 compilation setting as well.  However
@@ -1806,17 +1807,17 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
    * on 16-bit immediate moves into memory on Core2 and Corei7.  */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 
   /* X86_TUNE_USE_SIMODE_FIOP */
-  ~(m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
+  ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_USE_MOV0 */
   m_K6,
@@ -1837,7 +1838,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_PENT | m_PPRO),
 
   /* X86_TUNE_PROMOTE_QIMODE */
-  m_386 | m_486 | m_PENT | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_486 | m_PENT | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_FAST_PREFIX */
   ~(m_386 | m_486 | m_PENT),
@@ -1878,10 +1879,10 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
      for DFmode copies */
-  ~(m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
+  ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
      conflict here in between PPro/Pentium4 based chips that thread 128bit
@@ -1892,7 +1893,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      shows that disabling this option on P4 brings over 20% SPECfp regression,
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
   m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
@@ -1916,7 +1917,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO | m_P4_NOCONA,
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
   m_PPRO | m_ATHLON_K8,
@@ -1938,28 +1939,28 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
-  m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_BT */
-  m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PAD_RETURNS */
-  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PAD_SHORT_FUNCTION: Pad short funtion.  */
   m_ATOM,
 
   /* X86_TUNE_EXT_80387_CONSTANTS */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
 
   /* X86_TUNE_AVOID_VECTOR_DECODE */
-  m_CORE2I7 | m_K8 | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_GENERIC64,
 
   /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
      and SImode multiply, but 386 and 486 do HImode multiply faster.  */
@@ -1967,11 +1968,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1988,7 +1989,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
      from FP to FP. */
-  m_CORE2I7 | m_AMDFAM10 | m_GENERIC,
+  m_CORE_ALL | m_AMDFAM10 | m_GENERIC,
 
   /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
      from integer to FP. */
@@ -2026,7 +2027,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
      regs instead of memory.  */
-  m_COREI7 | m_CORE2I7
+  m_CORE_ALL
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2052,10 +2053,10 @@ static unsigned int initial_ix86_arch_features[X86_ARCH_LAST] = {
 };
 
 static const unsigned int x86_accumulate_outgoing_args
-  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_arch_always_fancy_math_387
-  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_avx256_split_unaligned_load
   = m_COREI7 | m_GENERIC;
@@ -2436,6 +2437,8 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx2  */
+  {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
@@ -2463,6 +2466,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
+  "core-avx2",
   "atom",
   "geode",
   "k6",
@@ -2914,7 +2918,7 @@ ix86_option_override_internal (bool main_args_p)
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24061,6 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24317,6 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24861,6 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
@@ -28705,6 +28712,10 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "corei7";
 	      priority = P_PROC_SSE4_2;
 	      break;
+	    case PROCESSOR_HASWELL:
+	      arg_str = "core-avx2";
+	      priority = P_PROC_SSE4_2;
+	      break;
 	    case PROCESSOR_ATOM:
 	      arg_str = "atom";
 	      priority = P_PROC_SSSE3;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3ac3451..ee21c47 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,6 +248,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
+#define TARGET_HASWELL (ix86_tune == PROCESSOR_HASWELL)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -603,6 +604,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
+  TARGET_CPU_DEFAULT_haswell,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2095,6 +2097,7 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
+  PROCESSOR_HASWELL,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-30 16:05           ` Vladimir Yakovlev
@ 2012-12-30 18:05             ` Uros Bizjak
  2013-01-10 11:12               ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2012-12-30 18:05 UTC
  To: Vladimir Yakovlev; +Cc: gcc-patches

On Sun, Dec 30, 2012 at 5:05 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> I fixed typos and added ChangeLog.
>
> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>
>         * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>         (ix86_target_macros_internal): Likewise.
>
>         * config/i386/i386.c (m_CORE2I7): Removed.
>         (m_HASWELL): New macro.
>         (m_CORE_ALL): Likewise.
>         (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>         (initial_ix86_arch_features): Likewise.
>         (processor_target_table): Initializations for Core avx2.
>         (cpu_names): New name "core-avx2".
>         (ix86_option_override_internal): Replaced PROCESSOR_COREI7 with
>         PROCESSOR_HASWELL for "core-avx2".
>         (ix86_issue_rate): New case.
>         (ia32_multipass_dfa_lookahead): Likewise.
>         (ix86_sched_init_global): Likewise.
>         (get_builtin_code_for_version): Likewise.
>
>         * config/i386/i386.h (TARGET_HASWELL): New macro.
>         (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>         (processor_type): New PROCESSOR_HASWELL.

Please remove this part; it belongs with the processor dispatcher changes:

@@ -28705,6 +28712,10 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "corei7";
 	      priority = P_PROC_SSE4_2;
 	      break;
+	    case PROCESSOR_HASWELL:
+	      arg_str = "core-avx2";
+	      priority = P_PROC_SSE4_2;
+	      break;
 	    case PROCESSOR_ATOM:
 	      arg_str = "atom";
 	      priority = P_PROC_SSSE3;

Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2012-12-30 18:05             ` Uros Bizjak
@ 2013-01-10 11:12               ` Vladimir Yakovlev
  2013-01-10 11:28                 ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2013-01-10 11:12 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2803 bytes --]

Hello Uros,

It seems I didn't send a patch with the latest changes. Sorry if so.

Vladimir

 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com

         * config/i386/i386-c.c (ix86_target_macros_internal): New case.
          (ix86_target_macros_internal): Likewise.

         * config/i386/i386.c (m_CORE2I7): Removed.
         (m_HASWELL): New macro.
         (m_CORE_ALL): Likewise.
         (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
         (initial_ix86_arch_features): Likewise.
         (processor_target_table): Initializations for Core avx2.
         (cpu_names): New name "core-avx2".
         (ix86_option_override_internal): Changed PROCESSOR_COREI7 to
         PROCESSOR_HASWELL.
         (ix86_issue_rate): New case.
         (ia32_multipass_dfa_lookahead): Likewise.
         (ix86_sched_init_global): Likewise.

         * config/i386/i386.h (TARGET_HASWELL): New macro.
         (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
         (processor_type): New PROCESSOR_HASWELL.
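
For readers unfamiliar with the tuning tables, the mask change listed above can be sketched roughly as follows. This is a reduced illustration, not the GCC source: the four-entry processor enum, the two-entry feature table, and the helper tune_feature_enabled are assumptions made for the example, and the AMD and generic bits present in the real table entries are left out.

```c
#include <stdbool.h>

/* Reduced, hypothetical processor list; the real enum in i386.h is longer.  */
enum processor_type
{
  PROCESSOR_CORE2,
  PROCESSOR_COREI7,
  PROCESSOR_HASWELL,
  PROCESSOR_ATOM,
  PROCESSOR_max
};

/* One bit per processor, as in the patch.  */
#define m_CORE2 (1u << PROCESSOR_CORE2)
#define m_COREI7 (1u << PROCESSOR_COREI7)
#define m_HASWELL (1u << PROCESSOR_HASWELL)
#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_HASWELL)
#define m_ATOM (1u << PROCESSOR_ATOM)

/* Illustrative feature indices; only two entries are modelled here.  */
enum tune_feature
{
  TUNE_USE_BT,                     /* shared by the whole Core family */
  TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, /* left as Core i7 only by the patch */
  TUNE_LAST
};

static const unsigned int initial_tune_features[TUNE_LAST] = {
  m_CORE_ALL | m_ATOM,
  m_COREI7
};

/* A feature is enabled when the bit of the tuned-for processor is set
   in that feature's mask.  */
static bool
tune_feature_enabled (enum tune_feature f, enum processor_type tune)
{
  unsigned int tune_mask = 1u << tune;
  return (initial_tune_features[f] & tune_mask) != 0;
}
```

Because PROCESSOR_HASWELL has its own bit, an entry that keeps m_COREI7 instead of m_CORE_ALL can now be tuned differently for the two processors, which was not possible while "core-avx2" mapped to PROCESSOR_COREI7.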


2012/12/30 Uros Bizjak <ubizjak@gmail.com>:
> On Sun, Dec 30, 2012 at 5:05 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> I fixed typos and added ChangeLog.
>>
>> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com
>>
>>         * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>>          (ix86_target_macros_internal): Likewise.
>>
>>         * config/i386/i386.c (m_CORE2I7): Removed.
>>         (m_HASWELL): New macro.
>>         (m_CORE_ALL): Likewise.
>>         (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>>         (initial_ix86_arch_features): Likewise.
>>         (processor_target_table): Initializations for Core avx2.
>>         (cpu_names): New name "core-avx2".
>>         (ix86_option_override_internal): Changed PROCESSOR_COREI7 to
>>         PROCESSOR_HASWELL.
>>         (ix86_issue_rate): New case.
>>         (ia32_multipass_dfa_lookahead): Likewise.
>>         (ix86_sched_init_global): Likewise.
>>         (get_builtin_code_for_version): Likewise.
>>
>>         * config/i386/i386.h (TARGET_HASWELL): New macro.
>>         (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>>         (processor_type): New PROCESSOR_HASWELL.
>
> Please remove this part; it belongs in the processor dispatcher patch:
>
> @@ -28705,6 +28712,10 @@ get_builtin_code_for_version (tree decl, tree
> *predicate_list)
>               arg_str = "corei7";
>               priority = P_PROC_SSE4_2;
>               break;
> +           case PROCESSOR_HASWELL:
> +             arg_str = "core-avx2";
> +             priority = P_PROC_SSE4_2;
> +             break;
>             case PROCESSOR_ATOM:
>               arg_str = "atom";
>               priority = P_PROC_SSSE3;
>
> Uros.

[-- Attachment #2: patch1 --]
[-- Type: application/octet-stream, Size: 12723 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 22e5e9b..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,6 +142,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__core_avx2");
+      def_or_undef (parse_in, "__core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
@@ -232,6 +236,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__tune_core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__tune_atom__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 69f44aa..cdb5d23 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,7 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE2I7 (m_CORE2 | m_COREI7)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -1768,16 +1769,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      negatively, so enabling for Generic64 seems like good code size
      tradeoff.  We can't enable it for 32bit generic because it does not
      work well with PPro base chips.  */
-  m_386 | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
+  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
 
   /* X86_TUNE_PUSH_MEMORY */
-  m_386 | m_P4_NOCONA | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_ZERO_EXTEND_WITH_AND */
   m_486 | m_PENT,
 
   /* X86_TUNE_UNROLL_STRLEN */
-  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE2I7 | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
+  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
      on simulation result. But after P4 was made, no performance benefit
@@ -1789,11 +1790,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
 
   /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
      register stalls on Generic32 compilation setting as well.  However
@@ -1806,17 +1807,17 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
    * on 16-bit immediate moves into memory on Core2 and Corei7.  */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 
   /* X86_TUNE_USE_SIMODE_FIOP */
-  ~(m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
+  ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_USE_MOV0 */
   m_K6,
@@ -1837,7 +1838,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_PENT | m_PPRO),
 
   /* X86_TUNE_PROMOTE_QIMODE */
-  m_386 | m_486 | m_PENT | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_486 | m_PENT | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_FAST_PREFIX */
   ~(m_386 | m_486 | m_PENT),
@@ -1878,10 +1879,10 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
      for DFmode copies */
-  ~(m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
+  ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
      conflict here in between PPro/Pentium4 based chips that thread 128bit
@@ -1892,7 +1893,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      shows that disabling this option on P4 brings over 20% SPECfp regression,
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
   m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
@@ -1916,7 +1917,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO | m_P4_NOCONA,
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
   m_PPRO | m_ATHLON_K8,
@@ -1938,28 +1939,28 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
-  m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_BT */
-  m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PAD_RETURNS */
-  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PAD_SHORT_FUNCTION: Pad short funtion.  */
   m_ATOM,
 
   /* X86_TUNE_EXT_80387_CONSTANTS */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
 
   /* X86_TUNE_AVOID_VECTOR_DECODE */
-  m_CORE2I7 | m_K8 | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_GENERIC64,
 
   /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
      and SImode multiply, but 386 and 486 do HImode multiply faster.  */
@@ -1967,11 +1968,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1988,7 +1989,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
      from FP to FP. */
-  m_CORE2I7 | m_AMDFAM10 | m_GENERIC,
+  m_CORE_ALL | m_AMDFAM10 | m_GENERIC,
 
   /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
      from integer to FP. */
@@ -2026,7 +2027,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
      regs instead of memory.  */
-  m_COREI7 | m_CORE2I7
+  m_CORE_ALL
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2052,10 +2053,10 @@ static unsigned int initial_ix86_arch_features[X86_ARCH_LAST] = {
 };
 
 static const unsigned int x86_accumulate_outgoing_args
-  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_arch_always_fancy_math_387
-  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_avx256_split_unaligned_load
   = m_COREI7 | m_GENERIC;
@@ -2436,6 +2437,8 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx2  */
+  {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
@@ -2463,6 +2466,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
+  "core-avx2",
   "atom",
   "geode",
   "k6",
@@ -2914,7 +2918,7 @@ ix86_option_override_internal (bool main_args_p)
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24061,6 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24317,6 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24861,6 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3ac3451..ee21c47 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,6 +248,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
+#define TARGET_HASWELL (ix86_tune == PROCESSOR_HASWELL)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -603,6 +604,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
+  TARGET_CPU_DEFAULT_haswell,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2095,6 +2097,7 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
+  PROCESSOR_HASWELL,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,
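
As a usage note on the i386-c.c hunks above: user code could test the new predefined macros roughly as sketched below. The helper names compiled_arch and compiled_tune are hypothetical, and which macros a particular GCC release actually defines for a given -march=/-mtune= setting is an assumption to verify against that release.

```c
/* Hypothetical helper: label for the -march= target of this translation
   unit, based on the arch macros touched by the patch.  */
static const char *
compiled_arch (void)
{
#if defined (__core_avx2__)
  return "core-avx2";
#elif defined (__corei7__)
  return "corei7";
#else
  return "other";
#endif
}

/* Hypothetical helper: same idea for the -mtune= target, which has its
   own __tune_*__ macro family.  */
static const char *
compiled_tune (void)
{
#if defined (__tune_core_avx2__)
  return "core-avx2";
#elif defined (__tune_corei7__)
  return "corei7";
#else
  return "other";
#endif
}
```

The arch and tune macros are checked separately because the two switch statements in ix86_target_macros_internal handle the architecture and the tuning target independently.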

* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-10 11:12               ` Vladimir Yakovlev
@ 2013-01-10 11:28                 ` Uros Bizjak
  2013-01-10 11:31                   ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2013-01-10 11:28 UTC (permalink / raw)
  To: Vladimir Yakovlev; +Cc: gcc-patches

On Thu, Jan 10, 2013 at 12:12 PM, Vladimir Yakovlev
<vbyakovl23@gmail.com> wrote:

> It seems I didn't send a patch with the latest changes. Sorry if so.
>
> Vladimir
>
>  2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com
>
>          * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>           (ix86_target_macros_internal): Likewise.
>
>          * config/i386/i386.c (m_CORE2I7): Removed.
>          (m_HASWELL): New macro.
>          (m_CORE_ALL): Likewise.
>          (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>          (initial_ix86_arch_features): Likewise.
>          (processor_target_table): Initializations for Core avx2.
>          (cpu_names): New name "core-avx2".
>          (ix86_option_override_internal): Changed PROCESSOR_COREI7 to
>          PROCESSOR_HASWELL.
>          (ix86_issue_rate): New case.
>          (ia32_multipass_dfa_lookahead): Likewise.
>          (ix86_sched_init_global): Likewise.
>
>          * config/i386/i386.h (TARGET_HASWELL): New macro.
>          (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>          (processor_type): New PROCESSOR_HASWELL.

As a strictly tuning patch, this is OK for mainline.

Please note that an eventual processor dispatcher patch will have to
wait for the next stage1.

Thanks,
Uros.

* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-10 11:28                 ` Uros Bizjak
@ 2013-01-10 11:31                   ` Jakub Jelinek
  2013-01-11 11:25                     ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2013-01-10 11:31 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Vladimir Yakovlev, gcc-patches

On Thu, Jan 10, 2013 at 12:28:24PM +0100, Uros Bizjak wrote:
> On Thu, Jan 10, 2013 at 12:12 PM, Vladimir Yakovlev
> <vbyakovl23@gmail.com> wrote:
> 
> > It seems I didn't send a patch with the latest changes. Sorry if so.
> >
> > Vladimir
> >
> >  2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com

Missing > at the end of line.

> >
> >          * config/i386/i386-c.c (ix86_target_macros_internal): New case.
> >           (ix86_target_macros_internal): Likewise.

There is an extra space at the beginning of this line (note that all
ChangeLog lines except the one with the date should be indented with a
tab, not spaces).

	Jakub
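
The indentation rule above could be checked mechanically with something like the following sketch. The helper changelog_line_ok is hypothetical and only looks at the first character of a line; it assumes that date lines start with a digit, which is a simplification rather than the full ChangeLog format.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true when LINE follows the rule that every
   ChangeLog line except the date line is indented with a tab.  */
static bool
changelog_line_ok (const char *line)
{
  if (line[0] == '\0' || line[0] == '\n')
    return true;                        /* blank separator line */
  if (isdigit ((unsigned char) line[0]))
    return true;                        /* date/author line, not indented */
  return line[0] == '\t';               /* everything else: leading tab */
}
```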

* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-10 11:31                   ` Jakub Jelinek
@ 2013-01-11 11:25                     ` Vladimir Yakovlev
  2013-01-11 11:27                       ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2013-01-11 11:25 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches

I've fixed the ChangeLog. Can we commit the patch to trunk now?

2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	* config/i386/i386-c.c (ix86_target_macros_internal): New case.
	(ix86_target_macros_internal): Likewise.

	* config/i386/i386.c (m_CORE2I7): Removed.
	(m_HASWELL): New macro.
	(m_CORE_ALL): Likewise.
	(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
	(initial_ix86_arch_features): Likewise.
	(processor_target_table): Initializations for Core avx2.
	(cpu_names): New name "core-avx2".
	(ix86_option_override_internal): Changed PROCESSOR_COREI7 to
	PROCESSOR_HASWELL.
	(ix86_issue_rate): New case.
	(ia32_multipass_dfa_lookahead): Likewise.
	(ix86_sched_init_global): Likewise.

	* config/i386/i386.h (TARGET_HASWELL): New macro.
	(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
	(processor_type): New PROCESSOR_HASWELL.


2013/1/10 Jakub Jelinek <jakub@redhat.com>:
> On Thu, Jan 10, 2013 at 12:28:24PM +0100, Uros Bizjak wrote:
>> On Thu, Jan 10, 2013 at 12:12 PM, Vladimir Yakovlev
>> <vbyakovl23@gmail.com> wrote:
>>
>> > It seems I didn't send a patch with the latest changes. Sorry if so.
>> >
>> > Vladimir
>> >
>> >  2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com
>
> Missing > at the end of line.
>
>> >
>> >          * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>> >           (ix86_target_macros_internal): Likewise.
>
> There is some additional space at the beginning of this line (note, all
> ChangeLog lines but the one with date should be tab indented, not space).
>
>         Jakub

* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-11 11:25                     ` Vladimir Yakovlev
@ 2013-01-11 11:27                       ` Jakub Jelinek
  2013-01-11 12:15                         ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2013-01-11 11:27 UTC (permalink / raw)
  To: Vladimir Yakovlev; +Cc: Uros Bizjak, gcc-patches

On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
> I've fixed the ChangeLog. Can we commit the patch to trunk now?
> 
> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 
> 	* config/i386/i386-c.c (ix86_target_macros_internal): New case.
> 	(ix86_target_macros_internal): Likewise.
> 
> 	* config/i386/i386.c (m_CORE2I7): Removed.
> 	(m_HASWELL): New macro.
> 	(m_CORE_ALL): Likewise.
> 	(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
> 	(initial_ix86_arch_features): Likewise.
> 	(processor_target_table): Initializations for Core avx2.
> 	(cpu_names): New name "core-avx2".
> 	(ix86_option_override_internal): Changed PROCESSOR_COREI7 to
> 	PROCESSOR_HASWELL.
> 	(ix86_issue_rate): New case.
> 	(ia32_multipass_dfa_lookahead): Likewise.
> 	(ix86_sched_init_global): Likewise.
> 
> 	* config/i386/i386.h (TARGET_HASWELL): New macro.
> 	(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
> 	(processor_type): New PROCESSOR_HASWELL.

Uros already acked the patch, so it certainly is ok to commit now.

	Jakub

* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-11 11:27                       ` Jakub Jelinek
@ 2013-01-11 12:15                         ` Vladimir Yakovlev
  2013-01-11 12:21                           ` Uros Bizjak
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2013-01-11 12:15 UTC (permalink / raw)
  To: Uros Bizjak, jakub; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1272 bytes --]

I already sent the patch; sending it once more.

2013/1/11 Jakub Jelinek <jakub@redhat.com>:
> On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
>> I've fixed the ChangeLog. Can we commit the patch to trunk now?
>>
>> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>
>>       * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>>       (ix86_target_macros_internal): Likewise.
>>
>>       * config/i386/i386.c (m_CORE2I7): Removed.
>>       (m_HASWELL): New macro.
>>       (m_CORE_ALL): Likewise.
>>       (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>>       (initial_ix86_arch_features): Likewise.
>>       (processor_target_table): Initializations for Core avx2.
>>       (cpu_names): New name "core-avx2".
>>       (ix86_option_override_internal): Changed PROCESSOR_COREI7 to
>>       PROCESSOR_HASWELL.
>>       (ix86_issue_rate): New case.
>>       (ia32_multipass_dfa_lookahead): Likewise.
>>       (ix86_sched_init_global): Likewise.
>>
>>       * config/i386/i386.h (TARGET_HASWELL): New macro.
>>       (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>>       (processor_type): New PROCESSOR_HASWELL.
>
> Uros already acked the patch, so it certainly is ok to commit now.
>
>         Jakub

[-- Attachment #2: patch1 --]
[-- Type: application/octet-stream, Size: 12723 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 22e5e9b..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,6 +142,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__core_avx2");
+      def_or_undef (parse_in, "__core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
@@ -232,6 +236,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__tune_core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__tune_atom__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 69f44aa..cdb5d23 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,7 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE2I7 (m_CORE2 | m_COREI7)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -1768,16 +1769,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      negatively, so enabling for Generic64 seems like good code size
      tradeoff.  We can't enable it for 32bit generic because it does not
      work well with PPro base chips.  */
-  m_386 | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
+  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
 
   /* X86_TUNE_PUSH_MEMORY */
-  m_386 | m_P4_NOCONA | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_ZERO_EXTEND_WITH_AND */
   m_486 | m_PENT,
 
   /* X86_TUNE_UNROLL_STRLEN */
-  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE2I7 | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
+  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
      on simulation result. But after P4 was made, no performance benefit
@@ -1789,11 +1790,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
 
   /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
      register stalls on Generic32 compilation setting as well.  However
@@ -1806,17 +1807,17 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
    * on 16-bit immediate moves into memory on Core2 and Corei7.  */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 
   /* X86_TUNE_USE_SIMODE_FIOP */
-  ~(m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
+  ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_USE_MOV0 */
   m_K6,
@@ -1837,7 +1838,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_PENT | m_PPRO),
 
   /* X86_TUNE_PROMOTE_QIMODE */
-  m_386 | m_486 | m_PENT | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_486 | m_PENT | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_FAST_PREFIX */
   ~(m_386 | m_486 | m_PENT),
@@ -1878,10 +1879,10 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
      for DFmode copies */
-  ~(m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
+  ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
      conflict here in between PPro/Pentium4 based chips that thread 128bit
@@ -1892,7 +1893,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      shows that disabling this option on P4 brings over 20% SPECfp regression,
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
   m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
@@ -1916,7 +1917,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO | m_P4_NOCONA,
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
   m_PPRO | m_ATHLON_K8,
@@ -1938,28 +1939,28 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
-  m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_BT */
-  m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PAD_RETURNS */
-  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PAD_SHORT_FUNCTION: Pad short funtion.  */
   m_ATOM,
 
   /* X86_TUNE_EXT_80387_CONSTANTS */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
 
   /* X86_TUNE_AVOID_VECTOR_DECODE */
-  m_CORE2I7 | m_K8 | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_GENERIC64,
 
   /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
      and SImode multiply, but 386 and 486 do HImode multiply faster.  */
@@ -1967,11 +1968,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1988,7 +1989,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
      from FP to FP. */
-  m_CORE2I7 | m_AMDFAM10 | m_GENERIC,
+  m_CORE_ALL | m_AMDFAM10 | m_GENERIC,
 
   /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
      from integer to FP. */
@@ -2026,7 +2027,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
      regs instead of memory.  */
-  m_COREI7 | m_CORE2I7
+  m_CORE_ALL
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2052,10 +2053,10 @@ static unsigned int initial_ix86_arch_features[X86_ARCH_LAST] = {
 };
 
 static const unsigned int x86_accumulate_outgoing_args
-  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_arch_always_fancy_math_387
-  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_avx256_split_unaligned_load
   = m_COREI7 | m_GENERIC;
@@ -2436,6 +2437,8 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx2  */
+  {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
@@ -2463,6 +2466,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
+  "core-avx2",
   "atom",
   "geode",
   "k6",
@@ -2914,7 +2918,7 @@ ix86_option_override_internal (bool main_args_p)
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24061,6 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24317,6 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24861,6 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3ac3451..ee21c47 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,6 +248,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
+#define TARGET_HASWELL (ix86_tune == PROCESSOR_HASWELL)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -603,6 +604,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
+  TARGET_CPU_DEFAULT_haswell,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2095,6 +2097,7 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
+  PROCESSOR_HASWELL,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-11 12:15                         ` Vladimir Yakovlev
@ 2013-01-11 12:21                           ` Uros Bizjak
  2013-01-11 12:38                             ` Vladimir Yakovlev
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2013-01-11 12:21 UTC (permalink / raw)
  To: Vladimir Yakovlev; +Cc: jakub, gcc-patches

On Fri, Jan 11, 2013 at 1:14 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> I sent the patch. Send it once more.
>
> 2013/1/11 Jakub Jelinek <jakub@redhat.com>:
>> On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
>>> I've fixed Changelog. Can we commit the patch to trunk now?
>>>
>>> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>>
>>>       * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>>>       (ix86_target_macros_internal): Likewise.
>>>
>>>       * config/i386/i386.c (m_CORE2I7): Removed.
>>>       (m_CORE_HASWELL): New macro.
>>>       (m_CORE_ALL): Likewise.
>>>       (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>>>       (initial_ix86_arch_features): Likewise.
>>>       (processor_target_table): Initializations for Core avx2.
>>>       (cpu_names): New names "core-avx2".
>>>       (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
>>>       PROCESSOR_CORE_HASWELL.
>>>       (ix86_issue_rate): New case.
>>>       (ia32_multipass_dfa_lookahead): Likewise.
>>>       (ix86_sched_init_global): Likewise.
>>>
>>>       * config/i386/i386.h (TARGET_HASWELL): New macro.
>>>       (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>>>       (processor_type): New PROCESSOR_HASWELL.
>>
>> Uros already acked the patch, so it certainly is ok to commit now.

Yes, the patch is OK, you can commit it to mainline SVN. If you are
unable to commit, please say so in the patch proposal, so someone will
commit the patch for you (as explained in [1]).

[1] http://gcc.gnu.org/contribute.html

Uros.


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-11 12:21                           ` Uros Bizjak
@ 2013-01-11 12:38                             ` Vladimir Yakovlev
  2013-01-15 10:08                               ` Kirill Yukhin
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Yakovlev @ 2013-01-11 12:38 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Uros Bizjak, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]

Kirill,

Could you commit the patch?

2013-01-11  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	* config/i386/i386-c.c (ix86_target_macros_internal): New case.
	(ix86_target_macros_internal): Likewise.

	* config/i386/i386.c (m_CORE2I7): Removed.
	(m_HASWELL): New macro.
	(m_CORE_ALL): Likewise.
	(initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
	(x86_accumulate_outgoing_args): Likewise.
	(x86_arch_always_fancy_math_387): Likewise.
	(processor_target_table): Initializations for Core avx2.
	(cpu_names): New name "core-avx2".
	(ix86_option_override_internal): Replaced PROCESSOR_COREI7 with
	PROCESSOR_HASWELL.
	(ix86_issue_rate): New case.
	(ia32_multipass_dfa_lookahead): Likewise.
	(ix86_sched_init_global): Likewise.

	* config/i386/i386.h (TARGET_HASWELL): New macro.
	(target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
	(processor_type): New PROCESSOR_HASWELL.



2013/1/11 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Jan 11, 2013 at 1:14 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>> I sent the patch. Send it once more.
>>
>> 2013/1/11 Jakub Jelinek <jakub@redhat.com>:
>>> On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
>>>> I've fixed Changelog. Can we commit the patch to trunk now?
>>>>
>>>> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>>>
>>>>       * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>>>>       (ix86_target_macros_internal): Likewise.
>>>>
>>>>       * config/i386/i386.c (m_CORE2I7): Removed.
>>>>       (m_CORE_HASWELL): New macro.
>>>>       (m_CORE_ALL): Likewise.
>>>>       (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>>>>       (initial_ix86_arch_features): Likewise.
>>>>       (processor_target_table): Initializations for Core avx2.
>>>>       (cpu_names): New names "core-avx2".
>>>>       (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
>>>>       PROCESSOR_CORE_HASWELL.
>>>>       (ix86_issue_rate): New case.
>>>>       (ia32_multipass_dfa_lookahead): Likewise.
>>>>       (ix86_sched_init_global): Likewise.
>>>>
>>>>       * config/i386/i386.h (TARGET_HASWELL): New macro.
>>>>       (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>>>>       (processor_type): New PROCESSOR_HASWELL.
>>>
>>> Uros already acked the patch, so it certainly is ok to commit now.
>
> Yes, the patch is OK, you can commit it to mainline SVN. If you are
> unable to commit, please say so in the patch proposal, so someone will
> commit the patch for you (as explained in [1]).
>
> [1] http://gcc.gnu.org/contribute.html
>
> Uros.

[-- Attachment #2: patch1 --]
[-- Type: application/octet-stream, Size: 12723 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 22e5e9b..2d8abd5 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,6 +142,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__core_avx2");
+      def_or_undef (parse_in, "__core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
@@ -232,6 +236,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_HASWELL:
+      def_or_undef (parse_in, "__tune_core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__tune_atom__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 69f44aa..cdb5d23 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,7 +1732,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE2I7 (m_CORE2 | m_COREI7)
+#define m_HASWELL (1<<PROCESSOR_HASWELL)
+#define m_CORE_ALL (m_CORE2 | m_COREI7  | m_HASWELL)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -1768,16 +1769,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      negatively, so enabling for Generic64 seems like good code size
      tradeoff.  We can't enable it for 32bit generic because it does not
      work well with PPro base chips.  */
-  m_386 | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
+  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
 
   /* X86_TUNE_PUSH_MEMORY */
-  m_386 | m_P4_NOCONA | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_ZERO_EXTEND_WITH_AND */
   m_486 | m_PENT,
 
   /* X86_TUNE_UNROLL_STRLEN */
-  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE2I7 | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
+  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
      on simulation result. But after P4 was made, no performance benefit
@@ -1789,11 +1790,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
 
   /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
      register stalls on Generic32 compilation setting as well.  However
@@ -1806,17 +1807,17 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
    * on 16-bit immediate moves into memory on Core2 and Corei7.  */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 
   /* X86_TUNE_USE_SIMODE_FIOP */
-  ~(m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
+  ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_USE_MOV0 */
   m_K6,
@@ -1837,7 +1838,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_PENT | m_PPRO),
 
   /* X86_TUNE_PROMOTE_QIMODE */
-  m_386 | m_486 | m_PENT | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_486 | m_PENT | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_FAST_PREFIX */
   ~(m_386 | m_486 | m_PENT),
@@ -1878,10 +1879,10 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
      for DFmode copies */
-  ~(m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
+  ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
      conflict here in between PPro/Pentium4 based chips that thread 128bit
@@ -1892,7 +1893,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      shows that disabling this option on P4 brings over 20% SPECfp regression,
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
   m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
@@ -1916,7 +1917,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO | m_P4_NOCONA,
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
   m_PPRO | m_ATHLON_K8,
@@ -1938,28 +1939,28 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
-  m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_BT */
-  m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PAD_RETURNS */
-  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PAD_SHORT_FUNCTION: Pad short funtion.  */
   m_ATOM,
 
   /* X86_TUNE_EXT_80387_CONSTANTS */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
 
   /* X86_TUNE_AVOID_VECTOR_DECODE */
-  m_CORE2I7 | m_K8 | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_GENERIC64,
 
   /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
      and SImode multiply, but 386 and 486 do HImode multiply faster.  */
@@ -1967,11 +1968,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1988,7 +1989,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
      from FP to FP. */
-  m_CORE2I7 | m_AMDFAM10 | m_GENERIC,
+  m_CORE_ALL | m_AMDFAM10 | m_GENERIC,
 
   /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
      from integer to FP. */
@@ -2026,7 +2027,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
      regs instead of memory.  */
-  m_COREI7 | m_CORE2I7
+  m_CORE_ALL
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2052,10 +2053,10 @@ static unsigned int initial_ix86_arch_features[X86_ARCH_LAST] = {
 };
 
 static const unsigned int x86_accumulate_outgoing_args
-  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_arch_always_fancy_math_387
-  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_avx256_split_unaligned_load
   = m_COREI7 | m_GENERIC;
@@ -2436,6 +2437,8 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx2  */
+  {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
@@ -2463,6 +2466,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
+  "core-avx2",
   "atom",
   "geode",
   "k6",
@@ -2914,7 +2918,7 @@ ix86_option_override_internal (bool main_args_p)
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx2", PROCESSOR_HASWELL, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24061,6 +24065,7 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24317,6 +24322,7 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24861,6 +24867,7 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_HASWELL:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3ac3451..ee21c47 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,6 +248,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
+#define TARGET_HASWELL (ix86_tune == PROCESSOR_HASWELL)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -603,6 +604,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
+  TARGET_CPU_DEFAULT_haswell,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2095,6 +2097,7 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
+  PROCESSOR_HASWELL,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,


* Re: [RFC, x86] Changes for AVX and AVX2 processors
  2013-01-11 12:38                             ` Vladimir Yakovlev
@ 2013-01-15 10:08                               ` Kirill Yukhin
  0 siblings, 0 replies; 18+ messages in thread
From: Kirill Yukhin @ 2013-01-15 10:08 UTC (permalink / raw)
  To: Vladimir Yakovlev; +Cc: Uros Bizjak, gcc-patches List

Hi, this was checked in: http://gcc.gnu.org/ml/gcc-cvs/2013-01/msg00442.html

Thanks, K

On Fri, Jan 11, 2013 at 4:38 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
> Kirill,
>
> Could you commit patch?
>
> 2013-01-11  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>
>         * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>         (ix86_target_macros_internal): Likewise.
>
>         * config/i386/i386.c (m_CORE2I7): Removed.
>         (m_CORE_HASWELL): New macro.
>         (m_CORE_ALL): Likewise.
>         (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>         (initial_ix86_arch_features): Likewise.
>         (processor_target_table): Initializations for Core avx2.
>         (cpu_names): New names "core-avx2".
>         (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
>         PROCESSOR_CORE_HASWELL.
>         (ix86_issue_rate): New case.
>         (ia32_multipass_dfa_lookahead): Likewise.
>         (ix86_sched_init_global): Likewise.
>
>         * config/i386/i386.h (TARGET_HASWELL): New macro.
>         (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>         (processor_type): New PROCESSOR_HASWELL.
>
>
>
> 2013/1/11 Uros Bizjak <ubizjak@gmail.com>:
>> On Fri, Jan 11, 2013 at 1:14 PM, Vladimir Yakovlev <vbyakovl23@gmail.com> wrote:
>>> I sent the patch. Send it once more.
>>>
>>> 2013/1/11 Jakub Jelinek <jakub@redhat.com>:
>>>> On Fri, Jan 11, 2013 at 03:25:47PM +0400, Vladimir Yakovlev wrote:
>>>>> I've fixed Changelog. Can we commit the patch to trunk now?
>>>>>
>>>>> 2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
>>>>>
>>>>>       * config/i386/i386-c.c (ix86_target_macros_internal): New case.
>>>>>       (ix86_target_macros_internal): Likewise.
>>>>>
>>>>>       * config/i386/i386.c (m_CORE2I7): Removed.
>>>>>       (m_CORE_HASWELL): New macro.
>>>>>       (m_CORE_ALL): Likewise.
>>>>>       (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
>>>>>       (initial_ix86_arch_features): Likewise.
>>>>>       (processor_target_table): Initializations for Core avx2.
>>>>>       (cpu_names): New names "core-avx2".
>>>>>       (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
>>>>>       PROCESSOR_CORE_HASWELL.
>>>>>       (ix86_issue_rate): New case.
>>>>>       (ia32_multipass_dfa_lookahead): Likewise.
>>>>>       (ix86_sched_init_global): Likewise.
>>>>>
>>>>>       * config/i386/i386.h (TARGET_HASWELL): New macro.
>>>>>       (target_cpu_default): New TARGET_CPU_DEFAULT_haswell.
>>>>>       (processor_type): New PROCESSOR_HASWELL.
>>>>
>>>> Uros already acked the patch, so it certainly is ok to commit now.
>>
>> Yes, the patch is OK, you can commit it to mainline SVN. If you are
>> unable to commit, please say so in the patch proposal, so someone will
>> commit the patch for you (as explained in [1]).
>>
>> [1] http://gcc.gnu.org/contribute.html
>>
>> Uros.


* [RFC, x86] Changes for AVX and AVX2 processors
@ 2012-12-27 17:07 Vladimir Yakovlev
  0 siblings, 0 replies; 18+ messages in thread
From: Vladimir Yakovlev @ 2012-12-27 17:07 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]

This patch adds new processors core-avx and core-avx2, so that new
tuning features can be turned on for these processors separately from
corei7. Please review.

2012-12-27  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        * config/i386/i386-c.c (ix86_target_macros_internal): New cases
        added.
        (ix86_target_macros_internal): Likewise.

        * config/i386/i386.c (m_CORE2I7): Removed.
        (m_CORE_AVX): New macro.
        (m_CORE_AVX2): Likewise.
        (m_CORE_ALL): Likewise.
        (initial_ix86_tune_features): m_CORE2I7 is replaced by m_CORE_ALL.
        (initial_ix86_arch_features): Likewise.
        (processor_target_table): Initializations for Core avx and Core
        avx2 were added.
        (cpu_names): New names "coreavx" and "coreavx2" are added.
        (ix86_option_override_internal): Changed PROCESSOR_COREI7 by
        PROCESSOR_CORE_AVX or PROCESSOR_CORE_AVX2.
        (ix86_issue_rate): New cases added.
        (ia32_multipass_dfa_lookahead): Likewise.
        (ix86_sched_init_global): Likewise.
        (get_builtin_code_for_version): Likewise.

        * config/i386/i386.h (TARGET_CORE_AVX): New macro.
        (TARGET_CORE_AVX2): Likewise.
        (target_cpu_default): New TARGET_CPU_DEFAULT_core_avx and
        TARGET_CPU_DEFAULT_core_avx2.
        (processor_type): New PROCESSOR_CORE_AVX and PROCESSOR_CORE_AVX2.

[-- Attachment #2: avx2.patch --]
[-- Type: application/octet-stream, Size: 14124 bytes --]

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 22e5e9b..08e1afe 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -142,6 +142,14 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_CORE_AVX:
+      def_or_undef (parse_in, "__core_avx");
+      def_or_undef (parse_in, "__core_avx__");
+      break;
+    case PROCESSOR_CORE_AVX2:
+      def_or_undef (parse_in, "__core_avx2");
+      def_or_undef (parse_in, "__core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__atom");
       def_or_undef (parse_in, "__atom__");
@@ -232,6 +240,12 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_CORE_AVX:
+      def_or_undef (parse_in, "__tune_core_avx__");
+      break;
+    case PROCESSOR_CORE_AVX2:
+      def_or_undef (parse_in, "__tune_core_avx2__");
+      break;
     case PROCESSOR_ATOM:
       def_or_undef (parse_in, "__tune_atom__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 69f44aa..10411da 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1732,7 +1732,9 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<<PROCESSOR_CORE2)
 #define m_COREI7 (1<<PROCESSOR_COREI7)
-#define m_CORE2I7 (m_CORE2 | m_COREI7)
+#define m_CORE_AVX (1<<PROCESSOR_CORE_AVX)
+#define m_CORE_AVX2 (1<<PROCESSOR_CORE_AVX2)
+#define m_CORE_ALL (m_CORE2 | m_COREI7 | m_CORE_AVX | m_CORE_AVX2)
 #define m_ATOM (1<<PROCESSOR_ATOM)
 
 #define m_GEODE (1<<PROCESSOR_GEODE)
@@ -1768,16 +1770,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      negatively, so enabling for Generic64 seems like good code size
      tradeoff.  We can't enable it for 32bit generic because it does not
      work well with PPro base chips.  */
-  m_386 | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
+  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC64,
 
   /* X86_TUNE_PUSH_MEMORY */
-  m_386 | m_P4_NOCONA | m_CORE2I7 | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_ZERO_EXTEND_WITH_AND */
   m_486 | m_PENT,
 
   /* X86_TUNE_UNROLL_STRLEN */
-  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE2I7 | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
+  m_486 | m_PENT | m_PPRO | m_ATOM | m_CORE_ALL | m_K6 | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
      on simulation result. But after P4 was made, no performance benefit
@@ -1789,11 +1791,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE  | m_GENERIC,
 
   /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
      register stalls on Generic32 compilation setting as well.  However
@@ -1806,17 +1808,17 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO,
 
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
    * on 16-bit immediate moves into memory on Core2 and Corei7.  */
-  m_CORE2I7 | m_GENERIC,
+  m_CORE_ALL | m_GENERIC,
 
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 
   /* X86_TUNE_USE_SIMODE_FIOP */
-  ~(m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
+  ~(m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_USE_MOV0 */
   m_K6,
@@ -1837,7 +1839,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_PENT | m_PPRO),
 
   /* X86_TUNE_PROMOTE_QIMODE */
-  m_386 | m_486 | m_PENT | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_386 | m_486 | m_PENT | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_FAST_PREFIX */
   ~(m_386 | m_486 | m_PENT),
@@ -1878,10 +1880,10 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
      for DFmode copies */
-  ~(m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
+  ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GEODE | m_AMD_MULTIPLE | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
      conflict here in between PPro/Pentium4 based chips that thread 128bit
@@ -1892,7 +1894,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      shows that disabling this option on P4 brings over 20% SPECfp regression,
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
   m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
@@ -1916,7 +1918,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   m_PPRO | m_P4_NOCONA,
 
   /* X86_TUNE_MEMORY_MISMATCH_STALL */
-  m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PROLOGUE_USING_MOVE */
   m_PPRO | m_ATHLON_K8,
@@ -1938,28 +1940,28 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_SCHEDULE */
-  m_PENT | m_PPRO | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
+  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_BT */
-  m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_USE_INCDEC */
-  ~(m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_GENERIC),
+  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_GENERIC),
 
   /* X86_TUNE_PAD_RETURNS */
-  m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC,
+  m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC,
 
   /* X86_TUNE_PAD_SHORT_FUNCTION: Pad short funtion.  */
   m_ATOM,
 
   /* X86_TUNE_EXT_80387_CONSTANTS */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_K6_GEODE | m_ATHLON_K8 | m_GENERIC,
 
   /* X86_TUNE_AVOID_VECTOR_DECODE */
-  m_CORE2I7 | m_K8 | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_GENERIC64,
 
   /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
      and SImode multiply, but 386 and 486 do HImode multiply faster.  */
@@ -1967,11 +1969,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
+  m_CORE_ALL | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1988,7 +1990,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
      from FP to FP. */
-  m_CORE2I7 | m_AMDFAM10 | m_GENERIC,
+  m_CORE_ALL | m_AMDFAM10 | m_GENERIC,
 
   /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
      from integer to FP. */
@@ -2026,7 +2028,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
 
   /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
      regs instead of memory.  */
-  m_COREI7 | m_CORE2I7
+  m_CORE_ALL
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2052,10 +2054,10 @@ static unsigned int initial_ix86_arch_features[X86_ARCH_LAST] = {
 };
 
 static const unsigned int x86_accumulate_outgoing_args
-  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE2I7 | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PPRO | m_P4_NOCONA | m_ATOM | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_arch_always_fancy_math_387
-  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
+  = m_PENT | m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMD_MULTIPLE | m_GENERIC;
 
 static const unsigned int x86_avx256_split_unaligned_load
   = m_COREI7 | m_GENERIC;
@@ -2436,6 +2438,10 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&core_cost, 16, 10, 16, 10, 16},
   /* Core i7  */
   {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx  */
+  {&core_cost, 16, 10, 16, 10, 16},
+  /* Core avx2  */
+  {&core_cost, 16, 10, 16, 10, 16},
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
@@ -2463,6 +2469,8 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "nocona",
   "core2",
   "corei7",
+  "coreavx",
+  "coreavx2",
   "atom",
   "geode",
   "k6",
@@ -2904,17 +2912,17 @@ ix86_option_override_internal (bool main_args_p)
       {"corei7", PROCESSOR_COREI7, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_CX16 | PTA_FXSR},
-      {"corei7-avx", PROCESSOR_COREI7, CPU_COREI7,
+      {"corei7-avx", PROCESSOR_CORE_AVX, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL
 	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx-i", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx-i", PROCESSOR_CORE_AVX, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
 	| PTA_RDRND | PTA_F16C | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
-      {"core-avx2", PROCESSOR_COREI7, CPU_COREI7,
+      {"core-avx2", PROCESSOR_CORE_AVX2, CPU_COREI7,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AVX | PTA_AVX2
 	| PTA_CX16 | PTA_POPCNT | PTA_AES | PTA_PCLMUL | PTA_FSGSBASE
@@ -24061,6 +24069,8 @@ ix86_issue_rate (void)
     case PROCESSOR_PENTIUM4:
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_CORE_AVX:
+    case PROCESSOR_CORE_AVX2:
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
@@ -24317,6 +24327,8 @@ ia32_multipass_dfa_lookahead (void)
 
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_CORE_AVX:
+    case PROCESSOR_CORE_AVX2:
     case PROCESSOR_ATOM:
       /* Generally, we want haifa-sched:max_issue() to look ahead as far
 	 as many instructions can be executed on a cycle, i.e.,
@@ -24861,6 +24873,8 @@ ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
     {
     case PROCESSOR_CORE2:
     case PROCESSOR_COREI7:
+    case PROCESSOR_CORE_AVX:
+    case PROCESSOR_CORE_AVX2:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
       if (reload_completed)
@@ -28705,6 +28719,14 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "corei7";
 	      priority = P_PROC_SSE4_2;
 	      break;
+	    case PROCESSOR_CORE_AVX:
+	      arg_str = "core_avx";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_CORE_AVX2:
+	      arg_str = "core_avx2";
+	      priority = P_PROC_SSE4_2;
+	      break;
 	    case PROCESSOR_ATOM:
 	      arg_str = "atom";
 	      priority = P_PROC_SSSE3;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3ac3451..d3ee8b0 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -248,6 +248,8 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_NOCONA (ix86_tune == PROCESSOR_NOCONA)
 #define TARGET_CORE2 (ix86_tune == PROCESSOR_CORE2)
 #define TARGET_COREI7 (ix86_tune == PROCESSOR_COREI7)
+#define TARGET_CORE_AVX (ix86_tune == PROCESSOR_CORE_AVX)
+#define TARGET_CORE_AVX2 (ix86_tune == PROCESSOR_CORE_AVX2)
 #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
@@ -603,6 +605,8 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_nocona,
   TARGET_CPU_DEFAULT_core2,
   TARGET_CPU_DEFAULT_corei7,
+  TARGET_CPU_DEFAULT_core_avx,
+  TARGET_CPU_DEFAULT_core_avx2,
   TARGET_CPU_DEFAULT_atom,
 
   TARGET_CPU_DEFAULT_geode,
@@ -2095,6 +2099,8 @@ enum processor_type
   PROCESSOR_NOCONA,
   PROCESSOR_CORE2,
   PROCESSOR_COREI7,
+  PROCESSOR_CORE_AVX,
+  PROCESSOR_CORE_AVX2,
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,
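
For readers following the thread, here is a minimal sketch of why separate
processor types matter for the tuning tables touched above. It is not the
code from the patch: the enum is cut down to five entries, the composition
of m_CORE_ALL is an assumption (its definition is not in the hunks shown
here), and the names tune_features, tune_enabled and TUNE_DEMO_AVX2_ONLY
are hypothetical. The idea is only that each processor type owns one bit,
each table entry is an OR of those bits, and a feature is on when the entry
has the bit of the processor being tuned for.

```c
/* Cut-down stand-in for enum processor_type; the real enum has many more
   entries and a different order.  */
enum processor_type
{
  PROCESSOR_CORE2,
  PROCESSOR_COREI7,
  PROCESSOR_CORE_AVX,
  PROCESSOR_CORE_AVX2,
  PROCESSOR_ATOM,
  PROCESSOR_max
};

/* One bit per processor type.  */
#define m_CORE2     (1u << PROCESSOR_CORE2)
#define m_COREI7    (1u << PROCESSOR_COREI7)
#define m_CORE_AVX  (1u << PROCESSOR_CORE_AVX)
#define m_CORE_AVX2 (1u << PROCESSOR_CORE_AVX2)
#define m_ATOM      (1u << PROCESSOR_ATOM)

/* Assumed composition: every Core-family processor type.  */
#define m_CORE_ALL  (m_CORE2 | m_COREI7 | m_CORE_AVX | m_CORE_AVX2)

/* Hypothetical feature indices; TUNE_DEMO_AVX2_ONLY exists only to show
   a feature limited to one processor type.  */
enum tune_feature
{
  TUNE_USE_BT,
  TUNE_PAD_SHORT_FUNCTION,
  TUNE_DEMO_AVX2_ONLY,
  TUNE_LAST
};

/* Each entry lists the processors for which the feature is enabled.  */
static const unsigned int tune_features[TUNE_LAST] = {
  /* TUNE_USE_BT */
  m_CORE_ALL | m_ATOM,

  /* TUNE_PAD_SHORT_FUNCTION */
  m_ATOM,

  /* TUNE_DEMO_AVX2_ONLY */
  m_CORE_AVX2
};

/* Return nonzero if feature F is enabled when tuning for TUNE.  */
int
tune_enabled (enum tune_feature f, enum processor_type tune)
{
  unsigned int tune_mask = 1u << tune;
  return (tune_features[f] & tune_mask) != 0;
}
```

With this layout, tune_enabled (TUNE_DEMO_AVX2_ONLY, PROCESSOR_CORE_AVX2)
is nonzero while the same query for PROCESSOR_COREI7 is zero. If all the
AVX-capable names mapped to PROCESSOR_COREI7, as they did before the patch,
no table entry could tell them apart.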


end of thread, other threads:[~2013-01-15 10:08 UTC | newest]

Thread overview: 18+ messages
2012-12-28 13:36 [RFC, x86] Changes for AVX and AVX2 processors Uros Bizjak
2012-12-29  5:26 ` Vladimir Yakovlev
2012-12-29 10:07   ` Uros Bizjak
     [not found]     ` <CAK1BsWpUdUg+ivi7pFdbUr8R45YjhbBCNhmN=98sMmW99dy-tg@mail.gmail.com>
2012-12-29 16:57       ` Vladimir Yakovlev
2012-12-30 13:21         ` Uros Bizjak
2012-12-30 16:05           ` Vladimir Yakovlev
2012-12-30 18:05             ` Uros Bizjak
2013-01-10 11:12               ` Vladimir Yakovlev
2013-01-10 11:28                 ` Uros Bizjak
2013-01-10 11:31                   ` Jakub Jelinek
2013-01-11 11:25                     ` Vladimir Yakovlev
2013-01-11 11:27                       ` Jakub Jelinek
2013-01-11 12:15                         ` Vladimir Yakovlev
2013-01-11 12:21                           ` Uros Bizjak
2013-01-11 12:38                             ` Vladimir Yakovlev
2013-01-15 10:08                               ` Kirill Yukhin
2012-12-30 11:59       ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2012-12-27 17:07 Vladimir Yakovlev
