public inbox for gcc-patches@gcc.gnu.org
* [PATCH, i386]: AMD bdver3 enablement
From: Gopalasubramanian, Ganesh @ 2012-10-11  8:39 UTC
  To: gcc-patches; +Cc: Uros Bizjak (ubizjak@gmail.com)

[-- Attachment #1: Type: text/plain, Size: 1868 bytes --]

Hi,

The attached patch (Patch.txt) enables bdver3, the next version of AMD's Bulldozer core.
A new file (bdver3.md) is also attached which describes the pipelines.

At present, the tuning is mostly copied from its predecessor.
The pipelines, however, are modeled for the new core.

"make -k check" passes.

Is it OK for upstream?

Regards
Ganesh

2012-10-11  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>

	bdver3 Enablement
	* gcc/doc/extend.texi: Add details about bdver3.
	* gcc/doc/invoke.texi: Add details about bdver3.
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
	(case ${target}): Add bdver3.
	* config/i386/i386.h (TARGET_BDVER3): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add bdver3.
	* config/i386/cpuid.h (bit_XSAVEOPT): New field for 
	getting the xsaveopt cpuid flag.
	* config/i386/sse.md (sseshuf): New instruction 
	attribute added for identifying the shuffle instructions.
	* config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver3 def_and_undef.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver3 processors.	
	* config/i386/i386.c (struct processor_costs bdver3_cost): New
	bdver3 cost table.
	(m_BDVER3): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER3.
	(initial_ix86_tune_features): Add bdver3 tune.
	(processor_target_table): Add bdver3 entry.
	(static const char *const cpu_names): Add bdver3 entry.
	(software_prefetching_beneficial_p): Add bdver3.
	(ix86_option_override_internal): Add bdver3 instruction sets.
	(ix86_issue_rate): Add bdver3.
	(ix86_adjust_cost): Add bdver3.
	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
	(enum processor_type): Add PROCESSOR_BDVER3.
	* config/i386/bdver3.md: New file describing bdver3 pipelines.

[-- Attachment #2: bdver3.md --]
[-- Type: application/octet-stream, Size: 31946 bytes --]

;; Copyright (C) 2012, Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
;;
;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; Three DirectPath instruction decoders and only one VectorPath decoder
;; are available.  They can decode three DirectPath instructions or one
;; VectorPath instruction per cycle.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belongs to the fam15 processors.  We use the same insn attribute
;; that was used for the bdver1 decoding scheme.

(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; VectorPath instructions are single issue instructions.
;; So, we have a separate unit for vector instructions.
(exclusion_set "bdver3-decodev" "bdver3-decode0,bdver3-decode1")

(define_reservation "bdver3-vector" "bdver3-decodev")
(define_reservation "bdver3-direct" "(bdver3-decode0|bdver3-decode1)")
;; Double instructions take two cycles to decode.
(define_reservation "bdver3-double" "(bdver3-decode0|bdver3-decode1)*2")
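;; Worked example (a sketch read off the reservations above, not a
;; measurement): four insns A, B, C, D that all use "bdver3-direct"
;; can be decoded as
;;   cycle 0: A on bdver3-decode0, B on bdver3-decode1
;;   cycle 1: C on bdver3-decode0, D on bdver3-decode1
;; An insn using "bdver3-double" needs a decoder in each of two
;; consecutive cycles, and an insn using "bdver3-vector" cannot share
;; a cycle with either decoder because of the exclusion_set.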

(define_cpu_unit "bdver3-ieu0" "bdver3_ieu")
(define_cpu_unit "bdver3-ieu1" "bdver3_ieu")
(define_reservation "bdver3-ieu" "(bdver3-ieu0|bdver3-ieu1)")

(define_cpu_unit "bdver3-agu0" "bdver3_agu")
(define_cpu_unit "bdver3-agu1" "bdver3_agu")
(define_reservation "bdver3-agu" "(bdver3-agu0|bdver3-agu1)")

(define_cpu_unit "bdver3-load0" "bdver3_load")
(define_cpu_unit "bdver3-load1" "bdver3_load")
(define_reservation "bdver3-load" "bdver3-agu,
				   (bdver3-load0|bdver3-load1),nothing")
;; 128-bit SSE instructions issue two loads at once.
(define_reservation "bdver3-load2" "bdver3-agu,
				   (bdver3-load0+bdver3-load1),nothing")
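;; Worked example (a sketch of how these strings expand, where ","
;; starts a new cycle, "|" picks one unit and "+" takes units together):
;; a reservation such as "bdver3-direct,bdver3-load,bdver3-ieu" occupies
;;   cycle 0: one decoder (bdver3-decode0 or bdver3-decode1)
;;   cycle 1: one AGU (bdver3-agu0 or bdver3-agu1)
;;   cycle 2: one load port (bdver3-load0 or bdver3-load1)
;;   cycle 3: no unit (the trailing "nothing")
;;   cycle 4: one IEU (bdver3-ieu0 or bdver3-ieu1)
;; With "bdver3-load2" in place of "bdver3-load", cycle 2 takes both
;; load ports at once.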

(define_reservation "bdver3-store" "(bdver3-load0 | bdver3-load1)")
;; 128-bit SSE instructions issue two stores at once.
(define_reservation "bdver3-store2" "(bdver3-load0+bdver3-load1)")

;; VectorPath (microcoded) instructions are single issue instructions.
;; So, they occupy all the integer units.
(define_reservation "bdver3-ivector" "bdver3-ieu0+bdver3-ieu1+
                                      bdver3-agu0+bdver3-agu1+
                                      bdver3-load0+bdver3-load1")

(define_reservation "bdver3-fpsched" "nothing,nothing,nothing")

;; The floating point loads.
(define_reservation "bdver3-fpload" "(bdver3-fpsched + bdver3-load)")
(define_reservation "bdver3-fpload2" "(bdver3-fpsched + bdver3-load2)")

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

(define_reservation "bdver3-fvector" "bdver3-ffma0+bdver3-ffma1+
                                      bdver3-fpsto+bdver3-load0+
                                      bdver3-load1")

(define_reservation "bdver3-ffma"     "(bdver3-ffma0 | bdver3-ffma1)")
(define_reservation "bdver3-fcvt"     "bdver3-ffma0")
(define_reservation "bdver3-fmma"     "bdver3-ffma0")
(define_reservation "bdver3-fxbar"    "bdver3-ffma1")
(define_reservation "bdver3-fmal"     "(bdver3-ffma0 | bdver3-fpsto)")
(define_reservation "bdver3-fsto"     "bdver3-fpsto")
(define_reservation "bdver3-fpshuf"    "bdver3-fpsto")

;; Jump instructions are executed in the branch unit, which is completely
;; transparent to us.
(define_insn_reservation "bdver3_call" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "call,callv"))
			 "bdver3-double,(bdver3-agu | bdver3-ieu),nothing")
;; PUSH mem is double path.
(define_insn_reservation "bdver3_push" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "push"))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
;; POP r16/mem are double path.
(define_insn_reservation "bdver3_pop" 1
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "pop"))
                         "bdver3-direct,bdver3-ivector")
;; LEAVE: no latency info so far; assume the same as amdfam10.
(define_insn_reservation "bdver3_leave" 3
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "leave"))
                         "bdver3-vector,bdver3-ivector")
;; LEA executes in AGU unit with 1 cycle latency on BDVER3.
(define_insn_reservation "bdver3_lea" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "lea"))
			 "bdver3-direct,bdver3-ieu")
;; MUL executes in a special multiplier unit attached to IEU1.
(define_insn_reservation "bdver3_imul_DI" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "none,unknown")))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem_DI" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "load,both"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "load,both")))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")

(define_insn_reservation "bdver3_str" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "str")
				   (eq_attr "memory" "load,both,store")))
			 "bdver3-vector,bdver3-load,bdver3-ivector")

;; Integer instructions.
(define_insn_reservation "bdver3_idirect" 1
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,(bdver3-ieu|bdver3-agu)")
(define_insn_reservation "bdver3_ivector" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "vector")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-vector,bdver3-ivector")
(define_insn_reservation "bdver3_idirect_loadmov" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-load")
(define_insn_reservation "bdver3_idirect_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu")
(define_insn_reservation "bdver3_idirect_movstore" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "store")))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
(define_insn_reservation "bdver3_idirect_both" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "both"))))
			 "bdver3-direct,bdver3-load,
			  bdver3-ieu,bdver3-store,
			  bdver3-store")
(define_insn_reservation "bdver3_idirect_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "store"))))
			 "bdver3-direct,(bdver3-ieu+bdver3-agu),
			  bdver3-store")
;; BDVER3 floating point units.
(define_insn_reservation "bdver3_fldxf" 13
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "load")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,bdver3-fpload2,bdver3-fvector*9")
(define_insn_reservation "bdver3_fld" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fstxf" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "store,both")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,(bdver3-fpsched+bdver3-agu),(bdver3-store2+(bdver3-fvector*6))")
(define_insn_reservation "bdver3_fst" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fist" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fistp,fisttp"))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fmov_bdver1" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmov"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fadd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fop")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fadd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fop"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fmul_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmul")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fmul" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmul"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fsgn" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fsgn"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv_load" 42
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fdiv")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv" 42
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fdiv"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fpspc_load" 143
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fpspc")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov_load" 17
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmov")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov" 15
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmov"))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
(define_insn_reservation "bdver3_fcomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (and (eq_attr "bdver1_decode" "double")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "double")
				   (eq_attr "type" "fcmp")))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcom_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fcom" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fxch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fxch"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")

;; SSE loads.
(define_insn_reservation "bdver3_ssevector_avx128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
					(and (eq_attr "movu" "1")
					     (and (eq_attr "mode" "V4SF,V2DF")
						  (eq_attr "memory" "load"))))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_avx256_unaligned_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V8SF,V4DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-double,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_sse128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V4SF,V2DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "V4SF,V2DF,TI")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx256_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_movq_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "DI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssescalar_vmovss_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "SF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload, bdver3-ffma")
(define_insn_reservation "bdver3_mmxsse_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload, bdver3-fmal")

;; SSE stores.
(define_insn_reservation "bdver3_sse_store_avx256" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-double,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_sse_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-direct,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_mmxsse_store_short" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-direct,bdver3-fpsched,(bdver3-fsto+bdver3-store)")

;; Register moves.
(define_insn_reservation "bdver3_ssevector_avx256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_movss_movsd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
                                        (eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_mmxssemov" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")
;; SSE logical operations.
(define_insn_reservation "bdver3_sselog_load_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (and (eq_attr "mode" "V8SF")
				   (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
                                   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_sseshuf_256" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf")
                                   (eq_attr "mode" "V8SF")))
                         "bdver3-double,bdver3-fpsched,bdver3-fpshuf")
(define_insn_reservation "bdver3_sselog_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fxbar")
(define_insn_reservation "bdver3_sselog" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "sselog,sselog1"))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")

;; PCMP actually executes in FMAL.
(define_insn_reservation "bdver3_ssecmp_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssecmp" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssecomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecomi")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecomi"))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")

;; Conversions behave very irregularly and the scheduling is critical here.
;; Take each instruction separately.

;; 256-bit conversions.
(define_insn_reservation "bdver3_vcvtX2Y_avx256_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_vcvtX2Y_avx256" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
;; CVTSS2SD, CVTSD2SS.
(define_insn_reservation "bdver3_ssecvt_cvtss2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtss2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ.
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(nothing | bdver3-fcvt)")
;; CVTPD2PS.
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPI2PS, CVTDQ2PS.
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTDQ2PD.
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPS2PD, CVTPI2PD.
(define_insn_reservation "bdver3_ssecvt_cvtps2pd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtps2pd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ.
(define_insn_reservation "bdver3_ssecvt_cvtsX2si_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecvt_cvtsX2si" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fsto)")
;; CVTPD2PI, CVTTPD2PI.
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPD2DQ, CVTTPD2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtps2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
                                   (and (eq_attr "memory" "load")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtps2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")

;; SSE MUL, ADD, and MULADD.
(define_insn_reservation "bdver3_ssemuladd_load_256" 11
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_sseimul_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmma")
(define_insn_reservation "bdver3_sseimul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmma")
(define_insn_reservation "bdver3_sseiadd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sseiadd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")

;; SSE DIV: no throughput information (assume same as amdfam10).
(define_insn_reservation "bdver3_ssediv_double_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_256" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_load" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load" 27 
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")

(define_insn_reservation "bdver3_sseins" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseins")
                                   (eq_attr "mode" "TI")))
                         "bdver3-direct,bdver3-fpsched,bdver3-fxbar")


[-- Attachment #3: Patch.txt --]
[-- Type: text/plain, Size: 21095 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 192031)
+++ gcc/doc/extend.texi	(working copy)
@@ -9589,6 +9589,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 192031)
+++ gcc/doc/invoke.texi	(working copy)
@@ -13445,6 +13445,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 192031)
+++ gcc/config.gcc	(working copy)
@@ -1231,7 +1231,7 @@
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1240,7 +1240,7 @@
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1352,7 +1352,7 @@
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1361,7 +1361,7 @@
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1418,7 +1418,7 @@
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1427,7 +1427,7 @@
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2660,6 +2660,10 @@
     ;;
   i686-*-* | i786-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -2761,6 +2765,10 @@
     ;;
   x86_64-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -3211,8 +3219,8 @@
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
-			| amdfam10 | barcelona | nocona | core2 | corei7 \
+			| opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \
+			| btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
 				;;
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 192031)
+++ gcc/config/i386/i386.h	(working copy)
@@ -251,6 +251,7 @@
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
+#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
@@ -610,6 +611,7 @@
   TARGET_CPU_DEFAULT_amdfam10,
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
+  TARGET_CPU_DEFAULT_bdver3,
   TARGET_CPU_DEFAULT_btver1,
   TARGET_CPU_DEFAULT_btver2,
 
@@ -2092,6 +2094,7 @@
   PROCESSOR_AMDFAM10,
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
+  PROCESSOR_BDVER3,
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 192031)
+++ gcc/config/i386/i386.md	(working copy)
@@ -313,7 +313,7 @@
 \f
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
+		    atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -326,7 +326,7 @@
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
-   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
+   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sseshuf,
    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
    ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
@@ -937,6 +937,7 @@
 (include "k6.md")
 (include "athlon.md")
 (include "bdver1.md")
+(include "bdver3.md")
 (include "geode.md")
 (include "atom.md")
 (include "core2.md")
Index: gcc/config/i386/cpuid.h
===================================================================
--- gcc/config/i386/cpuid.h	(revision 192031)
+++ gcc/config/i386/cpuid.h	(working copy)
@@ -75,6 +75,9 @@
 #define bit_RDSEED	(1 << 18)
 #define bit_ADX	(1 << 19)
 
+/* Extended Features with cpuid function 0xd */
+#define bit_XSAVEOPT	(1 << 0)
+
 /* Signatures for different CPU implementations as returned in uses
    of cpuid with level 0.  */
 #define signature_AMD_ebx	0x68747541
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 192031)
+++ gcc/config/i386/sse.md	(working copy)
@@ -3711,7 +3711,10 @@
 
   return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3762,7 +3765,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -3869,7 +3875,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+               (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "1"))
+                 (const_string "sseshuf")
+                 (eq_attr "alternative" "2")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "3")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "4")
+                 (const_string "ssemov")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "1"))
+                 (const_string "sselog")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -3923,7 +3949,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+                (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "2"))
+                 (const_string "sseshuf")
+                (eq_attr "alternative" "1")
+                 (const_string "ssemov")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog1")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "2"))
+                 (const_string "sselog1")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,0,1")
    (set_attr "prefix_extra" "0,1,*")
    (set_attr "prefix" "vex,vex,orig")
@@ -4653,7 +4695,10 @@
 
   return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
@@ -4767,7 +4812,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 192031)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -419,7 +419,7 @@
 
 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling
+Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 and Haifa scheduling
 is selected.
 
 mprefer-avx128
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 192031)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -114,6 +114,10 @@
       def_or_undef (parse_in, "__bdver2");
       def_or_undef (parse_in, "__bdver2__");
       break;
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__bdver3");
+      def_or_undef (parse_in, "__bdver3__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -209,7 +213,10 @@
     case PROCESSOR_BDVER2:
       def_or_undef (parse_in, "__tune_bdver2__");
       break;
-   case PROCESSOR_BTVER1:
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__tune_bdver3__");
+      break;
+    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
     case PROCESSOR_BTVER2:
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 192031)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -391,6 +391,7 @@
   unsigned int has_rdrnd = 0, has_f16c = 0, has_fsgsbase = 0;
   unsigned int has_rdseed = 0, has_prfchw = 0, has_adx = 0;
   unsigned int has_osxsave = 0;
+  unsigned int has_xsaveopt = 0;
 
   bool arch;
 
@@ -460,6 +461,12 @@
       has_fsgsbase = ebx & bit_FSGSBASE;
       has_rdseed = ebx & bit_RDSEED;
       has_adx = ebx & bit_ADX;
+
+      /* Call cpuid function 0xd with ecx = 1 to get the
+         XSAVEOPT feature flag.  */
+      __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
+
+      has_xsaveopt = eax & bit_XSAVEOPT;
     }
 
   /* Get XCR_XFEATURE_ENABLED_MASK register with xgetbv.  */
@@ -530,6 +537,8 @@
 	processor = PROCESSOR_GEODE;
       else if (has_movbe)
 	processor = PROCESSOR_BTVER2;
+      else if (has_xsaveopt)
+        processor = PROCESSOR_BDVER3;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -700,6 +709,9 @@
     case PROCESSOR_BDVER2:
       cpu = "bdver2";
       break;
+    case PROCESSOR_BDVER3:
+      cpu = "bdver3";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192031)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1427,6 +1427,85 @@
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs bdver3_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (4),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (4),			/*				 SI */
+   COSTS_N_INSNS (6),			/*				 DI */
+   COSTS_N_INSNS (6)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {5, 5, 4},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {4, 4, 4},				/* cost of storing integer registers */
+  2,					/* cost of reg,reg fld/fst */
+  {5, 5, 12},				/* cost of loading fp registers
+		   			   in SFmode, DFmode and XFmode */
+  {4, 4, 8},				/* cost of storing fp registers
+ 		   			   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {4, 4},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 4},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 4},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  2,					/* MMX or SSE register to integer */
+  16,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (52),			/* cost of FSQRT instruction.  */
+
+  /*  BDVER3 has an optimized REP instruction for medium-sized blocks, but for
+      very small blocks it is better to use a loop.  For large blocks, a libcall
+      can do non-temporal accesses and beat inline code considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  6,					/* scalar_stmt_cost.  */
+  4,					/* scalar load_cost.  */
+  4,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  4,					/* vec_align_load_cost.  */
+  4,					/* vec_unalign_load_cost.  */
+  4,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 struct processor_costs btver1_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1987,7 +2066,8 @@
 #define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
-#define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BDVER3 (1<<PROCESSOR_BDVER3)
+#define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
 #define m_BTVER2 (1<<PROCESSOR_BTVER2)
@@ -2686,6 +2766,7 @@
   {&amdfam10_cost, 32, 24, 32, 7, 32},
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
+  {&bdver3_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
   {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
@@ -2718,6 +2799,7 @@
   "amdfam10",
   "bdver1",
   "bdver2",
+  "bdver3",
   "btver1",
   "btver2"
 };
@@ -3167,6 +3249,12 @@
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
 	| PTA_FMA | PTA_PRFCHW},
+      {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
+	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+	| PTA_FMA | PTA_PRFCHW},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW},
@@ -24026,6 +24114,7 @@
     case PROCESSOR_GENERIC64:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
       return 3;
 
@@ -24215,6 +24304,7 @@
     case PROCESSOR_AMDFAM10:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
@@ -28299,7 +28389,8 @@
     M_AMDFAM10H_SHANGHAI,
     M_AMDFAM10H_ISTANBUL,
     M_AMDFAM15H_BDVER1,
-    M_AMDFAM15H_BDVER2
+    M_AMDFAM15H_BDVER2,
+    M_AMDFAM15H_BDVER3
   };
 
   static struct _arch_names_table
@@ -28324,6 +28415,7 @@
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
+      {"bdver3", M_AMDFAM15H_BDVER3},
     };
 
   static struct _isa_names_table
@@ -40460,7 +40552,7 @@
 static bool
 has_dispatch (rtx insn, int action)
 {
-  if ((TARGET_BDVER1 || TARGET_BDVER2)
+  if ((TARGET_BDVER1 || TARGET_BDVER2 || TARGET_BDVER3)
       && flag_dispatch_scheduler)
     switch (action)
       {
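
The detection order added to host_detect_local_cpu in the driver-i386.c
hunk above can be sketched as a standalone fragment. This is only an
illustration under stated assumptions, not code from the patch: the
helper names detect_xsaveopt and pick_bdver and the enum are
hypothetical, while __get_cpuid_max and __cpuid_count are taken from
GCC's <cpuid.h>. Unlike the hunk, the sketch first checks that cpuid
leaf 0xd is available, and it simply reports 0 on non-x86 hosts.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical stand-ins for the PROCESSOR_BDVER* values.  */
enum bd_kind { BD_NONE, BD_VER1, BD_VER2, BD_VER3 };

/* Return 1 if cpuid leaf 0xd, sub-leaf 1 reports XSAVEOPT in eax
   bit 0 (the bit the patch names bit_XSAVEOPT), else 0.  */
static int
detect_xsaveopt (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 0xd must be supported before it is queried.  */
  if (__get_cpuid_max (0, 0) < 0xd)
    return 0;

  __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
  return (eax & (1u << 0)) != 0;
#else
  return 0;  /* Not an x86 host: nothing to detect.  */
#endif
}

/* Mirror the order of the family 15h checks in the hunk: the newest
   core is tested first, so a CPU with XSAVEOPT is treated as bdver3
   even though it also has BMI and XOP.  */
static enum bd_kind
pick_bdver (int has_xsaveopt, int has_bmi, int has_xop)
{
  if (has_xsaveopt)
    return BD_VER3;
  if (has_bmi)
    return BD_VER2;
  if (has_xop)
    return BD_VER1;
  return BD_NONE;
}
```

A caller would combine the two, for example
pick_bdver (detect_xsaveopt (), has_bmi, has_xop), where has_bmi and
has_xop come from the existing cpuid queries.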

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, i386]: AMD bdver3 enablement
  2012-10-11  8:39 [PATCH, i386]: AMD bdver3 enablement Gopalasubramanian, Ganesh
@ 2012-10-11 17:14 ` Uros Bizjak
  2012-11-05  7:33   ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-10-11 17:14 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches

On Thu, Oct 11, 2012 at 9:19 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:

> The attached patch (Patch.txt) enables the next version of AMD's bulldozer core.

Please handle the new sseshuf type attribute in the various attribute
calculations. You should at least add it to the unit attribute
calculation, but please review the other calculations as well. This
attribute replaces sselog, so probably all places that mention sselog
need to be updated.

> A new file (bdver3.md) is also attached which describes the pipelines.

Please note the recent addition of sseadd1, which is similar to sseadd.
You should handle this and the other _1 types in a similar way. The _1
types only mark instructions that do not have operand2, but are
otherwise the same as the corresponding types without the suffix.


> 2012-10-11  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>
>
>         bdver3 Enablement
>         * gcc/doc/extend.texi: Add details about bdver3.
>         * gcc/doc/invoke.texi: Add details about bdver3.
>         * config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
>         (case ${target}): Add bdver3.
>         * config/i386/i386.h (TARGET_BDVER3): New definition.
>         * config/i386/i386.md (define_attr "cpu"): Add bdver3.
>         * config/i386/cpuid.h (bit_XSAVEOPT): New field for
>         getting the xsaveopt cpuid flag.

Just say "New." here.

>         * config/i386/sse.md (sseshuf): New instruction
>         attribute added for identifying the shuffle instructions.

This is actually "New type attribute."

>         * config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Add
>         bdver3 def_and_undef
>         * config/i386/driver-i386.c (host_detect_local_cpu): Let
>         -march=native recognize bdver3 processors.

"Recognize bdver3 processors."

>         * config/i386/i386.c (struct processor_costs btver2_cost): New
>         bdver3 cost table.

"New."

>         (m_BDVER3): New definition.
>         (m_AMD_MULTIPLE): Includes m_BDVER3.
>         (initial_ix86_tune_features): Add bdver3 tune.
>         (processor_target_table): Add bdver3 entry.
>         (static const char *const cpu_names): Add bdver3 entry.
>         (software_prefetching_beneficial_p): Add bdver3.
>         (ix86_option_override_internal): Add bdver3 instruction sets.
>         (ix86_issue_rate): Add bdver3.
>         (ix86_adjust_cost): Add bdver3.
>         (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
>         (enum processor_type): Add PROCESSOR_BDVER3.
>         * config/i386/bdver3.md: New file describing bdver3 pipelines.

The patch looks OK, but please repost due to suggested attribute changes.

Thanks,
Uros.

* RE: [PATCH, i386]: AMD bdver3 enablement
  2012-10-11 17:14 ` Uros Bizjak
@ 2012-11-05  7:33   ` Gopalasubramanian, Ganesh
  2012-11-05  8:06     ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-05  7:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 4896 bytes --]

A couple of changes were made in response to the review comments.

1. The sseshuf type attribute is handled in the unit attribute calculation.
2. The sseadd1 type is handled in the new scheduler descriptions.

The patch is attached as (patch.txt).
The new file (bdver3.md) describing the pipelines is also attached.

Bootstrapping and "make -k check" pass.

OK for upstream?

Regards
Ganesh

2012-11-05  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>

	bdver3 Enablement
	* gcc/doc/extend.texi: Add details about bdver3.
	* gcc/doc/invoke.texi: Add details about bdver3.
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
	(case ${target}): Add bdver3.
	* config/i386/i386.h (TARGET_BDVER3): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add bdver3.
	* config/i386/sse.md (sseshuf): New type attribute 
	added for identifying the shuffle instructions.
	* config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver3 def_or_undef.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver3 processors.
	* config/i386/i386.c (struct processor_costs bdver3_cost): New.
	(m_BDVER3): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER3.
	(initial_ix86_tune_features): Add bdver3 tune.
	(processor_target_table): Add bdver3 entry.
	(static const char *const cpu_names): Add bdver3 entry.
	(software_prefetching_beneficial_p): Add bdver3.
	(ix86_option_override_internal): Add bdver3 instruction sets.
	(ix86_option_override_internal): Remove XSAVEOPT for bdver1 
	and bdver2.
	(ix86_issue_rate): Add bdver3.
	(ix86_adjust_cost): Add bdver3.
	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
	(enum processor_type): Add PROCESSOR_BDVER3.
	* config/i386/bdver3.md: New file describing bdver3 pipelines.
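
For reference, the string-operation entries in the new bdver3_cost
table are lists of {max size, algorithm} pairs, where -1 marks the
catch-all entry and the leading algorithm is the one used when the
block size is unknown. A minimal sketch of how such an entry can be
read follows. The type and function names are hypothetical, and the
lookup is a simplification that ignores the other conditions the
compiler weighs when it actually picks an algorithm. The data is the
first entry of the table as posted.

```c
#include <assert.h>

/* Hypothetical names modelled on the algorithms listed in the table.  */
enum copy_alg { ALG_LIBCALL, ALG_LOOP, ALG_UNROLLED_LOOP,
                ALG_REP_PREFIX_4_BYTE };

struct size_rule { long max; enum copy_alg alg; };

struct copy_algs
{
  enum copy_alg unknown_size;   /* Used when the size is not known.  */
  struct size_rule rule[4];     /* Checked in order; max == -1 ends it.  */
};

/* {libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}}  */
static const struct copy_algs bdver3_first_entry =
{
  ALG_LIBCALL,
  { { 6, ALG_LOOP }, { 14, ALG_UNROLLED_LOOP },
    { -1, ALG_REP_PREFIX_4_BYTE } }
};

/* Return the algorithm of the first rule whose bound covers SIZE;
   a negative SIZE stands for "unknown".  */
static enum copy_alg
pick_copy_alg (const struct copy_algs *algs, long size)
{
  int i;

  if (size < 0)
    return algs->unknown_size;
  for (i = 0; i < 4; i++)
    if (algs->rule[i].max == -1 || size <= algs->rule[i].max)
      return algs->rule[i].alg;
  return ALG_LIBCALL;  /* No rule matched: fall back to a library call.  */
}
```

With this entry, a 4-byte block maps to the loop, a 10-byte block to
the unrolled loop, and anything above 14 bytes to rep_prefix_4_byte.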



[-- Attachment #2: difflog.txt --]
[-- Type: text/plain, Size: 21325 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 193133)
+++ gcc/doc/extend.texi	(working copy)
@@ -9608,6 +9608,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 193133)
+++ gcc/doc/invoke.texi	(working copy)
@@ -13678,6 +13678,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 193133)
+++ gcc/config.gcc	(working copy)
@@ -1269,7 +1269,7 @@
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1278,7 +1278,7 @@
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1390,7 +1390,7 @@
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1399,7 +1399,7 @@
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1456,7 +1456,7 @@
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1465,7 +1465,7 @@
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2706,6 +2706,10 @@
     ;;
   i686-*-* | i786-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -2807,6 +2811,10 @@
     ;;
   x86_64-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -3344,8 +3352,8 @@
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
-			| amdfam10 | barcelona | nocona | core2 | corei7 \
+			| opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \
+			| btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
 				;;
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 193133)
+++ gcc/config/i386/i386.h	(working copy)
@@ -254,6 +254,7 @@
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
+#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
@@ -616,6 +617,7 @@
   TARGET_CPU_DEFAULT_amdfam10,
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
+  TARGET_CPU_DEFAULT_bdver3,
   TARGET_CPU_DEFAULT_btver1,
   TARGET_CPU_DEFAULT_btver2,
 
@@ -2098,6 +2100,7 @@
   PROCESSOR_AMDFAM10,
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
+  PROCESSOR_BDVER3,
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 193133)
+++ gcc/config/i386/i386.md	(working copy)
@@ -323,7 +323,7 @@
 \f
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
+		    atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -336,7 +336,7 @@
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
-   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
+   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sseshuf,
    sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
    ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
@@ -352,7 +352,7 @@
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
 	   (const_string "i387")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
+			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,sseshuf,
 			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
@@ -947,6 +947,7 @@
 (include "k6.md")
 (include "athlon.md")
 (include "bdver1.md")
+(include "bdver3.md")
 (include "geode.md")
 (include "atom.md")
 (include "core2.md")
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 193133)
+++ gcc/config/i386/sse.md	(working copy)
@@ -3860,7 +3860,10 @@
 
   return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3914,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4024,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+               (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "1"))
+                 (const_string "sseshuf")
+                 (eq_attr "alternative" "2")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "3")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "4")
+                 (const_string "ssemov")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "1"))
+                 (const_string "sselog")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4098,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+                (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "2"))
+                 (const_string "sseshuf")
+                (eq_attr "alternative" "1")
+                 (const_string "ssemov")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog1")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "2"))
+                 (const_string "sselog1")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,0,1")
    (set_attr "prefix_extra" "0,1,*")
    (set_attr "prefix" "vex,vex,orig")
@@ -4802,7 +4844,10 @@
 
   return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
@@ -4916,7 +4961,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 193133)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -419,7 +419,7 @@
 
 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling
+Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 and Haifa scheduling
 is selected.
 
 mprefer-avx128
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 193133)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -114,6 +114,10 @@
       def_or_undef (parse_in, "__bdver2");
       def_or_undef (parse_in, "__bdver2__");
       break;
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__bdver3");
+      def_or_undef (parse_in, "__bdver3__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -209,7 +213,10 @@
     case PROCESSOR_BDVER2:
       def_or_undef (parse_in, "__tune_bdver2__");
       break;
-   case PROCESSOR_BTVER1:
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__tune_bdver3__");
+      break;
+    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
     case PROCESSOR_BTVER2:
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 193133)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -542,6 +542,8 @@
 	processor = PROCESSOR_GEODE;
       else if (has_movbe)
 	processor = PROCESSOR_BTVER2;
+      else if (has_xsaveopt)
+        processor = PROCESSOR_BDVER3;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -712,6 +714,9 @@
     case PROCESSOR_BDVER2:
       cpu = "bdver2";
       break;
+    case PROCESSOR_BDVER3:
+      cpu = "bdver3";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 193133)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1427,6 +1427,85 @@
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs bdver3_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (4),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (4),			/*				 SI */
+   COSTS_N_INSNS (6),			/*				 DI */
+   COSTS_N_INSNS (6)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {5, 5, 4},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {4, 4, 4},				/* cost of storing integer registers */
+  2,					/* cost of reg,reg fld/fst */
+  {5, 5, 12},				/* cost of loading fp registers
+		   			   in SFmode, DFmode and XFmode */
+  {4, 4, 8},				/* cost of storing fp registers
+ 		   			   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {4, 4},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 4},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 4},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  2,					/* MMX or SSE register to integer */
+  16,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (52),			/* cost of FSQRT instruction.  */
+
+  /*  BDVER3 has optimized REP instruction for medium sized blocks, but for
+      very small blocks it is better to use loop. For large blocks, libcall
+      can do nontemporary accesses and beat inline considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  6,					/* scalar_stmt_cost.  */
+  4,					/* scalar load_cost.  */
+  4,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  4,					/* vec_align_load_cost.  */
+  4,					/* vec_unalign_load_cost.  */
+  4,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 struct processor_costs btver1_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1987,7 +2066,8 @@
 #define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
-#define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BDVER3 (1<<PROCESSOR_BDVER3)
+#define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
 #define m_BTVER2 (1<<PROCESSOR_BTVER2)
@@ -2690,6 +2770,7 @@
   {&amdfam10_cost, 32, 24, 32, 7, 32},
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
+  {&bdver3_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
   {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
@@ -2722,6 +2803,7 @@
   "amdfam10",
   "bdver1",
   "bdver2",
+  "bdver3",
   "btver1",
   "btver2"
 };
@@ -3173,18 +3255,24 @@
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
-	| PTA_XSAVEOPT},
+	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+      {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
+	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE 
+	| PTA_XSAVEOPT},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
-	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FXSR | PTA_XSAVE},
       {"btver2", PROCESSOR_BTVER2, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
@@ -24073,6 +24161,7 @@
     case PROCESSOR_GENERIC64:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
       return 3;
 
@@ -24262,6 +24351,7 @@
     case PROCESSOR_AMDFAM10:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
@@ -28591,7 +28681,8 @@
     M_AMDFAM10H_SHANGHAI,
     M_AMDFAM10H_ISTANBUL,
     M_AMDFAM15H_BDVER1,
-    M_AMDFAM15H_BDVER2
+    M_AMDFAM15H_BDVER2,
+    M_AMDFAM15H_BDVER3
   };
 
   static struct _arch_names_table
@@ -28616,6 +28707,7 @@
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
+      {"bdver3", M_AMDFAM15H_BDVER3},
     };
 
   static struct _isa_names_table
@@ -40962,7 +41054,7 @@
 static bool
 has_dispatch (rtx insn, int action)
 {
-  if ((TARGET_BDVER1 || TARGET_BDVER2)
+  if ((TARGET_BDVER1 || TARGET_BDVER2 || TARGET_BDVER3)
       && flag_dispatch_scheduler)
     switch (action)
       {

[-- Attachment #3: bdver3.md --]
[-- Type: application/octet-stream, Size: 31793 bytes --]

;; Copyright (C) 2012, Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
;;
;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belongs to the fam15 processors. We use the same insn attribute
;; that was used for the bdver1 decoding scheme.

(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; Vectorpath instructions are single issue instructions.
;; So, we have a separate unit for vector instructions.
(exclusion_set "bdver3-decodev" "bdver3-decode0,bdver3-decode1")

(define_reservation "bdver3-vector" "bdver3-decodev")
(define_reservation "bdver3-direct" "(bdver3-decode0|bdver3-decode1)")
;; Double instructions take two cycles to decode.
(define_reservation "bdver3-double" "(bdver3-decode0|bdver3-decode1)*2")

(define_cpu_unit "bdver3-ieu0" "bdver3_ieu")
(define_cpu_unit "bdver3-ieu1" "bdver3_ieu")
(define_reservation "bdver3-ieu" "(bdver3-ieu0|bdver3-ieu1)")

(define_cpu_unit "bdver3-agu0" "bdver3_agu")
(define_cpu_unit "bdver3-agu1" "bdver3_agu")
(define_reservation "bdver3-agu" "(bdver3-agu0|bdver3-agu1)")

(define_cpu_unit "bdver3-load0" "bdver3_load")
(define_cpu_unit "bdver3-load1" "bdver3_load")
(define_reservation "bdver3-load" "bdver3-agu,
				   (bdver3-load0|bdver3-load1),nothing")
;; 128bit SSE instructions issue two loads at once.
(define_reservation "bdver3-load2" "bdver3-agu,
				   (bdver3-load0+bdver3-load1),nothing")

(define_reservation "bdver3-store" "(bdver3-load0 | bdver3-load1)")
;; 128bit SSE instructions issue two stores at once.
(define_reservation "bdver3-store2" "(bdver3-load0+bdver3-load1)")

;; vectorpath (microcoded) instructions are single issue instructions.
;; So, they occupy all the integer units.
(define_reservation "bdver3-ivector" "bdver3-ieu0+bdver3-ieu1+
                                      bdver3-agu0+bdver3-agu1+
                                      bdver3-load0+bdver3-load1")

(define_reservation "bdver3-fpsched" "nothing,nothing,nothing")

;; The floating point loads.
(define_reservation "bdver3-fpload" "(bdver3-fpsched + bdver3-load)")
(define_reservation "bdver3-fpload2" "(bdver3-fpsched + bdver3-load2)")

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

(define_reservation "bdver3-fvector" "bdver3-ffma0+bdver3-ffma1+
                                      bdver3-fpsto+bdver3-load0+
                                      bdver3-load1")

(define_reservation "bdver3-ffma"     "(bdver3-ffma0 | bdver3-ffma1)")
(define_reservation "bdver3-fcvt"     "bdver3-ffma0")
(define_reservation "bdver3-fmma"     "bdver3-ffma0")
(define_reservation "bdver3-fxbar"    "bdver3-ffma1")
(define_reservation "bdver3-fmal"     "(bdver3-ffma0 | bdver3-fpsto)")
(define_reservation "bdver3-fsto"     "bdver3-fpsto")
(define_reservation "bdver3-fpshuf"    "bdver3-fpsto")

;; Jump instructions are executed in the branch unit, completely transparently to us.
(define_insn_reservation "bdver3_call" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "call,callv"))
			 "bdver3-double,(bdver3-agu | bdver3-ieu),nothing")
;; PUSH mem is double path.
(define_insn_reservation "bdver3_push" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "push"))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
;; POP r16/mem are double path.
(define_insn_reservation "bdver3_pop" 1
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "pop"))
                         "bdver3-direct,bdver3-ivector")
;; No latency info for LEAVE so far; assume the same as amdfam10.
(define_insn_reservation "bdver3_leave" 3
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "leave"))
                         "bdver3-vector,bdver3-ivector")
;; LEA executes in AGU unit with 1 cycle latency on BDVER3.
(define_insn_reservation "bdver3_lea" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "lea"))
			 "bdver3-direct,bdver3-ieu")
;; MUL executes in a special multiplier unit attached to IEU1.
(define_insn_reservation "bdver3_imul_DI" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "none,unknown")))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem_DI" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "load,both"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "load,both")))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")

(define_insn_reservation "bdver3_str" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "str")
				   (eq_attr "memory" "load,both,store")))
			 "bdver3-vector,bdver3-load,bdver3-ivector")

;; Integer instructions.
(define_insn_reservation "bdver3_idirect" 1
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,(bdver3-ieu|bdver3-agu)")
(define_insn_reservation "bdver3_ivector" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "vector")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-vector,bdver3-ivector")
(define_insn_reservation "bdver3_idirect_loadmov" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-load")
(define_insn_reservation "bdver3_idirect_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu")
(define_insn_reservation "bdver3_idirect_movstore" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "store")))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
(define_insn_reservation "bdver3_idirect_both" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "both"))))
			 "bdver3-direct,bdver3-load,
			  bdver3-ieu,bdver3-store,
			  bdver3-store")
(define_insn_reservation "bdver3_idirect_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "store"))))
			 "bdver3-direct,(bdver3-ieu+bdver3-agu),
			  bdver3-store")
;; BDVER3 floating point units.
(define_insn_reservation "bdver3_fldxf" 13
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "load")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,bdver3-fpload2,bdver3-fvector*9")
(define_insn_reservation "bdver3_fld" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fstxf" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "store,both")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,(bdver3-fpsched+bdver3-agu),(bdver3-store2+(bdver3-fvector*6))")
(define_insn_reservation "bdver3_fst" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fist" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fistp,fisttp"))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fmov_bdver1" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmov"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fadd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fop")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fadd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fop"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fmul_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmul")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fmul" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmul"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fsgn" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fsgn"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv_load" 42
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fdiv")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv" 42
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fdiv"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fpspc_load" 143
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fpspc")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov_load" 17
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmov")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov" 15
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmov"))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
(define_insn_reservation "bdver3_fcomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (and (eq_attr "bdver1_decode" "double")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "double")
				   (eq_attr "type" "fcmp")))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcom_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fcom" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fxch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fxch"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")

;; SSE loads.
(define_insn_reservation "bdver3_ssevector_avx128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
					(and (eq_attr "movu" "1")
					     (and (eq_attr "mode" "V4SF,V2DF")
						  (eq_attr "memory" "load"))))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_avx256_unaligned_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V8SF,V4DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-double,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_sse128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V4SF,V2DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "V4SF,V2DF,TI")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx256_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_movq_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "DI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssescalar_vmovss_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "SF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload, bdver3-ffma")
(define_insn_reservation "bdver3_mmxsse_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload, bdver3-fmal")

;; SSE stores.
(define_insn_reservation "bdver3_sse_store_avx256" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-double,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_sse_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-direct,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_mmxsse_store_short" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-direct,bdver3-fpsched,(bdver3-fsto+bdver3-store)")

;; Register moves.
(define_insn_reservation "bdver3_ssevector_avx256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_movss_movsd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
                                        (eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_mmxssemov" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")
;; SSE logs.
(define_insn_reservation "bdver3_sselog_load_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (and (eq_attr "mode" "V8SF")
				   (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
                                   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_sseshuf_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseshuf")
				   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fpshuf")
(define_insn_reservation "bdver3_sselog_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fxbar")
(define_insn_reservation "bdver3_sselog" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "sselog,sselog1"))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")

;; PCMP actually executes in FMAL.
(define_insn_reservation "bdver3_ssecmp_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssecmp" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssecomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecomi")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecomi"))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")

;; Conversions behave very irregularly and the scheduling is critical here.
;; Take each instruction separately.

;; 256 bit conversion.
(define_insn_reservation "bdver3_vcvtX2Y_avx256_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_vcvtX2Y_avx256" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
;; CVTSS2SD, CVTSD2SS.
(define_insn_reservation "bdver3_ssecvt_cvtss2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtss2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ.
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(nothing | bdver3-fcvt)")
;; CVTPD2PS.
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPI2PS, CVTDQ2PS.
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTDQ2PD.
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPS2PD, CVTPI2PD.
(define_insn_reservation "bdver3_ssecvt_cvtps2pd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtps2pd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ.
(define_insn_reservation "bdver3_ssecvt_cvtsX2si_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecvt_cvtsX2si" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fsto)")
;; CVTPD2PI, CVTTPD2PI.
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPD2DQ, CVTTPD2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtps2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
                                   (and (eq_attr "memory" "load")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtps2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")

;; SSE MUL, ADD, and MULADD.
(define_insn_reservation "bdver3_ssemuladd_load_256" 11
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_sseimul_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmma")
(define_insn_reservation "bdver3_sseimul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmma")
(define_insn_reservation "bdver3_sseiadd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sseiadd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")

;; SSE DIV: no throughput information (assume same as amdfam10).
(define_insn_reservation "bdver3_ssediv_double_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_256" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_load" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load" 27 
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")

(define_insn_reservation "bdver3_sseins" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseins")
				   (eq_attr "mode" "TI")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")


* Re: [PATCH, i386]: AMD bdver3 enablement
  2012-11-05  7:33   ` Gopalasubramanian, Ganesh
@ 2012-11-05  8:06     ` Uros Bizjak
  2012-11-09  3:39       ` Gopalasubramanian, Ganesh
  2012-11-09  3:50       ` Gopalasubramanian, Ganesh
  0 siblings, 2 replies; 12+ messages in thread
From: Uros Bizjak @ 2012-11-05  8:06 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches

On Mon, Nov 5, 2012 at 8:33 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> Couple of changes done with respect to the review comments.
>
> 1. sseshuf type attribute is handled in unit attribute calculation.
> 2. sseadd1 instruction attribute is handled in the new scheduler descriptions.
>
> The patch is attached as (patch.txt).
> The new file (bdver3.md) describing the pipelines is also attached.

-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3914,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4024,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+               (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "1"))
+                 (const_string "sseshuf")
+                 (eq_attr "alternative" "2")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "3")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "4")
+                 (const_string "ssemov")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "1"))
+                 (const_string "sselog")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4098,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+                (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "2"))
+                 (const_string "sseshuf")
+                (eq_attr "alternative" "1")
+                 (const_string "ssemov")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog1")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "2"))
+                 (const_string "sselog1")
+               ]
+               (const_string "*" )))

Please don't conditionally change type attribute. Change sselog{,1}
attribute unconditionally to sseshuf{,1} and handle them in the same
way as sselog{,1}.

In other words, add new attributes to all places where original
attributes are handled.
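
To illustrate the request above, here is a minimal sketch in machine
description syntax. It is not taken from the patch: the reservation name
"example_sselog", the cpu value "examplecpu", and the automaton and unit
names are hypothetical and stand in for whatever a given scheduler
description already uses. The insn pattern sets the new type
unconditionally, and every place that already tests for sselog or sselog1
lists the new types next to them.

```
;; Fragment of an insn pattern's attribute list: the type is set
;; unconditionally, with no test on the "cpu" attribute.
   (set_attr "type" "sseshuf")

;; Scheduler description (all names below are hypothetical): accept the
;; new types wherever sselog and sselog1 are already accepted.
(define_automaton "example")
(define_cpu_unit "example-fp" "example")
(define_insn_reservation "example_sselog" 2
			 (and (eq_attr "cpu" "examplecpu")
			      (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))
			 "example-fp")
```

Under this arrangement a description that does not distinguish shuffles
should schedule them as before, while bdver3.md remains free to give
sseshuf its own reservation.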

Otherwise, the patch looks good.

Uros.

* [PATCH, i386]: AMD bdver3 enablement
  2012-11-05  8:06     ` Uros Bizjak
@ 2012-11-09  3:39       ` Gopalasubramanian, Ganesh
  2012-11-11 21:00         ` Uros Bizjak
  2012-11-09  3:50       ` Gopalasubramanian, Ganesh
  1 sibling, 1 reply; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-09  3:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak (ubizjak@gmail.com)

[-- Attachment #1: Type: text/plain, Size: 5787 bytes --]

Hi

I have made the changes requested in the review comments.
The "sseshuf" type attribute is no longer set conditionally.
Instead, a new type attribute is added and is included in the other attribute calculations.

The patch is attached as (difflog.txt).
The new file (bdver3.md) describing the pipelines is also attached.

Bootstrap and "make -k check" pass.

OK for upstream?

2012-11-09  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>

	bdver3 Enablement
	* gcc/doc/extend.texi: Add details about bdver3.
	* gcc/doc/invoke.texi: Add details about bdver3.
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
	(case ${target}): Add bdver3.
	* config/i386/i386.h (TARGET_BDVER3): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add bdver3.
	* config/i386/sse.md (sseshuf): New type attribute.
	* config/i386/athlon.md (sseshuf): Likewise.
	* config/i386/atom.md (sseshuf): Likewise.
	* config/i386/ppro.md (sseshuf): Likewise.
	* config/i386/bdver1.md (sseshuf): Likewise.
	* config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver3 def_and_undef.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver3 processors.
	* config/i386/i386.c (struct processor_costs bdver3_cost): New.
	(m_BDVER3): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER3.
	(initial_ix86_tune_features): Add bdver3 tune.
	(processor_target_table): Add bdver3 entry.
	(static const char *const cpu_names): Add bdver3 entry.
	(software_prefetching_beneficial_p): Add bdver3.
	(ix86_option_override_internal): Add bdver3 instruction sets.
	(ix86_option_override_internal): Remove XSAVEOPT for bdver1 
	and bdver2.
	(ix86_issue_rate): Add bdver3.
	(ix86_adjust_cost): Add bdver3.
	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
	(enum processor_type): Add PROCESSOR_BDVER3.
	* config/i386/bdver3.md: New file describing bdver3 pipelines.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com] 
Sent: Monday, November 05, 2012 1:37 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, i386]: AMD bdver3 enablement

On Mon, Nov 5, 2012 at 8:33 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Couple of changes done with respect to the review comments.
>
> 1. sseshuf type attribute is handled in unit attribute calculation.
> 2. sseadd1 instruction attribute is handled in the new scheduler descriptions.
>
> The patch is attached as (patch.txt).
> The new file (bdver3.md) describing the pipelines is also attached.

-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3914,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4024,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+               (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "1"))
+                 (const_string "sseshuf")
+                 (eq_attr "alternative" "2")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "3")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "4")
+                 (const_string "ssemov")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "1"))
+                 (const_string "sselog")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
   (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4098,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+                (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "2"))
+                 (const_string "sseshuf")
+                (eq_attr "alternative" "1")
+                 (const_string "ssemov")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog1")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "2"))
+                 (const_string "sselog1")
+               ]
+               (const_string "*" )))

Please don't conditionally change type attribute. Change sselog{,1} attribute unconditionally to sseshuf{,1} and handle them in the same way as sselog{,1}.

In other words, add new attributes to all places where original attributes are handled.

Otherwise, the patch looks good.

Uros.


[-- Attachment #2: difflog.txt --]
[-- Type: text/plain, Size: 27501 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 193132)
+++ gcc/doc/extend.texi	(working copy)
@@ -9608,6 +9608,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 193132)
+++ gcc/doc/invoke.texi	(working copy)
@@ -13678,6 +13678,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 193132)
+++ gcc/config.gcc	(working copy)
@@ -1269,7 +1269,7 @@
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1278,7 +1278,7 @@
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1390,7 +1390,7 @@
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1399,7 +1399,7 @@
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1456,7 +1456,7 @@
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1465,7 +1465,7 @@
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2706,6 +2706,10 @@
     ;;
   i686-*-* | i786-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -2807,6 +2811,10 @@
     ;;
   x86_64-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -3344,8 +3352,8 @@
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
-			| amdfam10 | barcelona | nocona | core2 | corei7 \
+			| opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \
+			| btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
 				;;
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 193132)
+++ gcc/config/i386/i386.h	(working copy)
@@ -254,6 +254,7 @@
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
+#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
@@ -616,6 +617,7 @@
   TARGET_CPU_DEFAULT_amdfam10,
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
+  TARGET_CPU_DEFAULT_bdver3,
   TARGET_CPU_DEFAULT_btver1,
   TARGET_CPU_DEFAULT_btver2,
 
@@ -2098,6 +2100,7 @@
   PROCESSOR_AMDFAM10,
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
+  PROCESSOR_BDVER3,
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 193132)
+++ gcc/config/i386/i386.md	(working copy)
@@ -323,7 +323,7 @@
 \f
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
+		    atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -336,9 +336,9 @@
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
-   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
-   ssediv,sseins,ssemuladd,sse4arg,lwp,
+   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sse,
+   ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   sseshuf,sseshuf1,ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -353,7 +353,7 @@
 	   (const_string "i387")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
 			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
-			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -594,7 +594,7 @@
 	   (if_then_else (match_operand 1 "constant_call_address_operand")
 	     (const_string "none")
 	     (const_string "load"))
-	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1")
+	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1,sseshuf1")
 	      (match_operand 1 "memory_operand"))
 	   (const_string "both")
 	 (and (match_operand 0 "memory_operand")
@@ -609,7 +609,7 @@
 		   imov,imovx,icmp,test,bitmanip,
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
-		   sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
+		   sseshuf1,sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -947,6 +947,7 @@
 (include "k6.md")
 (include "athlon.md")
 (include "bdver1.md")
+(include "bdver3.md")
 (include "geode.md")
 (include "atom.md")
 (include "core2.md")
Index: gcc/config/i386/athlon.md
===================================================================
--- gcc/config/i386/athlon.md	(revision 193132)
+++ gcc/config/i386/athlon.md	(working copy)
@@ -736,6 +736,36 @@
 			      (eq_attr "type" "sselog,sselog1"))
 			 "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
 
+;;SSE shuffle operations
+(define_insn_reservation "athlon_sseshuf_load" 3
+                         (and (eq_attr "cpu" "athlon")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-vector,athlon-fpload2,(athlon-fmul*2)")
+(define_insn_reservation "athlon_sseshuf_load_k8" 5
+                         (and (eq_attr "cpu" "k8,generic64")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
+(define_insn_reservation "athlon_sseshuf_load_amdfam10" 4
+                         (and (eq_attr "cpu" "amdfam10")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)")
+
+(define_insn_reservation "athlon_sseshuf" 3
+                         (and (eq_attr "cpu" "athlon")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-vector,athlon-fpsched,athlon-fmul*2")
+(define_insn_reservation "athlon_sseshuf_k8" 3
+                         (and (eq_attr "cpu" "k8,generic64")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-double,athlon-fpsched,athlon-fmul")
+(define_insn_reservation "athlon_sseshuf_amdfam10" 2
+                         (and (eq_attr "cpu" "amdfam10")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
+
 ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that.
 (define_insn_reservation "athlon_ssecmp_load" 2
 			 (and (eq_attr "cpu" "athlon")
Index: gcc/config/i386/atom.md
===================================================================
--- gcc/config/i386/atom.md	(revision 193132)
+++ gcc/config/i386/atom.md	(working copy)
@@ -455,6 +455,30 @@
             (eq_attr "memory" "!none")))
   "atom-simple-0")
 
+(define_insn_reservation  "atom_sseshuf" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "!none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf1" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "none")))
+  "atom-simple-0")
+
+(define_insn_reservation  "atom_sseshuf1_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "!none")))
+  "atom-simple-0")
+
 ;; not pmad, not psad
 (define_insn_reservation  "atom_sseiadd" 1
   (and (eq_attr "cpu" "atom")
@@ -743,8 +767,8 @@
                   atom_imul_mem, atom_icmp_mem,
                   atom_test_mem, atom_icmov_mem, atom_sselog_mem,
                   atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem,
-                  atom_ishift_mem, atom_ishift1_mem, 
-                  atom_rotate_mem, atom_rotate1_mem"
+                  atom_ishift_mem, atom_ishift1_mem, atom_sseshuf_mem, 
+                  atom_sseshuf1_mem, atom_rotate_mem, atom_rotate1_mem"
                   "ix86_agi_dependent")
 
 ;; Stall from imul to lea is 8 cycles.
@@ -757,7 +781,8 @@
                   atom_ishift_mem, atom_ishift1_mem, atom_rotate_mem,
                   atom_rotate1_mem, atom_imul_mem, atom_icmp_mem,
                   atom_test_mem, atom_icmov_mem, atom_sselog_mem,
-                  atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem"
+                  atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem,
+		   atom_sseshuf_mem, atom_sseshuf1_mem"
                   "ix86_agi_dependent")
 
 ;; There will be 0 cycle stall from cmp/test to jcc
Index: gcc/config/i386/ppro.md
===================================================================
--- gcc/config/i386/ppro.md	(revision 193132)
+++ gcc/config/i386/ppro.md	(working copy)
@@ -700,6 +700,20 @@
 					(eq_attr "type" "sselog,sselog1"))))
 			 "decoder0,(p2+p1)")
 
+(define_insn_reservation "ppro_sse_shuf_V4SF" 2
+                         (and (eq_attr "cpu" "pentiumpro")
+                              (and (eq_attr "memory" "none")
+                                   (and (eq_attr "mode" "V4SF")
+                                        (eq_attr "type" "sseshuf,sseshuf1"))))
+                         "decodern,p1")
+
+(define_insn_reservation "ppro_sse_shuf_V4SF_load" 2
+                         (and (eq_attr "cpu" "pentiumpro")
+                              (and (eq_attr "memory" "load")
+                                   (and (eq_attr "mode" "V4SF")
+                                        (eq_attr "type" "sseshuf,sseshuf1"))))
+                         "decoder0,(p2+p1)")
+
 (define_insn_reservation "ppro_sse_mov_V4SF" 1
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 193132)
+++ gcc/config/i386/sse.md	(working copy)
@@ -3860,7 +3860,7 @@
 
   return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3911,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4018,7 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4072,7 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
    (set_attr "length_immediate" "1,0,1")
    (set_attr "prefix_extra" "0,1,*")
    (set_attr "prefix" "vex,vex,orig")
@@ -4802,7 +4802,7 @@
 
   return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
@@ -4916,7 +4916,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 193132)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -114,6 +114,10 @@
       def_or_undef (parse_in, "__bdver2");
       def_or_undef (parse_in, "__bdver2__");
       break;
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__bdver3");
+      def_or_undef (parse_in, "__bdver3__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -209,7 +213,10 @@
     case PROCESSOR_BDVER2:
       def_or_undef (parse_in, "__tune_bdver2__");
       break;
-   case PROCESSOR_BTVER1:
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__tune_bdver3__");
+      break;
+    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
     case PROCESSOR_BTVER2:
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 193132)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -419,7 +419,7 @@
 
 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling
+Do dispatch scheduling if processor is bdver1, bdver2 or bdver3 and Haifa scheduling
 is selected.
 
 mprefer-avx128
Index: gcc/config/i386/bdver1.md
===================================================================
--- gcc/config/i386/bdver1.md	(revision 193132)
+++ gcc/config/i386/bdver1.md	(working copy)
@@ -501,6 +501,28 @@
 			      (eq_attr "type" "sselog,sselog1"))
 			 "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
 
+;; SSE shuffles
+(define_insn_reservation "bdver1_sseshuf_load_256" 7
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (and (eq_attr "mode" "V8SF")
+                                   (eq_attr "memory" "load"))))
+                         "bdver1-double,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_sseshuf_load" 6
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-fxbar")
+(define_insn_reservation "bdver1_sseshuf_256" 3
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "mode" "V8SF")))
+                         "bdver1-double,bdver1-fpsched,bdver1-fmal")
+(define_insn_reservation "bdver1_sseshuf" 2
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
+
 ;; PCMP actually executes in FMAL.
 (define_insn_reservation "bdver1_ssecmp_load" 6
 			 (and (eq_attr "cpu" "bdver1,bdver2")
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 193132)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -542,6 +542,8 @@
 	processor = PROCESSOR_GEODE;
       else if (has_movbe)
 	processor = PROCESSOR_BTVER2;
+      else if (has_xsaveopt)
+        processor = PROCESSOR_BDVER3;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -712,6 +714,9 @@
     case PROCESSOR_BDVER2:
       cpu = "bdver2";
       break;
+    case PROCESSOR_BDVER3:
+      cpu = "bdver3";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 193132)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1427,6 +1427,85 @@
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs bdver3_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (4),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (4),			/*				 SI */
+   COSTS_N_INSNS (6),			/*				 DI */
+   COSTS_N_INSNS (6)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {5, 5, 4},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {4, 4, 4},				/* cost of storing integer registers */
+  2,					/* cost of reg,reg fld/fst */
+  {5, 5, 12},				/* cost of loading fp registers
+		   			   in SFmode, DFmode and XFmode */
+  {4, 4, 8},				/* cost of storing fp registers
+ 		   			   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {4, 4},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 4},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 4},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  2,					/* MMX or SSE register to integer */
+  16,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (52),			/* cost of FSQRT instruction.  */
+
+  /*  BDVER3 has an optimized REP instruction for medium-sized blocks, but
+      for very small blocks it is better to use a loop.  For large blocks,
+      a libcall can do nontemporal accesses and beat inline considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  6,					/* scalar_stmt_cost.  */
+  4,					/* scalar load_cost.  */
+  4,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  4,					/* vec_align_load_cost.  */
+  4,					/* vec_unalign_load_cost.  */
+  4,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 struct processor_costs btver1_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1987,7 +2066,8 @@
 #define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
-#define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BDVER3 (1<<PROCESSOR_BDVER3)
+#define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
 #define m_BTVER2 (1<<PROCESSOR_BTVER2)
@@ -2690,6 +2770,7 @@
   {&amdfam10_cost, 32, 24, 32, 7, 32},
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
+  {&bdver3_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
   {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
@@ -2722,6 +2803,7 @@
   "amdfam10",
   "bdver1",
   "bdver2",
+  "bdver3",
   "btver1",
   "btver2"
 };
@@ -3173,18 +3255,24 @@
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
-	| PTA_XSAVEOPT},
+	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+      {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
+	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE 
+	| PTA_XSAVEOPT},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
-	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FXSR | PTA_XSAVE},
       {"btver2", PROCESSOR_BTVER2, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
@@ -24073,6 +24161,7 @@
     case PROCESSOR_GENERIC64:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
       return 3;
 
@@ -24262,6 +24351,7 @@
     case PROCESSOR_AMDFAM10:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
@@ -28591,7 +28681,8 @@
     M_AMDFAM10H_SHANGHAI,
     M_AMDFAM10H_ISTANBUL,
     M_AMDFAM15H_BDVER1,
-    M_AMDFAM15H_BDVER2
+    M_AMDFAM15H_BDVER2,
+    M_AMDFAM15H_BDVER3
   };
 
   static struct _arch_names_table
@@ -28616,6 +28707,7 @@
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
+      {"bdver3", M_AMDFAM15H_BDVER3},
     };
 
   static struct _isa_names_table
@@ -40962,7 +41054,7 @@
 static bool
 has_dispatch (rtx insn, int action)
 {
-  if ((TARGET_BDVER1 || TARGET_BDVER2)
+  if ((TARGET_BDVER1 || TARGET_BDVER2 || TARGET_BDVER3)
       && flag_dispatch_scheduler)
     switch (action)
       {

[-- Attachment #3: bdver3.md --]
[-- Type: application/octet-stream, Size: 32733 bytes --]

;; Copyright (C) 2012, Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
;;
;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belongs to the fam15 processors.  We use the same insn attribute
;; that was used for the bdver1 decoding scheme.

(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; Vectorpath instructions are single-issue instructions.
;; So, we have a separate unit for vector instructions.
(exclusion_set "bdver3-decodev" "bdver3-decode0,bdver3-decode1")

(define_reservation "bdver3-vector" "bdver3-decodev")
(define_reservation "bdver3-direct" "(bdver3-decode0|bdver3-decode1)")
;; Double instructions take two cycles to decode.
(define_reservation "bdver3-double" "(bdver3-decode0|bdver3-decode1)*2")

(define_cpu_unit "bdver3-ieu0" "bdver3_ieu")
(define_cpu_unit "bdver3-ieu1" "bdver3_ieu")
(define_reservation "bdver3-ieu" "(bdver3-ieu0|bdver3-ieu1)")

(define_cpu_unit "bdver3-agu0" "bdver3_agu")
(define_cpu_unit "bdver3-agu1" "bdver3_agu")
(define_reservation "bdver3-agu" "(bdver3-agu0|bdver3-agu1)")

(define_cpu_unit "bdver3-load0" "bdver3_load")
(define_cpu_unit "bdver3-load1" "bdver3_load")
(define_reservation "bdver3-load" "bdver3-agu,
				   (bdver3-load0|bdver3-load1),nothing")
;; 128bit SSE instructions issue two loads at once.
(define_reservation "bdver3-load2" "bdver3-agu,
				   (bdver3-load0+bdver3-load1),nothing")

(define_reservation "bdver3-store" "(bdver3-load0 | bdver3-load1)")
;; 128bit SSE instructions issue two stores at once.
(define_reservation "bdver3-store2" "(bdver3-load0+bdver3-load1)")

;; vectorpath (microcoded) instructions are single issue instructions.
;; So, they occupy all the integer units.
(define_reservation "bdver3-ivector" "bdver3-ieu0+bdver3-ieu1+
                                      bdver3-agu0+bdver3-agu1+
                                      bdver3-load0+bdver3-load1")

(define_reservation "bdver3-fpsched" "nothing,nothing,nothing")

;; The floating point loads.
(define_reservation "bdver3-fpload" "(bdver3-fpsched + bdver3-load)")
(define_reservation "bdver3-fpload2" "(bdver3-fpsched + bdver3-load2)")

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

(define_reservation "bdver3-fvector" "bdver3-ffma0+bdver3-ffma1+
                                      bdver3-fpsto+bdver3-load0+
                                      bdver3-load1")

(define_reservation "bdver3-ffma"     "(bdver3-ffma0 | bdver3-ffma1)")
(define_reservation "bdver3-fcvt"     "bdver3-ffma0")
(define_reservation "bdver3-fmma"     "bdver3-ffma0")
(define_reservation "bdver3-fxbar"    "bdver3-ffma1")
(define_reservation "bdver3-fmal"     "(bdver3-ffma0 | bdver3-fpsto)")
(define_reservation "bdver3-fsto"     "bdver3-fpsto")
(define_reservation "bdver3-fpshuf"    "bdver3-fpsto")

;; Jump instructions are executed in the branch unit completely transparent to us.
(define_insn_reservation "bdver3_call" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "call,callv"))
			 "bdver3-double,(bdver3-agu | bdver3-ieu),nothing")
;; PUSH mem is double path.
(define_insn_reservation "bdver3_push" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "push"))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
;; POP r16/mem are double path.
(define_insn_reservation "bdver3_pop" 1
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "pop"))
                         "bdver3-direct,bdver3-ivector")
;; No latency info for LEAVE so far; assume the same as amdfam10.
(define_insn_reservation "bdver3_leave" 3
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "leave"))
                         "bdver3-vector,bdver3-ivector")
;; LEA executes in AGU unit with 1 cycle latency on BDVER3.
(define_insn_reservation "bdver3_lea" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "lea"))
			 "bdver3-direct,bdver3-ieu")
;; MUL executes in special multiplier unit attached to IEU1.
(define_insn_reservation "bdver3_imul_DI" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "none,unknown")))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem_DI" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "load,both"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "load,both")))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")

(define_insn_reservation "bdver3_str" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "str")
				   (eq_attr "memory" "load,both,store")))
			 "bdver3-vector,bdver3-load,bdver3-ivector")

;; Integer instructions.
(define_insn_reservation "bdver3_idirect" 1
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,(bdver3-ieu|bdver3-agu)")
(define_insn_reservation "bdver3_ivector" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "vector")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-vector,bdver3-ivector")
(define_insn_reservation "bdver3_idirect_loadmov" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-load")
(define_insn_reservation "bdver3_idirect_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu")
(define_insn_reservation "bdver3_idirect_movstore" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "store")))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
(define_insn_reservation "bdver3_idirect_both" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "both"))))
			 "bdver3-direct,bdver3-load,
			  bdver3-ieu,bdver3-store,
			  bdver3-store")
(define_insn_reservation "bdver3_idirect_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "store"))))
			 "bdver3-direct,(bdver3-ieu+bdver3-agu),
			  bdver3-store")
;; BDVER3 floating point units.
(define_insn_reservation "bdver3_fldxf" 13
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "load")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,bdver3-fpload2,bdver3-fvector*9")
(define_insn_reservation "bdver3_fld" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fstxf" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "store,both")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,(bdver3-fpsched+bdver3-agu),(bdver3-store2+(bdver3-fvector*6))")
(define_insn_reservation "bdver3_fst" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fist" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fistp,fisttp"))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fmov_bdver3" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmov"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fadd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fop")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fadd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fop"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fmul_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmul")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fmul" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmul"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fsgn" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fsgn"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv_load" 42
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fdiv")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv" 42
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fdiv"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fpspc_load" 143
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fpspc")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov_load" 17
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmov")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov" 15
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmov"))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
(define_insn_reservation "bdver3_fcomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (and (eq_attr "bdver1_decode" "double")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "double")
				   (eq_attr "type" "fcmp")))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcom_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fcom" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fxch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fxch"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")

;; SSE loads.
(define_insn_reservation "bdver3_ssevector_avx128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
					(and (eq_attr "movu" "1")
					     (and (eq_attr "mode" "V4SF,V2DF")
						  (eq_attr "memory" "load"))))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_avx256_unaligned_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V8SF,V4DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-double,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_sse128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V4SF,V2DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "V4SF,V2DF,TI")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx256_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_movq_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "DI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssescalar_vmovss_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "SF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload, bdver3-ffma")
(define_insn_reservation "bdver3_mmxsse_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload, bdver3-fmal")

;; SSE stores.
(define_insn_reservation "bdver3_sse_store_avx256" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-double,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_sse_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-direct,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_mmxsse_store_short" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-direct,bdver3-fpsched,(bdver3-fsto+bdver3-store)")

;; Register moves.
(define_insn_reservation "bdver3_ssevector_avx256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_movss_movsd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
                                        (eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_mmxssemov" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")
;; SSE logs.
(define_insn_reservation "bdver3_sselog_load_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (and (eq_attr "mode" "V8SF")
				   (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
                                   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fxbar")
(define_insn_reservation "bdver3_sselog" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "sselog,sselog1"))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")

;; SSE Shuffles
(define_insn_reservation "bdver3_sseshuf_load_256" 7
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (and (eq_attr "mode" "V8SF")
                                   (eq_attr "memory" "load"))))
                         "bdver3-double,bdver3-fpload,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf_load" 6
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (eq_attr "memory" "load")))
                         "bdver3-direct,bdver3-fpload,bdver3-fpshuf")

(define_insn_reservation "bdver3_sseshuf_256" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf")
                                   (eq_attr "mode" "V8SF")))
                         "bdver3-double,bdver3-fpsched,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf" 2
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "sseshuf,sseshuf1"))
                         "bdver3-direct,bdver3-fpsched,bdver3-fpshuf")

;; PCMP actually executes in FMAL.
(define_insn_reservation "bdver3_ssecmp_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssecmp" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssecomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecomi")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecomi"))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")

;; Conversions behave very irregularly and the scheduling is critical here.
;; Take each instruction separately.

;; 256 bit conversion.
(define_insn_reservation "bdver3_vcvtX2Y_avx256_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_vcvtX2Y_avx256" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
;; CVTSS2SD, CVTSD2SS.
(define_insn_reservation "bdver3_ssecvt_cvtss2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtss2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ.
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(nothing | bdver3-fcvt)")
;; CVTPD2PS.
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPI2PS, CVTDQ2PS.
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTDQ2PD.
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPS2PD, CVTPI2PD.
(define_insn_reservation "bdver3_ssecvt_cvtps2pd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtps2pd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ.
(define_insn_reservation "bdver3_ssecvt_cvtsX2si_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecvt_cvtsX2si" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fsto)")
;; CVTPD2PI, CVTTPD2PI.
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPD2DQ, CVTTPD2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtps2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
                                   (and (eq_attr "memory" "load")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtps2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")

;; SSE MUL, ADD, and MULADD.
(define_insn_reservation "bdver3_ssemuladd_load_256" 11
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_sseimul_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmma")
(define_insn_reservation "bdver3_sseimul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmma")
(define_insn_reservation "bdver3_sseiadd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sseiadd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")

;; SSE DIV: no throughput information (assume same as amdfam10).
(define_insn_reservation "bdver3_ssediv_double_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_256" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_load" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load" 27 
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")

(define_insn_reservation "bdver3_sseins" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseins")
                                   (eq_attr "mode" "TI")))
                         "bdver3-direct,bdver3-fpsched,bdver3-fxbar")



* RE: [PATCH, i386]: AMD bdver3 enablement
  2012-11-05  8:06     ` Uros Bizjak
  2012-11-09  3:39       ` Gopalasubramanian, Ganesh
@ 2012-11-09  3:50       ` Gopalasubramanian, Ganesh
  1 sibling, 0 replies; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-09  3:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 5787 bytes --]

Hi

I have made the changes requested in the review comments.
The "sseshuf" type attribute is no longer set conditionally.
Instead, it is added as a new type attribute and is handled wherever the other type attributes are used in attribute calculations.
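
As a rough sketch of what the unconditional form could look like, the "type" lines from the hunks quoted in the review below would become plain set_attr entries. The surrounding pattern bodies are omitted, and the per-alternative values are an assumption based on the reviewer's sselog{,1} -> sseshuf{,1} suggestion, not a copy of difflog.txt:

```
;; Fragments only, not complete define_insn patterns.
;; Single-alternative pattern: was (set_attr "type" "sselog").
   (set_attr "type" "sseshuf")

;; Five alternatives: was "sselog,sselog,ssemov,ssemov,ssemov".
   (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")

;; Three alternatives: was "sselog1,ssemov,sselog1".
   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
```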

The patch is attached as (difflog.txt).
The new file (bdver3.md) describing the pipelines is also attached.

Bootstrapping and "make -k check" pass.

OK for upstream?

2012-11-09  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>

	bdver3 Enablement
	* gcc/doc/extend.texi: Add details about bdver3.
	* gcc/doc/invoke.texi: Add details about bdver3.
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
	(case ${target}): Add bdver3.
	* config/i386/i386.h (TARGET_BDVER3): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add bdver3.
	* config/i386/sse.md (sseshuf): New type attribute.
	* config/i386/athlon.md (sseshuf): Likewise.
	* config/i386/atom.md (sseshuf): Likewise.
	* config/i386/ppro.md (sseshuf): Likewise.
	* config/i386/bdver1.md (sseshuf): Likewise.
	* config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver3 def_and_undef.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver3 processors.
	* config/i386/i386.c (struct processor_costs bdver3_cost): New.
	(m_BDVER3): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER3.
	(initial_ix86_tune_features): Add bdver3 tune.
	(processor_target_table): Add bdver3 entry.
	(static const char *const cpu_names): Add bdver3 entry.
	(software_prefetching_beneficial_p): Add bdver3.
	(ix86_option_override_internal): Add bdver3 instruction sets.
	(ix86_option_override_internal): Remove XSAVEOPT for bdver1 
	and bdver2.
	(ix86_issue_rate): Add bdver3.
	(ix86_adjust_cost): Add bdver3.
	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
	(enum processor_type): Add PROCESSOR_BDVER3.
	* config/i386/bdver3.md: New file describing bdver3 pipelines.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com] 
Sent: Monday, November 05, 2012 1:37 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, i386]: AMD bdver3 enablement

On Mon, Nov 5, 2012 at 8:33 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Couple of changes done with respect to the review comments.
>
> 1. sseshuf type attribute is handled in unit attribute calculation.
> 2. sseadd1 instruction attribute is handled in the new scheduler descriptions.
>
> The patch is attached as (patch.txt).
> The new file (bdver3.md) describing the pipelines is also attached.

-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3914,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+     (if_then_else (eq_attr "cpu" "bdver3")
+        (const_string "sseshuf")
+        (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4024,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+               (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "1"))
+                 (const_string "sseshuf")
+                 (eq_attr "alternative" "2")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "3")
+                 (const_string "ssemov")
+                 (eq_attr "alternative" "4")
+                 (const_string "ssemov")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog")
+              (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "1"))
+                 (const_string "sselog")
+               ]
+               (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4098,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+        (cond [(and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "0"))
+                 (const_string "sseshuf")
+                (and (eq_attr "cpu" "bdver3")
+                 (eq_attr "alternative" "2"))
+                 (const_string "sseshuf")
+                (eq_attr "alternative" "1")
+                 (const_string "ssemov")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "0"))
+                 (const_string "sselog1")
+               (and (not (eq_attr "cpu" "bdver3"))
+                 (eq_attr "alternative" "2"))
+                 (const_string "sselog1")
+               ]
+               (const_string "*" )))

Please don't change the type attribute conditionally. Change the sselog{,1} attribute unconditionally to sseshuf{,1} and handle the new types in the same way as sselog{,1}.

In other words, add the new attributes in all places where the original attributes are handled.
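
As a rough sketch of the shape this takes (pattern bodies are elided, and the reservation name "example_sseshuf" and its latency are hypothetical placeholders that simply mirror the existing sselog reservation in bdver1.md):

    ;; sse.md: set the new type unconditionally, with no test on "cpu".
    (set_attr "type" "sseshuf")

    ;; Scheduler descriptions: wherever "sselog,sselog1" is matched,
    ;; match "sseshuf,sseshuf1" in the same way.
    (define_insn_reservation "example_sseshuf" 2
      (and (eq_attr "cpu" "bdver1,bdver2")
           (eq_attr "type" "sseshuf,sseshuf1"))
      "bdver1-direct,bdver1-fpsched,bdver1-fxbar")

The same goes for the places in i386.md that list sselog{,1} in the "unit" and "memory" attribute computations, so that the new types do not fall through to the default.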

Otherwise, the patch looks good.

Uros.


[-- Attachment #2: difflog.txt --]
[-- Type: text/plain, Size: 27501 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 193132)
+++ gcc/doc/extend.texi	(working copy)
@@ -9608,6 +9608,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 193132)
+++ gcc/doc/invoke.texi	(working copy)
@@ -13678,6 +13678,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 193132)
+++ gcc/config.gcc	(working copy)
@@ -1269,7 +1269,7 @@
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1278,7 +1278,7 @@
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1390,7 +1390,7 @@
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1399,7 +1399,7 @@
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1456,7 +1456,7 @@
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1465,7 +1465,7 @@
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2706,6 +2706,10 @@
     ;;
   i686-*-* | i786-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -2807,6 +2811,10 @@
     ;;
   x86_64-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -3344,8 +3352,8 @@
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
-			| amdfam10 | barcelona | nocona | core2 | corei7 \
+			| opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \
+			| btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
 				;;
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 193132)
+++ gcc/config/i386/i386.h	(working copy)
@@ -254,6 +254,7 @@
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
+#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
@@ -616,6 +617,7 @@
   TARGET_CPU_DEFAULT_amdfam10,
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
+  TARGET_CPU_DEFAULT_bdver3,
   TARGET_CPU_DEFAULT_btver1,
   TARGET_CPU_DEFAULT_btver2,
 
@@ -2098,6 +2100,7 @@
   PROCESSOR_AMDFAM10,
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
+  PROCESSOR_BDVER3,
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 193132)
+++ gcc/config/i386/i386.md	(working copy)
@@ -323,7 +323,7 @@
 \f
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
+		    atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -336,9 +336,9 @@
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
-   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
-   ssediv,sseins,ssemuladd,sse4arg,lwp,
+   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sse,
+   ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   sseshuf,sseshuf1,ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -353,7 +353,7 @@
 	   (const_string "i387")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
 			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
-			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -594,7 +594,7 @@
 	   (if_then_else (match_operand 1 "constant_call_address_operand")
 	     (const_string "none")
 	     (const_string "load"))
-	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1")
+	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1,sseshuf1")
 	      (match_operand 1 "memory_operand"))
 	   (const_string "both")
 	 (and (match_operand 0 "memory_operand")
@@ -609,7 +609,7 @@
 		   imov,imovx,icmp,test,bitmanip,
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
-		   sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
+		   sseshuf1,sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -947,6 +947,7 @@
 (include "k6.md")
 (include "athlon.md")
 (include "bdver1.md")
+(include "bdver3.md")
 (include "geode.md")
 (include "atom.md")
 (include "core2.md")
Index: gcc/config/i386/athlon.md
===================================================================
--- gcc/config/i386/athlon.md	(revision 193132)
+++ gcc/config/i386/athlon.md	(working copy)
@@ -736,6 +736,36 @@
 			      (eq_attr "type" "sselog,sselog1"))
 			 "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
 
+;;SSE shuffle operations
+(define_insn_reservation "athlon_sseshuf_load" 3
+                         (and (eq_attr "cpu" "athlon")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-vector,athlon-fpload2,(athlon-fmul*2)")
+(define_insn_reservation "athlon_sseshuf_load_k8" 5
+                         (and (eq_attr "cpu" "k8,generic64")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
+(define_insn_reservation "athlon_sseshuf_load_amdfam10" 4
+                         (and (eq_attr "cpu" "amdfam10")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)")
+
+(define_insn_reservation "athlon_sseshuf" 3
+                         (and (eq_attr "cpu" "athlon")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-vector,athlon-fpsched,athlon-fmul*2")
+(define_insn_reservation "athlon_sseshuf_k8" 3
+                         (and (eq_attr "cpu" "k8,generic64")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-double,athlon-fpsched,athlon-fmul")
+(define_insn_reservation "athlon_sseshuf_amdfam10" 2
+                         (and (eq_attr "cpu" "amdfam10")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
+
 ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that.
 (define_insn_reservation "athlon_ssecmp_load" 2
 			 (and (eq_attr "cpu" "athlon")
Index: gcc/config/i386/atom.md
===================================================================
--- gcc/config/i386/atom.md	(revision 193132)
+++ gcc/config/i386/atom.md	(working copy)
@@ -455,6 +455,30 @@
             (eq_attr "memory" "!none")))
   "atom-simple-0")
 
+(define_insn_reservation  "atom_sseshuf" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "!none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf1" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "none")))
+  "atom-simple-0")
+
+(define_insn_reservation  "atom_sseshuf1_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "!none")))
+  "atom-simple-0")
+
 ;; not pmad, not psad
 (define_insn_reservation  "atom_sseiadd" 1
   (and (eq_attr "cpu" "atom")
@@ -743,8 +767,8 @@
                   atom_imul_mem, atom_icmp_mem,
                   atom_test_mem, atom_icmov_mem, atom_sselog_mem,
                   atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem,
-                  atom_ishift_mem, atom_ishift1_mem, 
-                  atom_rotate_mem, atom_rotate1_mem"
+                  atom_ishift_mem, atom_ishift1_mem, atom_sseshuf_mem, 
+                  atom_sseshuf1_mem, atom_rotate_mem, atom_rotate1_mem"
                   "ix86_agi_dependent")
 
 ;; Stall from imul to lea is 8 cycles.
@@ -757,7 +781,8 @@
                   atom_ishift_mem, atom_ishift1_mem, atom_rotate_mem,
                   atom_rotate1_mem, atom_imul_mem, atom_icmp_mem,
                   atom_test_mem, atom_icmov_mem, atom_sselog_mem,
-                  atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem"
+                  atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem,
+		   atom_sseshuf_mem, atom_sseshuf1_mem"
                   "ix86_agi_dependent")
 
 ;; There will be 0 cycle stall from cmp/test to jcc
Index: gcc/config/i386/ppro.md
===================================================================
--- gcc/config/i386/ppro.md	(revision 193132)
+++ gcc/config/i386/ppro.md	(working copy)
@@ -700,6 +700,20 @@
 					(eq_attr "type" "sselog,sselog1"))))
 			 "decoder0,(p2+p1)")
 
+(define_insn_reservation "ppro_sse_shuf_V4SF" 2
+                         (and (eq_attr "cpu" "pentiumpro")
+                              (and (eq_attr "memory" "none")
+                                   (and (eq_attr "mode" "V4SF")
+                                        (eq_attr "type" "sseshuf,sseshuf1"))))
+                         "decodern,p1")
+
+(define_insn_reservation "ppro_sse_shuf_V4SF_load" 2
+                         (and (eq_attr "cpu" "pentiumpro")
+                              (and (eq_attr "memory" "load")
+                                   (and (eq_attr "mode" "V4SF")
+                                        (eq_attr "type" "sseshuf,sseshuf1"))))
+                         "decoder0,(p2+p1)")
+
 (define_insn_reservation "ppro_sse_mov_V4SF" 1
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 193132)
+++ gcc/config/i386/sse.md	(working copy)
@@ -3860,7 +3860,7 @@
 
   return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3911,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4018,7 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4072,7 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
    (set_attr "length_immediate" "1,0,1")
    (set_attr "prefix_extra" "0,1,*")
    (set_attr "prefix" "vex,vex,orig")
@@ -4802,7 +4802,7 @@
 
   return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
@@ -4916,7 +4916,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 193132)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -114,6 +114,10 @@
       def_or_undef (parse_in, "__bdver2");
       def_or_undef (parse_in, "__bdver2__");
       break;
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__bdver3");
+      def_or_undef (parse_in, "__bdver3__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -209,7 +213,10 @@
     case PROCESSOR_BDVER2:
       def_or_undef (parse_in, "__tune_bdver2__");
       break;
-   case PROCESSOR_BTVER1:
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__tune_bdver3__");
+      break;
+    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
     case PROCESSOR_BTVER2:
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 193132)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -419,7 +419,7 @@
 
 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling
+Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 and Haifa scheduling
 is selected.
 
 mprefer-avx128
Index: gcc/config/i386/bdver1.md
===================================================================
--- gcc/config/i386/bdver1.md	(revision 193132)
+++ gcc/config/i386/bdver1.md	(working copy)
@@ -501,6 +501,28 @@
 			      (eq_attr "type" "sselog,sselog1"))
 			 "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
 
+;; SSE shuffles
+(define_insn_reservation "bdver1_sseshuf_load_256" 7
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (and (eq_attr "mode" "V8SF")
+                                   (eq_attr "memory" "load"))))
+                         "bdver1-double,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_sseshuf_load" 6
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-fxbar")
+(define_insn_reservation "bdver1_sseshuf_256" 3
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (and (eq_attr "type" "sseshuf,sseshuf1")
+                                   (eq_attr "mode" "V8SF")))
+                         "bdver1-double,bdver1-fpsched,bdver1-fmal")
+(define_insn_reservation "bdver1_sseshuf" 2
+                         (and (eq_attr "cpu" "bdver1,bdver2")
+                              (eq_attr "type" "sseshuf,sseshuf1"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
+
 ;; PCMP actually executes in FMAL.
 (define_insn_reservation "bdver1_ssecmp_load" 6
 			 (and (eq_attr "cpu" "bdver1,bdver2")
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 193132)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -542,6 +542,8 @@
 	processor = PROCESSOR_GEODE;
       else if (has_movbe)
 	processor = PROCESSOR_BTVER2;
+      else if (has_xsaveopt)
+        processor = PROCESSOR_BDVER3;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -712,6 +714,9 @@
     case PROCESSOR_BDVER2:
       cpu = "bdver2";
       break;
+    case PROCESSOR_BDVER3:
+      cpu = "bdver3";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 193132)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1427,6 +1427,85 @@
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs bdver3_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (4),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (4),			/*				 SI */
+   COSTS_N_INSNS (6),			/*				 DI */
+   COSTS_N_INSNS (6)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {5, 5, 4},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {4, 4, 4},				/* cost of storing integer registers */
+  2,					/* cost of reg,reg fld/fst */
+  {5, 5, 12},				/* cost of loading fp registers
+		   			   in SFmode, DFmode and XFmode */
+  {4, 4, 8},				/* cost of storing fp registers
+ 		   			   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {4, 4},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 4},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 4},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  2,					/* MMX or SSE register to integer */
+  16,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (52),			/* cost of FSQRT instruction.  */
+
+  /*  BDVER3 has optimized REP instruction for medium sized blocks, but for
+      very small blocks it is better to use loop. For large blocks, libcall
+      can do nontemporary accesses and beat inline considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  6,					/* scalar_stmt_cost.  */
+  4,					/* scalar load_cost.  */
+  4,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  4,					/* vec_align_load_cost.  */
+  4,					/* vec_unalign_load_cost.  */
+  4,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 struct processor_costs btver1_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1987,7 +2066,8 @@
 #define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
-#define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BDVER3 (1<<PROCESSOR_BDVER3)
+#define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
 #define m_BTVER2 (1<<PROCESSOR_BTVER2)
@@ -2690,6 +2770,7 @@
   {&amdfam10_cost, 32, 24, 32, 7, 32},
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
+  {&bdver3_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
   {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
@@ -2722,6 +2803,7 @@
   "amdfam10",
   "bdver1",
   "bdver2",
+  "bdver3",
   "btver1",
   "btver2"
 };
@@ -3173,18 +3255,24 @@
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
-	| PTA_XSAVEOPT},
+	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+      {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
+	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE 
+	| PTA_XSAVEOPT},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
-	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FXSR | PTA_XSAVE},
       {"btver2", PROCESSOR_BTVER2, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
@@ -24073,6 +24161,7 @@
     case PROCESSOR_GENERIC64:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
       return 3;
 
@@ -24262,6 +24351,7 @@
     case PROCESSOR_AMDFAM10:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
@@ -28591,7 +28681,8 @@
     M_AMDFAM10H_SHANGHAI,
     M_AMDFAM10H_ISTANBUL,
     M_AMDFAM15H_BDVER1,
-    M_AMDFAM15H_BDVER2
+    M_AMDFAM15H_BDVER2,
+    M_AMDFAM15H_BDVER3
   };
 
   static struct _arch_names_table
@@ -28616,6 +28707,7 @@
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
+      {"bdver3", M_AMDFAM15H_BDVER3},
     };
 
   static struct _isa_names_table
@@ -40962,7 +41054,7 @@
 static bool
 has_dispatch (rtx insn, int action)
 {
-  if ((TARGET_BDVER1 || TARGET_BDVER2)
+  if ((TARGET_BDVER1 || TARGET_BDVER2 || TARGET_BDVER3)
       && flag_dispatch_scheduler)
     switch (action)
       {

[-- Attachment #3: bdver3.md --]
[-- Type: application/octet-stream, Size: 32733 bytes --]

;; Copyright (C) 2012, Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
;;
;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belongs to the fam15 processors. We use the same insn attribute
;; that was used for the bdver1 decoding scheme.

(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; Vectorpath instructions are single issue instructions.
;; So, we have a separate unit for vector instructions.
(exclusion_set "bdver3-decodev" "bdver3-decode0,bdver3-decode1")

(define_reservation "bdver3-vector" "bdver3-decodev")
(define_reservation "bdver3-direct" "(bdver3-decode0|bdver3-decode1)")
;; Double instructions take two cycles to decode.
(define_reservation "bdver3-double" "(bdver3-decode0|bdver3-decode1)*2")

(define_cpu_unit "bdver3-ieu0" "bdver3_ieu")
(define_cpu_unit "bdver3-ieu1" "bdver3_ieu")
(define_reservation "bdver3-ieu" "(bdver3-ieu0|bdver3-ieu1)")

(define_cpu_unit "bdver3-agu0" "bdver3_agu")
(define_cpu_unit "bdver3-agu1" "bdver3_agu")
(define_reservation "bdver3-agu" "(bdver3-agu0|bdver3-agu1)")

(define_cpu_unit "bdver3-load0" "bdver3_load")
(define_cpu_unit "bdver3-load1" "bdver3_load")
(define_reservation "bdver3-load" "bdver3-agu,
				   (bdver3-load0|bdver3-load1),nothing")
;; 128bit SSE instructions issue two loads at once.
(define_reservation "bdver3-load2" "bdver3-agu,
				   (bdver3-load0+bdver3-load1),nothing")

(define_reservation "bdver3-store" "(bdver3-load0 | bdver3-load1)")
;; 128bit SSE instructions issue two stores at once.
(define_reservation "bdver3-store2" "(bdver3-load0+bdver3-load1)")

;; Vectorpath (microcoded) instructions are single-issue instructions,
;; so they occupy all the integer units.
(define_reservation "bdver3-ivector" "bdver3-ieu0+bdver3-ieu1+
                                      bdver3-agu0+bdver3-agu1+
                                      bdver3-load0+bdver3-load1")

(define_reservation "bdver3-fpsched" "nothing,nothing,nothing")

;; The floating point loads.
(define_reservation "bdver3-fpload" "(bdver3-fpsched + bdver3-load)")
(define_reservation "bdver3-fpload2" "(bdver3-fpsched + bdver3-load2)")
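;; Illustration only (a reading of the reservations above, not
;; additional model data): "bdver3-fpsched" is three cycles of
;; "nothing", and "+" runs both operands from the same starting
;; cycle, so "bdver3-fpload" expands cycle by cycle to
;;   cycle 1: bdver3-agu0 | bdver3-agu1
;;   cycle 2: bdver3-load0 | bdver3-load1
;;   cycle 3: nothing
;; The scheduler delay itself therefore reserves no unit.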

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

(define_reservation "bdver3-fvector" "bdver3-ffma0+bdver3-ffma1+
                                      bdver3-fpsto+bdver3-load0+
                                      bdver3-load1")

(define_reservation "bdver3-ffma"     "(bdver3-ffma0 | bdver3-ffma1)")
(define_reservation "bdver3-fcvt"     "bdver3-ffma0")
(define_reservation "bdver3-fmma"     "bdver3-ffma0")
(define_reservation "bdver3-fxbar"    "bdver3-ffma1")
(define_reservation "bdver3-fmal"     "(bdver3-ffma0 | bdver3-fpsto)")
(define_reservation "bdver3-fsto"     "bdver3-fpsto")
(define_reservation "bdver3-fpshuf"    "bdver3-fpsto")

;; Jump instructions are executed in the branch unit, completely
;; transparently to us.
(define_insn_reservation "bdver3_call" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "call,callv"))
			 "bdver3-double,(bdver3-agu | bdver3-ieu),nothing")
;; PUSH mem is double path.
(define_insn_reservation "bdver3_push" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "push"))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
;; POP r16/mem are double path.
(define_insn_reservation "bdver3_pop" 1
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "pop"))
                         "bdver3-direct,bdver3-ivector")
;; LEAVE: no latency info so far; assume the same as amdfam10.
(define_insn_reservation "bdver3_leave" 3
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "leave"))
                         "bdver3-vector,bdver3-ivector")
;; LEA executes in AGU unit with 1 cycle latency on BDVER3.
(define_insn_reservation "bdver3_lea" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "lea"))
			 "bdver3-direct,bdver3-ieu")
;; MUL executes in special multiplier unit attached to IEU1.
(define_insn_reservation "bdver3_imul_DI" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "none,unknown")))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem_DI" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "load,both"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "load,both")))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")

(define_insn_reservation "bdver3_str" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "str")
				   (eq_attr "memory" "load,both,store")))
			 "bdver3-vector,bdver3-load,bdver3-ivector")

;; Integer instructions.
(define_insn_reservation "bdver3_idirect" 1
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,(bdver3-ieu|bdver3-agu)")
(define_insn_reservation "bdver3_ivector" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "vector")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-vector,bdver3-ivector")
(define_insn_reservation "bdver3_idirect_loadmov" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-load")
(define_insn_reservation "bdver3_idirect_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu")
(define_insn_reservation "bdver3_idirect_movstore" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "store")))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
(define_insn_reservation "bdver3_idirect_both" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "both"))))
			 "bdver3-direct,bdver3-load,
			  bdver3-ieu,bdver3-store,
			  bdver3-store")
(define_insn_reservation "bdver3_idirect_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "store"))))
			 "bdver3-direct,(bdver3-ieu+bdver3-agu),
			  bdver3-store")
;; BDVER3 floating point units.
(define_insn_reservation "bdver3_fldxf" 13
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "load")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,bdver3-fpload2,bdver3-fvector*9")
(define_insn_reservation "bdver3_fld" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fstxf" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "store,both")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,(bdver3-fpsched+bdver3-agu),(bdver3-store2+(bdver3-fvector*6))")
(define_insn_reservation "bdver3_fst" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fist" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fistp,fisttp"))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fmov_bdver3" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmov"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fadd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fop")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fadd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fop"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fmul_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmul")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fmul" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmul"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fsgn" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fsgn"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv_load" 42
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fdiv")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv" 42
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fdiv"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fpspc_load" 143
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fpspc")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov_load" 17
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmov")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov" 15
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmov"))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
(define_insn_reservation "bdver3_fcomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (and (eq_attr "bdver1_decode" "double")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "double")
				   (eq_attr "type" "fcmp")))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcom_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fcom" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fxch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fxch"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")

;; SSE loads.
(define_insn_reservation "bdver3_ssevector_avx128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
					(and (eq_attr "movu" "1")
					     (and (eq_attr "mode" "V4SF,V2DF")
						  (eq_attr "memory" "load"))))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_avx256_unaligned_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V8SF,V4DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-double,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_sse128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V4SF,V2DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "V4SF,V2DF,TI")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx256_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_movq_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "DI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssescalar_vmovss_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "SF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload, bdver3-ffma")
(define_insn_reservation "bdver3_mmxsse_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload, bdver3-fmal")

;; SSE stores.
(define_insn_reservation "bdver3_sse_store_avx256" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-double,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_sse_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-direct,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_mmxsse_store_short" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-direct,bdver3-fpsched,(bdver3-fsto+bdver3-store)")

;; Register moves.
(define_insn_reservation "bdver3_ssevector_avx256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_movss_movsd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
                                        (eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_mmxssemov" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")
;; SSE logs.
(define_insn_reservation "bdver3_sselog_load_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (and (eq_attr "mode" "V8SF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
                                   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fxbar")
(define_insn_reservation "bdver3_sselog" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "sselog,sselog1"))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")

;; SSE Shuffles
(define_insn_reservation "bdver3_sseshuf_load_256" 7
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (and (eq_attr "mode" "V8SF")
                                        (eq_attr "memory" "load"))))
                         "bdver3-double,bdver3-fpload,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf_load" 6
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (eq_attr "memory" "load")))
                         "bdver3-direct,bdver3-fpload,bdver3-fpshuf")

(define_insn_reservation "bdver3_sseshuf_256" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf")
                                   (eq_attr "mode" "V8SF")))
                         "bdver3-double,bdver3-fpsched,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf" 2
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "sseshuf,sseshuf1"))
                         "bdver3-direct,bdver3-fpsched,bdver3-fpshuf")

;; PCMP actually executes in FMAL.
(define_insn_reservation "bdver3_ssecmp_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssecmp" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssecomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecomi")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecomi"))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")

;; Conversions behave very irregularly and the scheduling is critical here.
;; Take each instruction separately.

;; 256 bit conversion.
(define_insn_reservation "bdver3_vcvtX2Y_avx256_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_vcvtX2Y_avx256" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
;; CVTSS2SD, CVTSD2SS.
(define_insn_reservation "bdver3_ssecvt_cvtss2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtss2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ.
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(nothing | bdver3-fcvt)")
;; CVTPD2PS.
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPI2PS, CVTDQ2PS.
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTDQ2PD.
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPS2PD, CVTPI2PD.
(define_insn_reservation "bdver3_ssecvt_cvtps2pd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtps2pd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ.
(define_insn_reservation "bdver3_ssecvt_cvtsX2si_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecvt_cvtsX2si" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fsto)")
;; CVTPD2PI, CVTTPD2PI.
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPD2DQ, CVTTPD2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtps2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
                                   (and (eq_attr "memory" "load")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtps2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")

;; SSE MUL, ADD, and MULADD.
(define_insn_reservation "bdver3_ssemuladd_load_256" 11
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_sseimul_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmma")
(define_insn_reservation "bdver3_sseimul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmma")
(define_insn_reservation "bdver3_sseiadd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sseiadd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")

;; SSE DIV: no throughput information (assume same as amdfam10).
(define_insn_reservation "bdver3_ssediv_double_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_256" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_load" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load" 27 
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")

(define_insn_reservation "bdver3_sseins" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseins")
                                   (eq_attr "mode" "TI")))
                         "bdver3-direct,bdver3-fpsched,bdver3-fxbar")


* Re: [PATCH, i386]: AMD bdver3 enablement
  2012-11-09  3:39       ` Gopalasubramanian, Ganesh
@ 2012-11-11 21:00         ` Uros Bizjak
  2012-11-12  5:35           ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-11 21:00 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches

On Fri, Nov 9, 2012 at 4:39 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:

> Changes done with respect to the review comments.
> Conditionally setting "sseshuf" type attribute has been removed.
> Instead new attribute is added and is included for other attribute calculations.
>
> The patch is attached as (difflog.txt).
> The new file (bdver3.md) describing the pipelines is also attached.
>
> Bootstrapping and "make -k check" passes.
>
> OK for upstream?
>
> 2012-11-09  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>
>
>         bdver3 Enablement
>         * gcc/doc/extend.texi: Add details about bdver3.
>         * gcc/doc/invoke.texi: Add details about bdver3.
>         * config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
>         (case ${target}): Add bdver3.
>         * config/i386/i386.h (TARGET_BDVER3): New definition.
>         * config/i386/i386.md (define_attr "cpu"): Add bdver3.
>         * config/i386/sse.md (sseshuf): New type attribute.
>         * config/i386/athlon.md (sseshuf):Likewise.
>         * config/i386/atom.md (sseshuf):Likewise.
>         * config/i386/ppro.md (sseshuf):Likewise.

Index: gcc/config/i386/atom.md
===================================================================
--- gcc/config/i386/atom.md	(revision 193132)
+++ gcc/config/i386/atom.md	(working copy)
@@ -455,6 +455,30 @@
             (eq_attr "memory" "!none")))
   "atom-simple-0")

+(define_insn_reservation  "atom_sseshuf" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "!none")))
+  "atom-simple-either")
+
+(define_insn_reservation  "atom_sseshuf1" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "none")))
+  "atom-simple-0")
+
+(define_insn_reservation  "atom_sseshuf1_mem" 1
+  (and (eq_attr "cpu" "atom")
+       (and (eq_attr "type" "sseshuf1")
+            (eq_attr "memory" "!none")))
+  "atom-simple-0")
+
 ;; not pmad, not psad
 (define_insn_reservation  "atom_sseiadd" 1
   (and (eq_attr "cpu" "atom")

This was not what I had in mind for changes in existing .md files.
Just change them in this way:

Index: atom.md
===================================================================
--- atom.md     (revision 193407)
+++ atom.md     (working copy)
@@ -594,7 +594,7 @@
 ;; no memory simple
 (define_insn_reservation  "atom_sseadd" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd,sseadd1")
+       (and (eq_attr "type" "sseadd,sseshuf,sseadd1,sseshuf1")
             (and (eq_attr "memory" "none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
@@ -603,7 +603,7 @@
 ;; memory simple
 (define_insn_reservation  "atom_sseadd_mem" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd,sseadd1")
+       (and (eq_attr "type" "sseadd,sseshuf,sseadd1,sseshuf1")
             (and (eq_attr "memory" "!none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
@@ -612,7 +612,7 @@
 ;; maxps, minps, *pd, hadd, hsub
 (define_insn_reservation  "atom_sseadd_3" 8
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd,sseadd1")
+       (and (eq_attr "type" "sseadd,sseshuf,sseadd1,sseshuf1")
             (ior (eq_attr "mode" "V2DF") (eq_attr "atom_unit" "complex"))))
   "atom-complex, atom-all-eu*7")

You can see from the changes of sse.md that this is functionally a no-op change.

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH, i386]: AMD bdver3 enablement
  2012-11-11 21:00         ` Uros Bizjak
@ 2012-11-12  5:35           ` Gopalasubramanian, Ganesh
  2012-11-12  8:09             ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-12  5:35 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

> You can see from the changes of sse.md that this is functionally a no-op change.
sseshuf replaces sselog.
So do you mean it should be added alongside sselog rather than sseadd?
Adding it to the sseadd reservations (instead of sselog) changes the latency information.

Regards
Ganesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, i386]: AMD bdver3 enablement
  2012-11-12  5:35           ` Gopalasubramanian, Ganesh
@ 2012-11-12  8:09             ` Uros Bizjak
  2012-11-14  9:22               ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-12  8:09 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches

On Mon, Nov 12, 2012 at 6:34 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
>> You can see from the changes of sse.md that this is functionally a no-op change.
> Sseshuf replaces sselog.
> So, do you mean it should be added with sselog instead of sseadd?
> Adding it with sseadd (instead of sselog) influences the latency information.

sseshuf replaces sselog in some insn patterns, but should be handled
in the same way in *existing* .md files.
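
To make the latency point concrete, here is a rough C sketch (not GCC code) of how a reservation's type list decides the latency an insn gets. The struct, the helper names and the two tables are hypothetical. Only the reservation names, type lists and latencies (1 for atom_sselog, 5 for atom_sseadd) are taken from atom.md as quoted in this thread, and the memory, mode and atom_unit conditions are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a define_insn_reservation: a comma-separated
   list of "type" values plus the latency given in the reservation.  */
struct reservation
{
  const char *name;
  const char *types;
  int latency;
};

/* Return nonzero if TYPE is a whole element of the comma-separated LIST.  */
static int
type_in_list (const char *list, const char *type)
{
  size_t n = strlen (type);
  const char *p = list;

  for (;;)
    {
      const char *end = strchr (p, ',');
      size_t len = end ? (size_t) (end - p) : strlen (p);

      if (len == n && strncmp (p, type, n) == 0)
	return 1;
      if (!end)
	return 0;
      p = end + 1;
    }
}

/* Latency of the first reservation in R that lists TYPE, or -1 if none.  */
static int
latency_for (const struct reservation *r, size_t count, const char *type)
{
  for (size_t i = 0; i < count; i++)
    if (type_in_list (r[i].types, type))
      return r[i].latency;
  return -1;
}

/* sseshuf listed next to sselog: shuffles keep the sselog latency.  */
static const struct reservation listed_with_sselog[] = {
  { "atom_sselog", "sselog,sseshuf", 1 },
  { "atom_sseadd", "sseadd,sseadd1", 5 },
};

/* sseshuf listed next to sseadd: shuffles pick up the sseadd latency.  */
static const struct reservation listed_with_sseadd[] = {
  { "atom_sselog", "sselog", 1 },
  { "atom_sseadd", "sseadd,sseshuf,sseadd1,sseshuf1", 5 },
};
```

With the first table, latency_for (listed_with_sselog, 2, "sseshuf") gives 1, the same value a shuffle had while it was still typed sselog, so the existing schedules are unchanged. With the second table the same call gives 5, which is the latency change raised above.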

Uros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH, i386]: AMD bdver3 enablement
  2012-11-12  8:09             ` Uros Bizjak
@ 2012-11-14  9:22               ` Gopalasubramanian, Ganesh
  2012-11-14 10:45                 ` Uros Bizjak
  0 siblings, 1 reply; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-14  9:22 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]

Hi Uros!

> sseshuf replaces sselog in some insn patterns, but should be handled in the same way in *existing* .md files.

Modified as per the comments:
1. sseshuf is added alongside sselog in the existing .md files.
2. sseshuf is handled in a separate pattern in bdver3.md.

Bootstrap and "make -k check" pass.
OK for trunk?

Regards
Ganesh

2012-11-14  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>

	bdver3 Enablement
	* gcc/doc/extend.texi: Add details about bdver3.
	* gcc/doc/invoke.texi: Add details about bdver3.
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
	(case ${target}): Add bdver3.
	* config/i386/i386.h (TARGET_BDVER3): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add bdver3.
	* config/i386/sse.md (sseshuf): New type attribute.
	* config/i386/athlon.md (sseshuf): Likewise.
	* config/i386/atom.md (sseshuf): Likewise.
	* config/i386/ppro.md (sseshuf): Likewise.
	* config/i386/bdver1.md (sseshuf): Likewise.
	* config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver3 def_or_undef.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver3 processors.
	* config/i386/i386.c (struct processor_costs bdver3_cost): New.
	(m_BDVER3): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER3.
	(initial_ix86_tune_features): Add bdver3 tune.
	(processor_target_table): Add bdver3 entry.
	(static const char *const cpu_names): Add bdver3 entry.
	(software_prefetching_beneficial_p): Add bdver3.
	(ix86_option_override_internal): Add bdver3 instruction sets.
	(ix86_option_override_internal): Remove XSAVEOPT for bdver1 
	and bdver2.
	(ix86_issue_rate): Add bdver3.
	(ix86_adjust_cost): Add bdver3.
	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
	(enum processor_type): Add PROCESSOR_BDVER3.
	* config/i386/bdver3.md: New file describing bdver3 pipelines.
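
For reviewers, the -march=native ordering in the host_detect_local_cpu hunk can be read as the following sketch. It is a simplified model under stated assumptions: the enum and function names are made up for illustration, only the three feature tests visible in the hunk (movbe, xsaveopt, bmi) are modeled, and every other branch collapses into GUESS_OTHER.

```c
#include <stdbool.h>

/* Hypothetical result type; the real code assigns PROCESSOR_* values.  */
enum cpu_guess
{
  GUESS_OTHER,
  GUESS_BTVER2,
  GUESS_BDVER3,
  GUESS_BDVER2
};

/* Mirror the order of the else-if chain in the hunk: the first feature
   test that succeeds decides the processor.  */
static enum cpu_guess
guess_amd_cpu (bool has_movbe, bool has_xsaveopt, bool has_bmi)
{
  if (has_movbe)
    return GUESS_BTVER2;
  else if (has_xsaveopt)
    return GUESS_BDVER3;
  else if (has_bmi)
    return GUESS_BDVER2;
  /* Branches not shown in the hunk are not modeled here.  */
  return GUESS_OTHER;
}
```

Because the xsaveopt test comes before the bmi test, a CPU reporting both flags is classified as bdver3 rather than bdver2.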


[-- Attachment #2: difflog.txt --]
[-- Type: text/plain, Size: 26059 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 193132)
+++ gcc/doc/extend.texi	(working copy)
@@ -9608,6 +9608,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 193132)
+++ gcc/doc/invoke.texi	(working copy)
@@ -13678,6 +13678,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
+extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 193132)
+++ gcc/config.gcc	(working copy)
@@ -1269,7 +1269,7 @@
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1278,7 +1278,7 @@
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1390,7 +1390,7 @@
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1399,7 +1399,7 @@
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1456,7 +1456,7 @@
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1465,7 +1465,7 @@
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2706,6 +2706,10 @@
     ;;
   i686-*-* | i786-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -2807,6 +2811,10 @@
     ;;
   x86_64-*-*)
     case ${target_noncanonical} in
+      bdver3-*)
+        arch=bdver3
+        cpu=bdver3
+        ;;
       bdver2-*)
         arch=bdver2
         cpu=bdver2
@@ -3344,8 +3352,8 @@
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
-			| amdfam10 | barcelona | nocona | core2 | corei7 \
+			| opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \
+			| btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
 				;;
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 193132)
+++ gcc/config/i386/i386.h	(working copy)
@@ -254,6 +254,7 @@
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
+#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
@@ -616,6 +617,7 @@
   TARGET_CPU_DEFAULT_amdfam10,
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
+  TARGET_CPU_DEFAULT_bdver3,
   TARGET_CPU_DEFAULT_btver1,
   TARGET_CPU_DEFAULT_btver2,
 
@@ -2098,6 +2100,7 @@
   PROCESSOR_AMDFAM10,
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
+  PROCESSOR_BDVER3,
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 193132)
+++ gcc/config/i386/i386.md	(working copy)
@@ -323,7 +323,7 @@
 \f
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
+		    atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -336,9 +336,9 @@
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
-   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
-   ssediv,sseins,ssemuladd,sse4arg,lwp,
+   sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sse,
+   ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   sseshuf,sseshuf1,ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -353,7 +353,7 @@
 	   (const_string "i387")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
 			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
-			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -594,7 +594,7 @@
 	   (if_then_else (match_operand 1 "constant_call_address_operand")
 	     (const_string "none")
 	     (const_string "load"))
-	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1")
+	 (and (eq_attr "type" "alu1,negnot,ishift1,sselog1,sseshuf1")
 	      (match_operand 1 "memory_operand"))
 	   (const_string "both")
 	 (and (match_operand 0 "memory_operand")
@@ -609,7 +609,7 @@
 		   imov,imovx,icmp,test,bitmanip,
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
-		   sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
+		   sseshuf1,sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -947,6 +947,7 @@
 (include "k6.md")
 (include "athlon.md")
 (include "bdver1.md")
+(include "bdver3.md")
 (include "geode.md")
 (include "atom.md")
 (include "core2.md")
Index: gcc/config/i386/athlon.md
===================================================================
--- gcc/config/i386/athlon.md	(revision 193132)
+++ gcc/config/i386/athlon.md	(working copy)
@@ -710,30 +710,30 @@
 
 (define_insn_reservation "athlon_sselog_load" 3
 			 (and (eq_attr "cpu" "athlon")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
 				   (eq_attr "memory" "load")))
 			 "athlon-vector,athlon-fpload2,(athlon-fmul*2)")
 (define_insn_reservation "athlon_sselog_load_k8" 5
 			 (and (eq_attr "cpu" "k8,generic64")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
 				   (eq_attr "memory" "load")))
 			 "athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
 (define_insn_reservation "athlon_sselog_load_amdfam10" 4
 			 (and (eq_attr "cpu" "amdfam10")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
 				   (eq_attr "memory" "load")))
 			 "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)")
 (define_insn_reservation "athlon_sselog" 3
 			 (and (eq_attr "cpu" "athlon")
-			      (eq_attr "type" "sselog,sselog1"))
+			      (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))
 			 "athlon-vector,athlon-fpsched,athlon-fmul*2")
 (define_insn_reservation "athlon_sselog_k8" 3
 			 (and (eq_attr "cpu" "k8,generic64")
-			      (eq_attr "type" "sselog,sselog1"))
+			      (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))
 			 "athlon-double,athlon-fpsched,athlon-fmul")
 (define_insn_reservation "athlon_sselog_amdfam10" 2
 			 (and (eq_attr "cpu" "amdfam10")
-			      (eq_attr "type" "sselog,sselog1"))
+			      (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))
 			 "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
 
 ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that.
Index: gcc/config/i386/atom.md
===================================================================
--- gcc/config/i386/atom.md	(revision 193132)
+++ gcc/config/i386/atom.md	(working copy)
@@ -433,25 +433,25 @@
 
 (define_insn_reservation  "atom_sselog" 1
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sselog")
+       (and (eq_attr "type" "sselog,sseshuf")
             (eq_attr "memory" "none")))
   "atom-simple-either")
 
 (define_insn_reservation  "atom_sselog_mem" 1
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sselog")
+       (and (eq_attr "type" "sselog,sseshuf")
             (eq_attr "memory" "!none")))
   "atom-simple-either")
 
 (define_insn_reservation  "atom_sselog1" 1
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sselog1")
+       (and (eq_attr "type" "sselog1,sseshuf1")
             (eq_attr "memory" "none")))
   "atom-simple-0")
 
 (define_insn_reservation  "atom_sselog1_mem" 1
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sselog1")
+       (and (eq_attr "type" "sselog1,sseshuf1")
             (eq_attr "memory" "!none")))
   "atom-simple-0")
 
@@ -743,8 +743,8 @@
                   atom_imul_mem, atom_icmp_mem,
                   atom_test_mem, atom_icmov_mem, atom_sselog_mem,
                   atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem,
-                  atom_ishift_mem, atom_ishift1_mem, 
-                  atom_rotate_mem, atom_rotate1_mem"
+                  atom_ishift_mem, atom_ishift1_mem, atom_rotate_mem, 
+                  atom_rotate1_mem"
                   "ix86_agi_dependent")
 
 ;; Stall from imul to lea is 8 cycles.
Index: gcc/config/i386/ppro.md
===================================================================
--- gcc/config/i386/ppro.md	(revision 193132)
+++ gcc/config/i386/ppro.md	(working copy)
@@ -690,14 +690,14 @@
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "V4SF")
-					(eq_attr "type" "sselog,sselog1"))))
+					(eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))))
 			 "decodern,p1")
 
 (define_insn_reservation "ppro_sse_log_V4SF_load" 2
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "load")
 				   (and (eq_attr "mode" "V4SF")
-					(eq_attr "type" "sselog,sselog1"))))
+					(eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))))
 			 "decoder0,(p2+p1)")
 
 (define_insn_reservation "ppro_sse_mov_V4SF" 1
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 193132)
+++ gcc/config/i386/sse.md	(working copy)
@@ -3860,7 +3860,7 @@
 
   return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3911,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4018,7 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4072,7 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
    (set_attr "length_immediate" "1,0,1")
    (set_attr "prefix_extra" "0,1,*")
    (set_attr "prefix" "vex,vex,orig")
@@ -4802,7 +4802,7 @@
 
   return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
@@ -4916,7 +4916,7 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set_attr "type" "sseshuf")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 193132)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -114,6 +114,10 @@
       def_or_undef (parse_in, "__bdver2");
       def_or_undef (parse_in, "__bdver2__");
       break;
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__bdver3");
+      def_or_undef (parse_in, "__bdver3__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -209,7 +213,10 @@
     case PROCESSOR_BDVER2:
       def_or_undef (parse_in, "__tune_bdver2__");
       break;
-   case PROCESSOR_BTVER1:
+    case PROCESSOR_BDVER3:
+      def_or_undef (parse_in, "__tune_bdver3__");
+      break;
+    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
     case PROCESSOR_BTVER2:
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 193132)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -419,7 +419,7 @@
 
 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling
+Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 and Haifa scheduling
 is selected.
 
 mprefer-avx128
Index: gcc/config/i386/bdver1.md
===================================================================
--- gcc/config/i386/bdver1.md	(revision 193132)
+++ gcc/config/i386/bdver1.md	(working copy)
@@ -482,23 +482,23 @@
 ;; SSE logs.
 (define_insn_reservation "bdver1_sselog_load_256" 7
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
 				   (and (eq_attr "mode" "V8SF")
 				   (eq_attr "memory" "load"))))
 			 "bdver1-double,bdver1-fpload,bdver1-fmal")
 (define_insn_reservation "bdver1_sselog_256" 3
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
                                    (eq_attr "mode" "V8SF")))
 			 "bdver1-double,bdver1-fpsched,bdver1-fmal")
 (define_insn_reservation "bdver1_sselog_load" 6
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "sselog,sselog1")
+			      (and (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1")
 				   (eq_attr "memory" "load")))
 			 "bdver1-direct,bdver1-fpload,bdver1-fxbar")
 (define_insn_reservation "bdver1_sselog" 2
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (eq_attr "type" "sselog,sselog1"))
+			      (eq_attr "type" "sselog,sselog1,sseshuf,sseshuf1"))
 			 "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
 
 ;; PCMP actually executes in FMAL.
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 193132)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -542,6 +542,8 @@
 	processor = PROCESSOR_GEODE;
       else if (has_movbe)
 	processor = PROCESSOR_BTVER2;
+      else if (has_xsaveopt)
+        processor = PROCESSOR_BDVER3;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -712,6 +714,9 @@
     case PROCESSOR_BDVER2:
       cpu = "bdver2";
       break;
+    case PROCESSOR_BDVER3:
+      cpu = "bdver3";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 193132)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1427,6 +1427,85 @@
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs bdver3_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (4),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (4),			/*				 SI */
+   COSTS_N_INSNS (6),			/*				 DI */
+   COSTS_N_INSNS (6)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {5, 5, 4},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {4, 4, 4},				/* cost of storing integer registers */
+  2,					/* cost of reg,reg fld/fst */
+  {5, 5, 12},				/* cost of loading fp registers
+		   			   in SFmode, DFmode and XFmode */
+  {4, 4, 8},				/* cost of storing fp registers
+ 		   			   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {4, 4},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 4},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 4},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  2,					/* MMX or SSE register to integer */
+  16,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (52),			/* cost of FSQRT instruction.  */
+
+  /*  BDVER3 has an optimized REP instruction for medium-sized blocks, but
+      for very small blocks it is better to use a loop.  For large blocks, a
+      libcall can do nontemporal accesses and beat inline considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  6,					/* scalar_stmt_cost.  */
+  4,					/* scalar load_cost.  */
+  4,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  4,					/* vec_align_load_cost.  */
+  4,					/* vec_unalign_load_cost.  */
+  4,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 struct processor_costs btver1_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1987,7 +2066,8 @@
 #define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
-#define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BDVER3 (1<<PROCESSOR_BDVER3)
+#define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
 #define m_BTVER2 (1<<PROCESSOR_BTVER2)
@@ -2690,6 +2770,7 @@
   {&amdfam10_cost, 32, 24, 32, 7, 32},
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
+  {&bdver3_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
   {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
@@ -2722,6 +2803,7 @@
   "amdfam10",
   "bdver1",
   "bdver2",
+  "bdver3",
   "btver1",
   "btver2"
 };
@@ -3173,18 +3255,24 @@
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE
-	| PTA_XSAVEOPT},
+	| PTA_XOP | PTA_LWP | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+      {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
+	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+	| PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE 
+	| PTA_XSAVEOPT},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
-	| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+	| PTA_FXSR | PTA_XSAVE},
       {"btver2", PROCESSOR_BTVER2, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
@@ -24073,6 +24161,7 @@
     case PROCESSOR_GENERIC64:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
       return 3;
 
@@ -24262,6 +24351,7 @@
     case PROCESSOR_AMDFAM10:
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
+    case PROCESSOR_BDVER3:
     case PROCESSOR_BTVER1:
     case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
@@ -28591,7 +28681,8 @@
     M_AMDFAM10H_SHANGHAI,
     M_AMDFAM10H_ISTANBUL,
     M_AMDFAM15H_BDVER1,
-    M_AMDFAM15H_BDVER2
+    M_AMDFAM15H_BDVER2,
+    M_AMDFAM15H_BDVER3
   };
 
   static struct _arch_names_table
@@ -28616,6 +28707,7 @@
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
+      {"bdver3", M_AMDFAM15H_BDVER3},
     };
 
   static struct _isa_names_table
@@ -40962,7 +41054,7 @@
 static bool
 has_dispatch (rtx insn, int action)
 {
-  if ((TARGET_BDVER1 || TARGET_BDVER2)
+  if ((TARGET_BDVER1 || TARGET_BDVER2 || TARGET_BDVER3)
       && flag_dispatch_scheduler)
     switch (action)
       {

[-- Attachment #3: bdver3.md --]
[-- Type: application/octet-stream, Size: 32733 bytes --]

;; Copyright (C) 2012, Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
;;
;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belongs to the fam15 processors.  We use the same insn attribute
;; (bdver1_decode) that was used for the bdver1 decoding scheme.

(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; Vectorpath instructions are single issue instructions.
;; So, we have a separate unit for vector instructions.
(exclusion_set "bdver3-decodev" "bdver3-decode0,bdver3-decode1")

(define_reservation "bdver3-vector" "bdver3-decodev")
(define_reservation "bdver3-direct" "(bdver3-decode0|bdver3-decode1)")
;; Double instructions take two cycles to decode.
(define_reservation "bdver3-double" "(bdver3-decode0|bdver3-decode1)*2")
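
;; Reading aid (an editorial sketch, not taken from AMD documentation):
;; in the reservation strings used throughout this file, "," advances to
;; the next cycle, "|" picks one of several alternative units, "+" reserves
;; units in the same cycle, and "*N" repeats a reservation for N
;; consecutive cycles.  Under those rules "bdver3-double" above is
;; equivalent to
;;   (bdver3-decode0|bdver3-decode1),(bdver3-decode0|bdver3-decode1)
;; i.e. one decoder is held in each of two consecutive cycles, whereas
;; "bdver3-direct" holds one decoder for a single cycle.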

(define_cpu_unit "bdver3-ieu0" "bdver3_ieu")
(define_cpu_unit "bdver3-ieu1" "bdver3_ieu")
(define_reservation "bdver3-ieu" "(bdver3-ieu0|bdver3-ieu1)")

(define_cpu_unit "bdver3-agu0" "bdver3_agu")
(define_cpu_unit "bdver3-agu1" "bdver3_agu")
(define_reservation "bdver3-agu" "(bdver3-agu0|bdver3-agu1)")

(define_cpu_unit "bdver3-load0" "bdver3_load")
(define_cpu_unit "bdver3-load1" "bdver3_load")
(define_reservation "bdver3-load" "bdver3-agu,
				   (bdver3-load0|bdver3-load1),nothing")
;; 128bit SSE instructions issue two loads at once.
(define_reservation "bdver3-load2" "bdver3-agu,
				   (bdver3-load0+bdver3-load1),nothing")

(define_reservation "bdver3-store" "(bdver3-load0 | bdver3-load1)")
;; 128bit SSE instructions issue two stores at once.
(define_reservation "bdver3-store2" "(bdver3-load0+bdver3-load1)")

;; vectorpath (microcoded) instructions are single issue instructions.
;; So, they occupy all the integer units.
(define_reservation "bdver3-ivector" "bdver3-ieu0+bdver3-ieu1+
                                      bdver3-agu0+bdver3-agu1+
                                      bdver3-load0+bdver3-load1")

(define_reservation "bdver3-fpsched" "nothing,nothing,nothing")

;; The floating point loads.
(define_reservation "bdver3-fpload" "(bdver3-fpsched + bdver3-load)")
(define_reservation "bdver3-fpload2" "(bdver3-fpsched + bdver3-load2)")

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

(define_reservation "bdver3-fvector" "bdver3-ffma0+bdver3-ffma1+
                                      bdver3-fpsto+bdver3-load0+
                                      bdver3-load1")

(define_reservation "bdver3-ffma"     "(bdver3-ffma0 | bdver3-ffma1)")
(define_reservation "bdver3-fcvt"     "bdver3-ffma0")
(define_reservation "bdver3-fmma"     "bdver3-ffma0")
(define_reservation "bdver3-fxbar"    "bdver3-ffma1")
(define_reservation "bdver3-fmal"     "(bdver3-ffma0 | bdver3-fpsto)")
(define_reservation "bdver3-fsto"     "bdver3-fpsto")
(define_reservation "bdver3-fpshuf"    "bdver3-fpsto")

;; Jump instructions are executed in the branch unit completely transparent to us.
(define_insn_reservation "bdver3_call" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "call,callv"))
			 "bdver3-double,(bdver3-agu | bdver3-ieu),nothing")
;; PUSH mem is double path.
(define_insn_reservation "bdver3_push" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "push"))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
;; POP r16/mem are double path.
(define_insn_reservation "bdver3_pop" 1
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "pop"))
                         "bdver3-direct,bdver3-ivector")
;; LEAVE: no latency info so far; assume the same as amdfam10.
(define_insn_reservation "bdver3_leave" 3
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "leave"))
                         "bdver3-vector,bdver3-ivector")
;; LEA executes in AGU unit with 1 cycle latency on BDVER3.
(define_insn_reservation "bdver3_lea" 1
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "lea"))
			 "bdver3-direct,bdver3-ieu")
;; MUL executes in special multiplier unit attached to IEU1.
(define_insn_reservation "bdver3_imul_DI" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "none,unknown")))
			 "bdver3-direct,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem_DI" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (and (eq_attr "mode" "DI")
					(eq_attr "memory" "load,both"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")
(define_insn_reservation "bdver3_imul_mem" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imul")
				   (eq_attr "memory" "load,both")))
			 "bdver3-direct,bdver3-load,bdver3-ieu1")

(define_insn_reservation "bdver3_str" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "str")
				   (eq_attr "memory" "load,both,store")))
			 "bdver3-vector,bdver3-load,bdver3-ivector")

;; Integer instructions.
(define_insn_reservation "bdver3_idirect" 1
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-direct,(bdver3-ieu|bdver3-agu)")
(define_insn_reservation "bdver3_ivector" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "vector")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "none,unknown"))))
			 "bdver3-vector,bdver3-ivector")
(define_insn_reservation "bdver3_idirect_loadmov" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-load")
(define_insn_reservation "bdver3_idirect_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-load,bdver3-ieu")
(define_insn_reservation "bdver3_idirect_movstore" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "imov")
				   (eq_attr "memory" "store")))
			 "bdver3-direct,bdver3-ieu,bdver3-store")
(define_insn_reservation "bdver3_idirect_both" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "both"))))
			 "bdver3-direct,bdver3-load,
			  bdver3-ieu,bdver3-store,
			  bdver3-store")
(define_insn_reservation "bdver3_idirect_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "direct")
				   (and (eq_attr "unit" "integer,unknown")
					(eq_attr "memory" "store"))))
			 "bdver3-direct,(bdver3-ieu+bdver3-agu),
			  bdver3-store")
;; BDVER3 floating point units.
(define_insn_reservation "bdver3_fldxf" 13
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "load")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,bdver3-fpload2,bdver3-fvector*9")
(define_insn_reservation "bdver3_fld" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fstxf" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (and (eq_attr "memory" "store,both")
					(eq_attr "mode" "XF"))))
			 "bdver3-vector,(bdver3-fpsched+bdver3-agu),(bdver3-store2+(bdver3-fvector*6))")
(define_insn_reservation "bdver3_fst" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fist" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fistp,fisttp"))
			 "bdver3-double,(bdver3-fpsched),(bdver3-fsto+bdver3-store)")
(define_insn_reservation "bdver3_fmov_bdver3" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmov"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fadd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fop")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fadd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fop"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fmul_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fmul")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fmul" 6
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fmul"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fsgn" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fsgn"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv_load" 42
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fdiv")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fdiv" 42
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fdiv"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fpspc_load" 143
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fpspc")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov_load" 17
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmov")
				   (eq_attr "memory" "load")))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_fcmov" 15
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmov"))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
(define_insn_reservation "bdver3_fcomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (and (eq_attr "bdver1_decode" "double")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "bdver1_decode" "double")
				   (eq_attr "type" "fcmp")))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_fcom_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "fcmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_fcom" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fcmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_fxch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "fxch"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")

;; SSE loads.
(define_insn_reservation "bdver3_ssevector_avx128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
					(and (eq_attr "movu" "1")
					     (and (eq_attr "mode" "V4SF,V2DF")
						  (eq_attr "memory" "load"))))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_avx256_unaligned_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V8SF,V4DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-double,bdver3-fpload")
(define_insn_reservation "bdver3_ssevector_sse128_unaligned_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "movu" "1")
				        (and (eq_attr "mode" "V4SF,V2DF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "V4SF,V2DF,TI")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_avx256_load" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssevector_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_movq_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "DI")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_ssescalar_vmovss_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "prefix" "vex")
				        (and (eq_attr "mode" "SF")
				             (eq_attr "memory" "load")))))
			 "bdver3-direct,bdver3-fpload")
(define_insn_reservation "bdver3_ssescalar_sse128_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload, bdver3-ffma")
(define_insn_reservation "bdver3_mmxsse_load" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload, bdver3-fmal")

;; SSE stores.
(define_insn_reservation "bdver3_sse_store_avx256" 5
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-double,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_sse_store" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V4SF,V2DF,TI")
					(eq_attr "memory" "store,both"))))
			 "bdver3-direct,bdver3-fpsched,((bdver3-fsto+bdver3-store)*2)")
(define_insn_reservation "bdver3_mmxsse_store_short" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "store,both")))
			 "bdver3-direct,bdver3-fpsched,(bdver3-fsto+bdver3-store)")

;; Register moves.
(define_insn_reservation "bdver3_ssevector_avx256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "V8SF,V4DF,OI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_movss_movsd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemov")
				   (and (eq_attr "mode" "SF,DF")
                                        (eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_mmxssemov" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "mmxmov,ssemov")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")
;; SSE logic.
(define_insn_reservation "bdver3_sselog_load_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (and (eq_attr "mode" "V8SF")
				   (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_256" 3
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
                                   (eq_attr "mode" "V8SF")))
			 "bdver3-double,bdver3-fpsched,bdver3-fmal")
(define_insn_reservation "bdver3_sselog_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sselog,sselog1")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fxbar")
(define_insn_reservation "bdver3_sselog" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "sselog,sselog1"))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")

;; SSE Shuffles
(define_insn_reservation "bdver3_sseshuf_load_256" 7
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (and (eq_attr "mode" "V8SF")
                                   (eq_attr "memory" "load"))))
                         "bdver3-double,bdver3-fpload,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf_load" 6
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf,sseshuf1")
                                   (eq_attr "memory" "load")))
                         "bdver3-direct,bdver3-fpload,bdver3-fpshuf")

(define_insn_reservation "bdver3_sseshuf_256" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseshuf")
                                   (eq_attr "mode" "V8SF")))
                         "bdver3-double,bdver3-fpsched,bdver3-fpshuf")
(define_insn_reservation "bdver3_sseshuf" 2
                         (and (eq_attr "cpu" "bdver3")
                              (eq_attr "type" "sseshuf,sseshuf1"))
                         "bdver3-direct,bdver3-fpsched,bdver3-fpshuf")

;; PCMP actually executes in FMAL.
(define_insn_reservation "bdver3_ssecmp_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecmp")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssecmp" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecmp"))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssecomi_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecomi")
				   (eq_attr "memory" "load")))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecomi" 2
			 (and (eq_attr "cpu" "bdver3")
			      (eq_attr "type" "ssecomi"))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma | bdver3-fsto)")

;; Conversions behave very irregularly and the scheduling is critical here.
;; Take each instruction separately.

;; 256 bit conversion.
(define_insn_reservation "bdver3_vcvtX2Y_avx256_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpload,bdver3-fvector")
(define_insn_reservation "bdver3_vcvtX2Y_avx256" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
					(ior (ior (match_operand:V4DF 0 "register_operand")
					          (ior (match_operand:V8SF 0 "register_operand")
						       (match_operand:V8SI 0 "register_operand")))
					     (ior (match_operand:V4DF 1 "nonimmediate_operand")
						  (ior (match_operand:V8SF 1 "nonimmediate_operand")
						       (match_operand:V8SI 1 "nonimmediate_operand")))))))
			 "bdver3-vector,bdver3-fpsched,bdver3-fvector")
;; CVTSS2SD, CVTSD2SS.
(define_insn_reservation "bdver3_ssecvt_cvtss2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtss2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ.
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_sseicvt_cvtsi2sd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SF,DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(nothing | bdver3-fcvt)")
;; CVTPD2PS.
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (match_operand:V2DF 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPI2PS, CVTDQ2PS.
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtdq2ps" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V4SF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SI 1 "nonimmediate_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")
;; CVTDQ2PD.
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtdq2pd" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (match_operand:V4SI 1 "nonimmediate_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTPS2PD, CVTPI2PD.
(define_insn_reservation "bdver3_ssecvt_cvtps2pd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpload,(bdver3-fxbar | bdver3-fcvt)")
(define_insn_reservation "bdver3_ssecvt_cvtps2pd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
                                        (and (match_operand:V2DF 0 "register_operand")
					     (ior (match_operand:V2SI 1 "nonimmediate_operand")
					          (match_operand:V4SF 1 "nonimmediate_operand"))))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fxbar | bdver3-fcvt)")
;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ.
(define_insn_reservation "bdver3_ssecvt_cvtsX2si_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fsto)")
(define_insn_reservation "bdver3_ssecvt_cvtsX2si" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseicvt")
				   (and (eq_attr "mode" "SI,DI")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fsto)")
;; CVTPD2PI, CVTTPD2PI.
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V2SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPD2DQ, CVTTPD2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpload,(bdver3-fcvt | bdver3-fxbar)")
(define_insn_reservation "bdver3_ssecvt_cvtpd2dq" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V2DF 1 "nonimmediate_operand")
					     (match_operand:V4SI 0 "register_operand")))))
			 "bdver3-double,bdver3-fpsched,(bdver3-fcvt | bdver3-fxbar)")
;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ.
(define_insn_reservation "bdver3_ssecvt_cvtps2pi_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "load")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpload,bdver3-fcvt")
(define_insn_reservation "bdver3_ssecvt_cvtps2pi" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssecvt")
				   (and (eq_attr "memory" "none")
				        (and (match_operand:V4SF 1 "nonimmediate_operand")
				             (ior (match_operand:V2SI 0 "register_operand")
						  (match_operand:V4SI 0 "register_operand"))))))
			 "bdver3-direct,bdver3-fpsched,bdver3-fcvt")

;; SSE MUL, ADD, and MULADD.
(define_insn_reservation "bdver3_ssemuladd_load_256" 11
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_256" 7
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (and (eq_attr "mode" "V8SF,V4DF")
					(eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd_load" 10
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-ffma")
(define_insn_reservation "bdver3_ssemuladd" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-ffma")
(define_insn_reservation "bdver3_sseimul_load" 8
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmma")
(define_insn_reservation "bdver3_sseimul" 4
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseimul")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmma")
(define_insn_reservation "bdver3_sseiadd_load" 6
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "load")))
			 "bdver3-direct,bdver3-fpload,bdver3-fmal")
(define_insn_reservation "bdver3_sseiadd" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseiadd")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fmal")

;; SSE DIV: no throughput information (assume same as amdfam10).
(define_insn_reservation "bdver3_ssediv_double_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V4DF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load_256" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "load"))))
			 "bdver3-double,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_256" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "V8SF")
				        (eq_attr "memory" "none"))))
			 "bdver3-double,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double_load" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_double" 27
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "DF,V2DF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single_load" 27 
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "load"))))
			 "bdver3-direct,bdver3-fpload,(bdver3-ffma0*17 | bdver3-ffma1*17)")
(define_insn_reservation "bdver3_ssediv_single" 24
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "ssediv")
				   (and (eq_attr "mode" "SF,V4SF")
					(eq_attr "memory" "none"))))
			 "bdver3-direct,bdver3-fpsched,(bdver3-ffma0*17 | bdver3-ffma1*17)")

(define_insn_reservation "bdver3_sseins" 3
                         (and (eq_attr "cpu" "bdver3")
                              (and (eq_attr "type" "sseins")
                                   (eq_attr "mode" "TI")))
                         "bdver3-direct,bdver3-fpsched,bdver3-fxbar")



* Re: [PATCH, i386]: AMD bdver3 enablement
  2012-11-14  9:22               ` Gopalasubramanian, Ganesh
@ 2012-11-14 10:45                 ` Uros Bizjak
  2012-11-16  7:24                   ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 12+ messages in thread
From: Uros Bizjak @ 2012-11-14 10:45 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: gcc-patches

On Wed, Nov 14, 2012 at 10:22 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:

>> sseshuf replaces sselog in some insn patterns, but should be handled in the same way in *existing* .md files.
>
> Modifications are done as per the comments:
> 1. sseshuf is added alongside sselog in the existing .md files.
> 2. sseshuf is handled in a separate pattern in bdver3.md.
>
> Bootstrap and "make -k check" pass.
> OK for trunk?
>
> 2012-11-14  Ganesh Gopalasubramanian  <Ganesh.Gopalasubramanian@amd.com>
>
>         bdver3 Enablement
>         * gcc/doc/extend.texi: Add details about bdver3.
>         * gcc/doc/invoke.texi: Add details about bdver3.
>         * config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
>         (case ${target}): Add bdver3.
>         * config/i386/i386.h (TARGET_BDVER3): New definition.
>         * config/i386/i386.md (define_attr "cpu"): Add bdver3.
>         * config/i386/sse.md (sseshuf): New type attribute.
>         * config/i386/athlon.md (sseshuf): Likewise.
>         * config/i386/atom.md (sseshuf): Likewise.
>         * config/i386/ppro.md (sseshuf): Likewise.
>         * config/i386/bdver1.md (sseshuf): Likewise.
>         * config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Add
>         bdver3 def_and_undef.
>         * config/i386/driver-i386.c (host_detect_local_cpu): Let
>         -march=native recognize bdver3 processors.
>         * config/i386/i386.c (struct processor_costs bdver3_cost): New.
>         (m_BDVER3): New definition.
>         (m_AMD_MULTIPLE): Includes m_BDVER3.
>         (initial_ix86_tune_features): Add bdver3 tune.
>         (processor_target_table): Add bdver3 entry.
>         (static const char *const cpu_names): Add bdver3 entry.
>         (software_prefetching_beneficial_p): Add bdver3.
>         (ix86_option_override_internal): Add bdver3 instruction sets.
>         (ix86_option_override_internal): Remove XSAVEOPT for bdver1
>         and bdver2.
>         (ix86_issue_rate): Add bdver3.
>         (ix86_adjust_cost): Add bdver3.
>         (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
>         (enum processor_type): Add PROCESSOR_BDVER3.
>         * config/i386/bdver3.md: New file describing bdver3 pipelines.

OK for mainline.

Thanks,
Uros.
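
For readers following the sseshuf discussion above, the two points can be
sketched in machine-description syntax. This is only an illustration, not
text from the committed patch: the "examplecpu" CPU name, the example-fxbar
unit, the example_fp automaton, both reservation names, and the latencies
are assumptions made up for the example. The bdver3-* unit names are the
ones used by the reservations in the patch and are assumed to be defined
elsewhere in bdver3.md.

```
;; Hypothetical automaton and unit, declared only so that the first
;; reservation below has something to refer to.
(define_automaton "example_fp")
(define_cpu_unit "example-fxbar" "example_fp")

;; (1) Hypothetical pre-existing CPU description: sseshuf is listed next
;;     to sselog, so both types keep sharing one reservation.
(define_insn_reservation "example_sselog" 2
			 (and (eq_attr "cpu" "examplecpu")
			      (and (eq_attr "type" "sselog,sseshuf")
				   (eq_attr "memory" "none")))
			 "example-fxbar")

;; (2) Hypothetical bdver3-style description: sseshuf gets a reservation
;;     of its own, so shuffles can be modeled apart from other sselog
;;     insns.  The latency and unit choice here are placeholders.
(define_insn_reservation "bdver3_sseshuf_sketch" 2
			 (and (eq_attr "cpu" "bdver3")
			      (and (eq_attr "type" "sseshuf")
				   (eq_attr "memory" "none")))
			 "bdver3-direct,bdver3-fpsched,bdver3-fxbar")
```

The first form keeps scheduling for the older CPU descriptions unchanged
once some patterns switch their type from sselog to sseshuf. The second
form lets the new description give shuffles their own latency and units.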


* RE: [PATCH, i386]: AMD bdver3 enablement
  2012-11-14 10:45                 ` Uros Bizjak
@ 2012-11-16  7:24                   ` Gopalasubramanian, Ganesh
  0 siblings, 0 replies; 12+ messages in thread
From: Gopalasubramanian, Ganesh @ 2012-11-16  7:24 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

Thanks, Uros, for the comments.

The changes are committed to trunk:
http://gcc.gnu.org/viewcvs?view=revision&revision=193548
http://gcc.gnu.org/viewcvs?view=revision&revision=193549

Regards
Ganesh




