* [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Joshi, Tejas Sanjay @ 2022-11-14 16:18 UTC
  To: gcc-patches, honza.hubicka; +Cc: Alexander Monakov, Kumar, Venkataramanan

[-- Attachment #1: Type: text/plain, Size: 41177 bytes --]

Hi,

Please find attached the patch, which adds znver4 instruction reservations separately from those of the older znver versions:

* It also models separate div, fdiv and ssediv units.
* It does not blow up the size of insn-automata.cc (it grew from 201502 to 206141 for me).
* The patch builds, bootstraps, and passes make check.
* I have also run SPEC: there are no regressions for 1-copy, 3-iteration runs, and I observe a 1.5% gain for 507.cactuBSSN_r.

Is it OK for trunk?

Thanks and Regards,
Tejas

Also, should I inline such long patches?

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (processor_alias_table):
	Use CPU_ZNVER4 for znver4.
	* config/i386/i386.md: Add znver4.md.
	* config/i386/znver4.md: New.
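The cover letter's point about insn-automata.cc size rests on keeping the znver4 units in their own automata (the define_automaton line in the new znver4.md lists nine, with separate ones for idiv, fdiv and ssediv). The C++ sketch below is only an intuition aid under a simplifying assumption, not genautomata's actual state representation: each non-pipelined unit is modelled as a count of remaining busy cycles, and the sketch compares the reachable states of one automaton that tracks three such units with the total for three one-unit automata. The busy counts 10, 15 and 7 are borrowed from the znver4-idiv*10, znver4-fdiv*15 and znver4-ssediv*7 reservations; all function names are hypothetical.

```cpp
// Toy model assumed for illustration only; it is NOT how genautomata
// represents DFA states. Each non-pipelined unit (such as a divider) is
// described solely by how many more cycles it stays busy.
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

constexpr std::size_t kUnits = 3;
using State = std::array<int, kUnits>;  // remaining busy cycles per unit

// A one-unit automaton for a unit reserved `busy` cycles at a time has
// busy + 1 states (remaining count 0..busy).
std::size_t single_unit_states(int busy) {
  assert(busy >= 0);
  return static_cast<std::size_t>(busy) + 1;
}

// One automaton per unit: the state counts add up.
std::size_t separate_states(const State& busy) {
  std::size_t total = 0;
  for (int b : busy) total += single_unit_states(b);
  return total;
}

// One automaton for all units: count the states reachable from the
// all-idle state with a breadth-first search.
std::size_t combined_states(const State& busy) {
  std::set<State> seen{State{}};
  std::queue<State> work;
  work.push(State{});
  while (!work.empty()) {
    const State s = work.front();
    work.pop();
    std::vector<State> succ;
    // Advance one cycle: every busy unit counts down by one.
    State adv = s;
    for (int& r : adv)
      if (r > 0) --r;
    succ.push_back(adv);
    // Issue an insn that reserves free unit i for busy[i] cycles.
    for (std::size_t i = 0; i < kUnits; ++i) {
      if (s[i] == 0) {
        State iss = s;
        iss[i] = busy[i];
        succ.push_back(iss);
      }
    }
    for (const State& n : succ)
      if (seen.insert(n).second) work.push(n);
  }
  return seen.size();
}
```

Under this toy model, combined_states({10, 15, 7}) returns 11 × 16 × 8 = 1408 while separate_states({10, 15, 7}) returns 11 + 16 + 8 = 35: independent units multiply the state count when they share an automaton, but only add up when each has its own.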
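The decoder reservations near the top of the new znver4.md make znver4-direct a choice of any one of the four decoders, znver4-vector a reservation of all four at once, and znver4-double (for now) the same as znver4-direct. Here is a toy C++ sketch of what that implies for a straight-line instruction stream, assuming four decoder slots per cycle, strictly in-order decoding, and nothing else modelled; Decode, decoders_needed and decode_cycles are hypothetical names, not GCC code.

```cpp
// Toy model assumed for illustration only: each cycle offers four decoder
// slots and instructions are decoded strictly in program order. GCC's DFA
// scheduler tracks far more than this.
#include <cassert>
#include <vector>

// Decode classes mirroring the direct/double/vector split in the patch.
enum class Decode { Direct, Double, Vector };

constexpr int kDecoders = 4;

// Decoder slots one instruction occupies, following the reservations:
// znver4-direct uses any one decoder, znver4-double is currently the same
// as znver4-direct, and znver4-vector uses all four decoders.
int decoders_needed(Decode d) {
  switch (d) {
    case Decode::Direct:
    case Decode::Double:
      return 1;
    case Decode::Vector:
      return kDecoders;
  }
  return kDecoders;  // not reached for valid enumerators
}

// Number of cycles needed to decode `seq` in order.
int decode_cycles(const std::vector<Decode>& seq) {
  int cycles = 0;
  int used = kDecoders;  // no cycle opened yet
  for (Decode d : seq) {
    const int need = decoders_needed(d);
    assert(need <= kDecoders);
    if (used + need > kDecoders) {  // does not fit: start a new cycle
      ++cycles;
      used = 0;
    }
    used += need;
  }
  return cycles;
}
```

For example, decode_cycles({Decode::Direct, Decode::Vector, Decode::Direct}) returns 3, because the vector-path instruction needs a cycle with all four decoders free, while four direct or double instructions fit in a single cycle.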
---
 gcc/common/config/i386/i386-common.cc |    2 +-
 gcc/config/i386/i386.md               |    1 +
 gcc/config/i386/znver4.md             | 1028 +++++++++++++++++++++++++
 3 files changed, 1030 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/i386/znver4.md

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index f66bdd5a2af..4b01c3540e5 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2113,7 +2113,7 @@ const pta processor_alias_table[] =
   {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3,
     PTA_ZNVER3,
     M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2},
-  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3,
+  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4,
     PTA_ZNVER4,
     M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8081df76741..c18dfe2af9e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1312,6 +1312,7 @@
 (include "bdver3.md")
 (include "btver2.md")
 (include "znver.md")
+(include "znver4.md")
 (include "geode.md")
 (include "atom.md")
 (include "slm.md")
diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md
new file mode 100644
index 00000000000..e3892d1df2f
--- /dev/null
+++ b/gcc/config/i386/znver4.md
@@ -0,0 +1,1028 @@
+;; Copyright (C) 2012-2022 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+;;
+
+
+(define_attr "znver4_decode" "direct,vector,double"
+  (const_string "direct"))
+
+;; AMD znver4 Scheduling
+;; Modeling automatons for zen decoders, integer execution pipes,
+;; AGU pipes, branch, floating point execution and fp store units.
+(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_ssediv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store")
+
+;; Decoders unit has 4 decoders and all of them can decode fast path
+;; and vector type instructions.
+(define_cpu_unit "znver4-decode0" "znver4")
+(define_cpu_unit "znver4-decode1" "znver4")
+(define_cpu_unit "znver4-decode2" "znver4")
+(define_cpu_unit "znver4-decode3" "znver4")
+
+;; Currently blocking all decoders for vector path instructions as
+;; they are dispatched separately as a microcode sequence.
+(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3")
+
+;; Direct instructions can be issued to any of the four decoders.
+(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3")
+
+;; Fix me: Need to revisit this later to simulate fast path double behavior.
+(define_reservation "znver4-double" "znver4-direct")
+
+
+;; Integer unit 4 ALU pipes.
+(define_cpu_unit "znver4-ieu0" "znver4_ieu")
+(define_cpu_unit "znver4-ieu1" "znver4_ieu")
+(define_cpu_unit "znver4-ieu2" "znver4_ieu")
+(define_cpu_unit "znver4-ieu3" "znver4_ieu")
+(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3")
+
+;; 3 AGU pipes in znver4
+(define_cpu_unit "znver4-agu0" "znver4_agu")
+(define_cpu_unit "znver4-agu1" "znver4_agu")
+(define_cpu_unit "znver4-agu2" "znver4_agu")
+(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2")
+
+;; Load is 4 cycles. We do not model reservation of load unit.
+(define_reservation "znver4-load" "znver4-agu-reserve")
+(define_reservation "znver4-store" "znver4-agu-reserve")
+
+;; vectorpath (microcoded) instructions are single issue instructions.
+;; So, they occupy all the integer units.
+(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1
+                                      +znver4-ieu2+znver4-ieu3
+                                      +znver4-agu0+znver4-agu1+znver4-agu2")
+
+;; Floating point unit 4 FP pipes.
+(define_cpu_unit "znver4-fpu0" "znver4_fpu")
+(define_cpu_unit "znver4-fpu1" "znver4_fpu")
+(define_cpu_unit "znver4-fpu2" "znver4_fpu")
+(define_cpu_unit "znver4-fpu3" "znver4_fpu")
+
+(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3")
+
+(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1
+                                      +znver4-fpu2+znver4-fpu3
+                                      +znver4-agu0+znver4-agu1+znver4-agu2")
+
+;; DIV units
+(define_cpu_unit "znver4-idiv" "znver4_idiv")
+(define_cpu_unit "znver4-fdiv" "znver4_fdiv")
+(define_cpu_unit "znver4-ssediv" "znver4_ssediv")
+
+;; znver4 has a separate branch unit.
+(define_cpu_unit "znver4-bru" "znver4_bru")
+
+;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the
+;; throughput is limited to only one per cycle.
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store")
+
+;; Call Instruction
+(define_insn_reservation "znver4_call" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "call,callv"))
+  "znver4-double,znver4-ieu0|znver4-bru,znver4-store")
+
+;; Push Instruction
+(define_insn_reservation "znver4_push" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "push")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-store")
+
+(define_insn_reservation "znver4_push_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "push")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-store")
+
+;; Pop instruction
+(define_insn_reservation "znver4_pop" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "pop")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load")
+
+(define_insn_reservation "znver4_pop_mem" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "pop")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-store")
+
+;; Leave
+(define_insn_reservation "znver4_leave" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "leave"))
+  "znver4-double,znver4-ieu,znver4-store")
+
+;; Integer Instructions or General instructions
+;; Multiplications
+(define_insn_reservation "znver4_imul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "QI,HI,SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_DI" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_mem" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "QI,HI,SI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-direct,znver4-load,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_DI_mem" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-direct,znver4-load,znver4-ieu1")
+
+;; Divisions
+(define_insn_reservation "znver4_idiv_DI" 18
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*10")
+
+(define_insn_reservation "znver4_idiv_SI" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*6")
+
+(define_insn_reservation "znver4_idiv_HI" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "HI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_QI" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "QI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_DI_mem" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*10")
+
+(define_insn_reservation "znver4_idiv_SI_mem" 16
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*6")
+
+(define_insn_reservation "znver4_idiv_HI_mem" 14
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "HI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_QI_mem" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "QI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*4")
+
+;; STR and ISHIFT are microcoded.
+(define_insn_reservation "znver4_str" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "str")
+            (eq_attr "memory" "both,store")))
+  "znver4-vector,znver4-ivector")
+
+(define_insn_reservation "znver4_ishift" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ishift")
+            (eq_attr "memory" "both,store")))
+  "znver4-vector,znver4-ivector")
+
+;; MOV - integer movs
+(define_insn_reservation "znver4_imovx_double" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-ieu")
+
+(define_insn_reservation "znver4_imov_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-ieu")
+
+(define_insn_reservation "znver4_imov_double_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "store"))))
+  "znver4-double,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_direct_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_load_double_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "store"))))
+  "znver4-double,znver4-load,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_load_direct_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-load,znver4-ieu,znver4-store")
+
+;; INTEGER/GENERAL Instructions
+(define_insn_reservation "znver4_insn" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu")
+
+(define_insn_reservation "znver4_insn_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu")
+
+(define_insn_reservation "znver4_insn2" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu0|znver4-ieu3")
+
+(define_insn_reservation "znver4_insn2_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3")
+
+(define_insn_reservation "znver4_rotate" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu1|znver4-ieu2")
+
+(define_insn_reservation "znver4_rotate_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2")
+
+(define_insn_reservation "znver4_insn_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_insn2_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store")
+
+(define_insn_reservation "znver4_rotate_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store")
+
+;; Other vector type
+(define_insn_reservation "znver4_ieu_vector" 5
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "other,multi"))
+  "znver4-vector,znver4-ivector")
+
+;; alu1 instructions
+(define_insn_reservation "znver4_alu1_vector" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "vector")
+            (and (eq_attr "type" "alu1")
+                 (eq_attr "memory" "none,unknown"))))
+  "znver4-vector,znver4-ivector")
+
+(define_insn_reservation "znver4_alu1_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "alu1")
+                 (eq_attr "memory" "none,unknown"))))
+  "znver4-direct,znver4-ieu")
+
+;; Branches
+(define_insn_reservation "znver4_branch" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ibr")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-ieu0|znver4-bru")
+
+(define_insn_reservation "znver4_branch_mem" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ibr")
+            (eq_attr "memory" "load")))
+  "znver4-vector,znver4-ivector")
+
+;; LEA instruction with simple addressing
+(define_insn_reservation "znver4_lea" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "lea"))
+  "znver4-direct,znver4-ieu")
+
+;; Floating Point
+;; FP movs
+(define_insn_reservation "znver4_fp_cmov" 6
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fcmov"))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_fp_mov_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fmov"))
+  "znver4-direct,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_mov_direct_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_mov_direct_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu1,znver4-fp-store")
+
+(define_insn_reservation "znver4_fp_mov_double" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1,znver4-fp-store")
+
+(define_insn_reservation "znver4_fp_mov_double_load" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store")
+
+;; FSQRT
+(define_insn_reservation "znver4_fsqrt" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fpspc")
+            (and (eq_attr "mode" "XF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*20")
+
+;; FPSPC instructions
+(define_insn_reservation "znver4_fp_spc" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fpspc")
+            (eq_attr "memory" "none")))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_fp_insn_vector" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "vector")
+            (eq_attr "type" "mmxcvt,sselog1,ssemov")))
+  "znver4-vector,znver4-fvector")
+
+;; FABS, FCHS
+(define_insn_reservation "znver4_fp_fsgn" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fsgn"))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+;; FCMP
+(define_insn_reservation "znver4_fp_fcmp" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fcmp")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_fcmp_double" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fcmp")
+            (and (eq_attr "znver1_decode" "double")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1,znver4-fpu2")
+
+;; FADD, FSUB, FMUL
+(define_insn_reservation "znver4_fp_op_mul" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fop,fmul")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0")
+
+(define_insn_reservation "znver4_fp_op_mul_load" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fop,fmul")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0")
+
+;; FDIV
+(define_insn_reservation "znver4_fp_div" 15
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fdiv*15")
+
+(define_insn_reservation "znver4_fp_div_load" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fdiv*15")
+
+(define_insn_reservation "znver4_fp_idiv_load" 26
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (and (eq_attr "fp_int_src" "true")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fdiv*19")
+
+;; MMX, SSE, SSEn.n instructions
+(define_insn_reservation "znver4_fp_mmx" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "mmx"))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_add_cmp" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxadd,mmxcmp")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_mmx_add_cmp_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxadd,mmxcmp")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_mmx_insn" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_insn_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_mov" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fp-store")
+
+(define_insn_reservation "znver4_mmx_mov_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmov")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fp-store")
+
+(define_insn_reservation "znver4_mmx_mul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmul")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_mmx_mul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmul")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3")
+
+;; AVX instructions
+(define_insn_reservation "znver4_sse_log" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_log_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_log_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_log_evex_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_ilog" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_ilog_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "TI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0+znver4-fpu1+znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_ilog_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_ilog_evex_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog,sselog1")
+            (and (eq_attr "mode" "TI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1+znver4-fpu2+znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_comi" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "none")))
+  "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_comi_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "load")))
+  "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_test" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "prefix_extra" "1")
+            (and (eq_attr "type" "ssecomi")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_test_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "prefix_extra" "1")
+            (and (eq_attr "type" "ssecomi")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_imul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_imul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_mov" 2
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_mov_fp" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mov_fp_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mov_fp_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_add" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add1" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_sse_add1_load" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-vector,znver4-load,znver4-fvector")
+
+(define_insn_reservation "znver4_sse_iadd" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_iadd_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_mul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_div_pd" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V4DF,V2DF,V1DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ssediv*7")
+
+(define_insn_reservation "znver4_sse_div_ps" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ssediv*5")
+
+(define_insn_reservation "znver4_sse_div_pd_load" 20
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V4DF,V2DF,V1DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-ssediv*7")
+
+(define_insn_reservation "znver4_sse_div_ps_load" 17
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-ssediv*5")
+
+(define_insn_reservation "znver4_sse_cmp_avx" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cmp_avx_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cmp_avx2" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V8SF,V4DF,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cmp_avx2_load" 11
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V8SF,V4DF,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cvt" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_cvt_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_icvt" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_icvt_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_shuf" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_shuf_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_ishuf" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_ishuf_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+;; AVX512 instructions
+(define_insn_reservation "znver4_sse_mul_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_mul_evex_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_imul_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_imul_evex_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_mov_evex" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_evex_load" 11
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_evex_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "store"))))
+  "znver4-double,znver4-fpu1|znver4-fpu2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_add_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add_evex_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_iadd_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_iadd_evex_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_div_pd_evex" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-ssediv*7")
+
+(define_insn_reservation "znver4_sse_div_ps_evex" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V16SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-ssediv*5")
+
+(define_insn_reservation "znver4_sse_div_pd_evex_load" 20
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-ssediv*7")
+
+(define_insn_reservation "znver4_sse_div_ps_evex_load" 17
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V16SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-ssediv*5")
+
+(define_insn_reservation "znver4_sse_cmp_avx512" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cmp_avx512_load" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp,ssecomi")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cvt_evex" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1|znver4-fpu2,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_cvt_evex_load" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_shuf_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_shuf_evex_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_ishuf_evex" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_ishuf_evex_load" 11
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_muladd" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemuladd")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_muladd_load" 11
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemuladd")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+;; AVX512 mask instructions
+
+(define_insn_reservation "znver4_sse_mskmov" 2
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mskmov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_msklog" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "msklog")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
-- 
2.25.1

[-- Attachment #2: 0001-Add-AMD-znver4-instruction-reservations.patch --]
[-- Type: application/octet-stream, Size: 39882 bytes --]

From e68993875db5a173429111b8e7cfac099019db0a Mon Sep 17 00:00:00 2001
From: Tejas Joshi <TejasSanjay.Joshi@amd.com>
Date: Wed, 9 Nov 2022 00:10:59 +0530
Subject: [PATCH] Add AMD znver4 instruction reservations

This adds znver4 automata units and reservations separately from other
znver automata, avoiding the insn-automata.cc size blow-up.

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (processor_alias_table):
	Use CPU_ZNVER4 for znver4.
	* config/i386/i386.md: Add znver4.md.
	* config/i386/znver4.md: New.
--- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386.md | 1 + gcc/config/i386/znver4.md | 1028 +++++++++++++++++++++++++ 3 files changed, 1030 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/znver4.md diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index f66bdd5a2af..4b01c3540e5 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2113,7 +2113,7 @@ const pta processor_alias_table[] = {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, PTA_ZNVER3, M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8081df76741..c18dfe2af9e 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1312,6 +1312,7 @@ (include "bdver3.md") (include "btver2.md") (include "znver.md") +(include "znver4.md") (include "geode.md") (include "atom.md") (include "slm.md") diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md new file mode 100644 index 00000000000..e3892d1df2f --- /dev/null +++ b/gcc/config/i386/znver4.md @@ -0,0 +1,1028 @@ +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. 
+;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. +;; + + +(define_attr "znver4_decode" "direct,vector,double" + (const_string "direct")) + +;; AMD znver4 Scheduling +;; Modeling automatons for zen decoders, integer execution pipes, +;; AGU pipes, branch, floating point execution and fp store units. +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_ssediv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store") + +;; Decoders unit has 4 decoders and all of them can decode fast path +;; and vector type instructions. +(define_cpu_unit "znver4-decode0" "znver4") +(define_cpu_unit "znver4-decode1" "znver4") +(define_cpu_unit "znver4-decode2" "znver4") +(define_cpu_unit "znver4-decode3" "znver4") + +;; Currently blocking all decoders for vector path instructions as +;; they are dispatched separately as microcode sequence. +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") + +;; Direct instructions can be issued to any of the four decoders. +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") + +;; Fix me: Need to revisit this later to simulate fast path double behavior. +(define_reservation "znver4-double" "znver4-direct") + + +;; Integer unit 4 ALU pipes. +(define_cpu_unit "znver4-ieu0" "znver4_ieu") +(define_cpu_unit "znver4-ieu1" "znver4_ieu") +(define_cpu_unit "znver4-ieu2" "znver4_ieu") +(define_cpu_unit "znver4-ieu3" "znver4_ieu") +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") + +;; 3 AGU pipes in znver4 +(define_cpu_unit "znver4-agu0" "znver4_agu") +(define_cpu_unit "znver4-agu1" "znver4_agu") +(define_cpu_unit "znver4-agu2" "znver4_agu") +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2") + +;; Load is 4 cycles. We do not model reservation of load unit.
+(define_reservation "znver4-load" "znver4-agu-reserve") +(define_reservation "znver4-store" "znver4-agu-reserve") + +;; vectorpath (microcoded) instructions are single issue instructions. +;; So, they occupy all the integer units. +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 + +znver4-ieu2+znver4-ieu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; Floating point unit 4 FP pipes. +(define_cpu_unit "znver4-fpu0" "znver4_fpu") +(define_cpu_unit "znver4-fpu1" "znver4_fpu") +(define_cpu_unit "znver4-fpu2" "znver4_fpu") +(define_cpu_unit "znver4-fpu3" "znver4_fpu") + +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") + +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 + +znver4-fpu2+znver4-fpu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; DIV units +(define_cpu_unit "znver4-idiv" "znver4_idiv") +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") +(define_cpu_unit "znver4-ssediv" "znver4_ssediv") + +;; znver4 has a separate branch unit. +(define_cpu_unit "znver4-bru" "znver4_bru") + +;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the +;; throughput is limited to only one per cycle. 
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store") + +;; Call Instruction +(define_insn_reservation "znver4_call" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "call,callv")) + "znver4-double,znver4-ieu0|znver4-bru,znver4-store") + +;; Push Instruction +(define_insn_reservation "znver4_push" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-store") + +(define_insn_reservation "znver4_push_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Pop instruction +(define_insn_reservation "znver4_pop" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load") + +(define_insn_reservation "znver4_pop_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Leave +(define_insn_reservation "znver4_leave" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "leave")) + "znver4-double,znver4-ieu,znver4-store") + +;; Integer Instructions or General instructions +;; Multiplications +(define_insn_reservation "znver4_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_mem" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI_mem" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr 
"memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +;; Divisions +(define_insn_reservation "znver4_idiv_DI" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_DI_mem" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI_mem" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI_mem" 14 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI_mem" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +;; STR and ISHIFT are microcoded. 
+(define_insn_reservation "znver4_str" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_ishift" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +;; MOV - integer movs +(define_insn_reservation "znver4_imovx_double" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ieu") + +(define_insn_reservation "znver4_imov_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_imov_double_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_direct_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_double_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-load,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_direct_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-load,znver4-ieu,znver4-store") + +;; INTEGER/GENERAL Instructions +(define_insn_reservation "znver4_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_insn_load" 5 + (and (eq_attr "cpu" 
"znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +(define_insn_reservation "znver4_insn2" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_insn2_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_rotate" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_rotate_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_insn_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_insn2_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") + +(define_insn_reservation "znver4_rotate_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store") + +;; Other vector type +(define_insn_reservation "znver4_ieu_vector" 5 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "other,multi")) + "znver4-vector,znver4-ivector") + +;; alu1 instructions +(define_insn_reservation "znver4_alu1_vector" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + 
"znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_alu1_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + "znver4-direct,znver4-ieu") + +;; Branches +(define_insn_reservation "znver4_branch" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu0|znver4-bru") + +(define_insn_reservation "znver4_branch_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-ivector") + +;; LEA instruction with simple addressing +(define_insn_reservation "znver4_lea" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "lea")) + "znver4-direct,znver4-ieu") + +;; Floating Point +;; FP movs +(define_insn_reservation "znver4_fp_cmov" 6 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fcmov")) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_mov_direct" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fmov")) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + 
(eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store") + +;; FSQRT +(define_insn_reservation "znver4_fsqrt" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (and (eq_attr "mode" "XF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*20") + +;; FPSPC instructions +(define_insn_reservation "znver4_fp_spc" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_insn_vector" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (eq_attr "type" "mmxcvt,sselog1,ssemov"))) + "znver4-vector,znver4-fvector") + +;; FABS, FCHS +(define_insn_reservation "znver4_fp_fsgn" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fsgn")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;; FCMP +(define_insn_reservation "znver4_fp_fcmp" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_fcmp_double" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (and (eq_attr "znver1_decode" "double") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fpu2") + +;; FADD, FSUB, FMUL +(define_insn_reservation "znver4_fp_op_mul" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0") + +(define_insn_reservation "znver4_fp_op_mul_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0") + +;; FDIV +(define_insn_reservation "znver4_fp_div" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_div_load" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" 
"load"))) + "znver4-direct,znver4-load,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_idiv_load" 26 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (and (eq_attr "fp_int_src" "true") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fdiv*19") + +;; MMX, SSE, SSEn.n instructions +(define_insn_reservation "znver4_fp_mmx " 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "mmx")) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_add_cmp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_mmx_add_cmp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_mmx_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_insn_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" ",mmxmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mov_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_mmx_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "load"))) + 
"znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +;; AVX instructions +(define_insn_reservation "znver4_sse_log" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_log_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_ilog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_ilog_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "TI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0+znver4-fpu1+znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_ilog_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_ilog_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog,sselog1") + (and (eq_attr 
"mode" "TI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0+znver4-fpu1+znver4-fpu2+znver4-fpu3") + +(define_insn_reservation "znver4_sse_comi" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none"))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load"))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_test" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_test_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_imul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_mov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation 
"znver4_sse_mov_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mov_fp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add1" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_sse_add1_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-fvector") + 
+(define_insn_reservation "znver4_sse_iadd" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_iadd_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_div_pd" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ssediv*7") + +(define_insn_reservation "znver4_sse_div_ps" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ssediv*5") + +(define_insn_reservation "znver4_sse_div_pd_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-ssediv*7") + +(define_insn_reservation "znver4_sse_div_ps_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-ssediv*5") + +(define_insn_reservation "znver4_sse_cmp_avx" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" 
"ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_shuf" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" 
"sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_shuf_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +;; AVX512 instructions +(define_insn_reservation "znver4_sse_mul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_mul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_imul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_imul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mov_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) +
"znver4-double,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_evex_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_iadd_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu") + +(define_insn_reservation "znver4_sse_iadd_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_div_pd_evex" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ssediv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ssediv*5") + +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + 
(and (eq_attr "mode" "V8DF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-ssediv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-ssediv*5") + +(define_insn_reservation "znver4_sse_cmp_avx512" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx512_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cvt_evex" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1|znver4-fpu2,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_evex_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_shuf_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu") + +(define_insn_reservation "znver4_sse_shuf_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + 
"znver4-double,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_muladd" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_muladd_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +;; AVX512 mask instructions + +(define_insn_reservation "znver4_sse_mskmov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mskmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_msklog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "msklog") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu2|znver4-fpu3") -- 2.25.1
* Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Alexander Monakov @ 2022-11-14 18:51 UTC To: Joshi, Tejas Sanjay; +Cc: gcc-patches, honza.hubicka, Kumar, Venkataramanan On Mon, 14 Nov 2022, Joshi, Tejas Sanjay wrote: > [Public] > > Hi, Hi. I'm still waiting for feedback on fixes for existing models: https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb5b7@ispras.ru/T/#t Did you have a chance to look at those? > PFA the patch which adds znver4 instruction reservations separately from older > znver versions: > * This also models separate div, fdiv and ssediv units accordingly. Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the above patches, I checked that x87 and SSE divisions use the same hardware unit, and I don't see a strong reason to artificially clone it in the model. (The integer divider is a separate unit from the floating-point divider.) > * Does not blow-up the insn-automata.cc size (it grew from 201502 to 206141 for me.) > * The patch successfully builds, bootstraps, and passes make check. > * I have also run spec, showing no regressions for 1-copy 3-iteration runs. However, I observe 1.5% gain for 507.cactuBSSN_r.
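The practical effect of cloning the divider can be shown with a minimal Python sketch. It is not genautomata and not part of the patch. It assumes in-order issue and ignores decoders, and its unit names (`fdiv`, `ssediv`, and the shared `fpdiv`) are hypothetical. The occupancies of 15 and 7 cycles are taken from the `znver4-fdiv*15` and `znver4-ssediv*7` reservations in the patch.

```python
from typing import Dict, List, Tuple


def issue_divides(divs: List[Tuple[str, int]], shared: bool) -> List[int]:
    """Issue divides in order; each holds its divider for `cycles` cycles.
    With shared=True, x87 and SSE divides compete for one (hypothetical)
    'fpdiv' unit; otherwise each kind gets its own cloned unit."""
    free_at: Dict[str, int] = {}  # unit -> first cycle it is free again
    cycle, issued = 0, []
    for kind, cycles in divs:
        unit = "fpdiv" if shared else kind
        cycle = max(cycle, free_at.get(unit, 0))  # wait for the unit
        free_at[unit] = cycle + cycles
        issued.append(cycle)
    return issued


# An x87 divide followed by an SSE packed-double divide.
divs = [("fdiv", 15), ("ssediv", 7)]
cloned = issue_divides(divs, shared=False)
shared = issue_divides(divs, shared=True)
print(cloned)  # [0, 0]
print(shared)  # [0, 15]
```

With cloned units the scheduler believes the two divisions can overlap. If, as measured above, both kinds use the same hardware divider, that overlap cannot actually happen.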
I have a question on AVX512 modeling in your patch: > +;; AVX instructions > +(define_insn_reservation "znver4_sse_log" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog,sselog1") > + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu") > + > +(define_insn_reservation "znver4_sse_log_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog,sselog1") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") > + This is an AVX512 instruction, and you're modeling that it occupies two ports at once and thus has half throughput, but later in the AVX512 section: > +;; AVX512 instructions > +(define_insn_reservation "znver4_sse_mul_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-double,znver4-fpu0|znver4-fpu3") none of the instructions are modeled this way. If that's on purpose, can you add a comment? It's surprising, since generally AVX512 has half throughput compared to AVX256 on Zen 4, but the model doesn't seem to reflect that. Alexander
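The throughput difference can be illustrated with a toy reservation-table simulator in Python. This is a sketch under simplifying assumptions, not genautomata. Ops issue in order at the first cycle where some alternative finds all its units free, the first such alternative is taken greedily, and decoders are not modelled. A 256-bit multiply reserved as `fpu0|fpu3` for one cycle can issue twice per cycle. The patch's `znver4_sse_mul_evex` uses the same pipe part, so it would behave the same way. Holding the chosen pipe for two cycles, as a `fpu0*2|fpu3*2` reservation does, halves the rate.

```python
from typing import List, Set, Tuple

# One way to satisfy a reservation: (unit name, cycle offset from issue).
Alternative = List[Tuple[str, int]]


def issue_all(ops: List[List[Alternative]]) -> List[int]:
    """Issue ops in order; each starts no earlier than the previous one, at
    the first cycle where one of its alternatives has all units free."""
    busy: Set[Tuple[str, int]] = set()
    cycle, issued = 0, []
    for alts in ops:
        while True:
            free = [alt for alt in alts
                    if all((unit, cycle + off) not in busy for unit, off in alt)]
            if free:
                break
            cycle += 1
        busy.update((unit, cycle + off) for unit, off in free[0])
        issued.append(cycle)
    return issued


def same_pipe(pipes: List[str], cycles: int) -> List[Alternative]:
    """'p0*n|p1*n|...': pick one pipe and hold it for n cycles after decode."""
    return [[(pipe, 1 + i) for i in range(cycles)] for pipe in pipes]


mul_one_cycle = same_pipe(["fpu0", "fpu3"], 1)  # "fpu0|fpu3"
mul_two_cycle = same_pipe(["fpu0", "fpu3"], 2)  # "fpu0*2|fpu3*2"

print(issue_all([mul_one_cycle] * 8))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(issue_all([mul_two_cycle] * 8))  # [0, 0, 2, 2, 4, 4, 6, 6]
```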
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Joshi, Tejas Sanjay @ 2022-11-15 12:08 UTC To: Alexander Monakov, gcc-patches; +Cc: honza.hubicka, Kumar, Venkataramanan [Public] Hi, Thank you for reviewing the patch. > Hi. I'm still waiting for feedback on fixes for existing models: > https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb5b7@ispras.ru/T/#t > did you have a chance to look at those? I have yet to evaluate that patch; I will get back to you soon. > Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the > above patches, I checked that x87 and SSE divisions use the same hardware > unit, and I don't see a strong reason to artificially clone it in the model. I thought of modelling them separately because they belong to different ISA groups. But yes, since they execute in the same unit, we can model them in the same automaton.
> I have a question on AVX512 modeling in your patch: > > > +;; AVX instructions > > +(define_insn_reservation "znver4_sse_log" 1 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "sselog,sselog1") > > + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fpu") > > + > > +(define_insn_reservation "znver4_sse_log_evex" 1 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "sselog,sselog1") > > + (and (eq_attr "mode" "V16SF,V8DF") > > + (eq_attr "memory" "none")))) > > + > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") > > + > > This is an AVX512 instruction, and you're modeling that it occupies two ports > at once and thus has half throughput, but later in the AVX512 section: > > > +;; AVX512 instructions > > +(define_insn_reservation "znver4_sse_mul_evex" 3 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssemul") > > + (and (eq_attr "mode" "V16SF,V8DF") > > + (eq_attr "memory" "none")))) > > + "znver4-double,znver4-fpu0|znver4-fpu3") > > none of the instructions are modeled this way. If that's on purpose, can you > add a comment? It's surprising, since generally AVX512 has half throughput > compared to AVX256 on Zen 4, but the model doesn't seem to reflect that. > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") AVX512 instructions (512 bits wide) occupy 2 consecutive cycles in the pipes they execute in.
So, it should be modelled as shown below: (define_insn_reservation "znver4_sse_log_evex" 1 (and (eq_attr "cpu" "znver4") (and (eq_attr "type" "sselog") (and (eq_attr "mode" "V16SF,V8DF,XI") (eq_attr "memory" "none")))) "znver4-double,(znver4-fpu)*2") (define_insn_reservation "znver4_sse_mul_evex" 3 (and (eq_attr "cpu" "znver4") (and (eq_attr "type" "ssemul") (and (eq_attr "mode" "V16SF,V8DF") (eq_attr "memory" "none")))) "znver4-double,(znver4-fpu0|znver4-fpu1)*2") Doing it this way increased the insn-automata.cc size from 201402 lines to 212189. I hope that is a tolerable increase; do you have any suggestions? I will revise all AVX512 instructions and post the updated patch. Thanks and Regards, Tejas
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Alexander Monakov @ 2022-11-15 12:51 UTC To: Joshi, Tejas Sanjay; +Cc: gcc-patches, honza.hubicka, Kumar, Venkataramanan On Tue, 15 Nov 2022, Joshi, Tejas Sanjay wrote: > > > +;; AVX instructions > > > +(define_insn_reservation "znver4_sse_log" 1 > > > + (and (eq_attr "cpu" "znver4") > > > + (and (eq_attr "type" "sselog,sselog1") > > > + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") > > > + (eq_attr "memory" "none")))) > > > + "znver4-direct,znver4-fpu") > > > + > > > +(define_insn_reservation "znver4_sse_log_evex" 1 > > > + (and (eq_attr "cpu" "znver4") > > > + (and (eq_attr "type" "sselog,sselog1") > > > + (and (eq_attr "mode" "V16SF,V8DF") > > > + (eq_attr "memory" "none")))) > > > + > > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") > > > + > > > > This is an AVX512 instruction, and you're modeling that it occupies two ports > > at once and thus has half throughput, but later in the AVX512 section: > > > > > +;; AVX512 instructions > > > +(define_insn_reservation "znver4_sse_mul_evex" 3 > > > + (and (eq_attr "cpu" "znver4") > > > + (and (eq_attr "type" "ssemul") > > > + (and (eq_attr "mode" "V16SF,V8DF") > > > + (eq_attr "memory" "none")))) > > > + "znver4-double,znver4-fpu0|znver4-fpu3") > > > > none of the instructions are modeled this way. If that's on purpose, can you > > add a comment? It's surprising, since generally AVX512 has half throughput > > compared to AVX256 on Zen 4, but the model doesn't seem to reflect that. > > > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") > > AVX512 instructions (512-bitwide) occupy 2 consecutive cycles in the pipes
So, it should be modelled as shown below: > > (define_insn_reservation "znver4_sse_log_evex" 1 > (and (eq_attr "cpu" "znver4") > (and (eq_attr "type" "sselog") > (and (eq_attr "mode" "V16SF,V8DF,XI") > (eq_attr "memory" "none")))) > "znver4-double,(znver4-fpu)*2") I think instead of (znver4-fpu)*2 there should be znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2 assuming the instruction occupies the same pipe on both cycles (your variant models as if it can move from one pipe to another). > (define_insn_reservation "znver4_sse_mul_evex" 3 > (and (eq_attr "cpu" "znver4") > (and (eq_attr "type" "ssemul") > (and (eq_attr "mode" "V16SF,V8DF") > (eq_attr "memory" "none")))) > "znver4-double,(znver4-fpu0|znver4-fpu1)*2") Likewise here, znver4-fpu0*2|znver4-fpu1*2. > Doing this way increased the insn-automata.cc size from 201402 lines to 212189. Please reevaluate on top of my patches, the impact will be different. > Hope it is a tolerable increase or do you have any suggestions? Please take the corrections above into account. Also I think it's better to use znver4-direct rather than znver4-double for AVX512 instructions, because they are decoded as one uop, not two (it won't make a practical difference due to a "Fix me", but it's a simple improvement). Thanks. Alexander ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Joshi, Tejas Sanjay @ 2022-11-21 11:40 UTC To: Alexander Monakov, gcc-patches; +Cc: honza.hubicka, Kumar, Venkataramanan [-- Attachment #1: Type: text/plain, Size: 41792 bytes --] [Public] Hi, > I think instead of (znver4-fpu)*2 there should be > > znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2 > > assuming the instruction occupies the same pipe on both cycles (your variant > models as if it can move from one pipe to another). > Also I think it's better to use znver4-direct rather than znver4-double for > AVX512 instructions, because they are decoded as one uop, not two (it won't > make a practical difference due to a "Fix me", but it's a simple improvement). I have addressed all your comments in the attached patch. I have also used znver4-direct for AVX512 insns. * This patch increases the insn-automata.cc size from 201502 to 214902 lines. * Compile time and binary size on my machine remain the same. * make check and the bootstrap build show no issues. * SPEC CPU2017 runs also show no issues with this patch. Is this OK for trunk? Thanks and Regards, Tejas gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_alias_table): Use CPU_ZNVER4 for znver4. * config/i386/i386.md: Add znver4.md. * config/i386/znver4.md: New.
--- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386.md | 1 + gcc/config/i386/znver4.md | 1027 +++++++++++++++++++++++++ 3 files changed, 1029 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/znver4.md diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index f66bdd5a2af..4b01c3540e5 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2113,7 +2113,7 @@ const pta processor_alias_table[] = {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, PTA_ZNVER3, M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8081df76741..c18dfe2af9e 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1312,6 +1312,7 @@ (include "bdver3.md") (include "btver2.md") (include "znver.md") +(include "znver4.md") (include "geode.md") (include "atom.md") (include "slm.md") diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md new file mode 100644 index 00000000000..74b66caa38b --- /dev/null +++ b/gcc/config/i386/znver4.md @@ -0,0 +1,1027 @@ +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. 
+;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. +;; + + +(define_attr "znver4_decode" "direct,vector,double" + (const_string "direct")) + +;; AMD znver4 Scheduling +;; Modeling automatons for zen decoders, integer execution pipes, +;; AGU pipes, branch, floating point execution and fp store units. +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store") + +;; Decoders unit has 4 decoders and all of them can decode fast path +;; and vector type instructions. +(define_cpu_unit "znver4-decode0" "znver4") +(define_cpu_unit "znver4-decode1" "znver4") +(define_cpu_unit "znver4-decode2" "znver4") +(define_cpu_unit "znver4-decode3" "znver4") + +;; Currently blocking all decoders for vector path instructions as +;; they are dispatched separately as a microcode sequence. +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") + +;; Direct instructions can be issued to any of the four decoders. +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") + +;; Fix me: Need to revisit this later to simulate fast path double behavior. +(define_reservation "znver4-double" "znver4-direct") + + +;; Integer unit 4 ALU pipes. +(define_cpu_unit "znver4-ieu0" "znver4_ieu") +(define_cpu_unit "znver4-ieu1" "znver4_ieu") +(define_cpu_unit "znver4-ieu2" "znver4_ieu") +(define_cpu_unit "znver4-ieu3" "znver4_ieu") +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") + +;; 3 AGU pipes in znver4 +(define_cpu_unit "znver4-agu0" "znver4_agu") +(define_cpu_unit "znver4-agu1" "znver4_agu") +(define_cpu_unit "znver4-agu2" "znver4_agu") +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2") + +;; Load is 4 cycles. We do not model reservation of load unit.
+(define_reservation "znver4-load" "znver4-agu-reserve") +(define_reservation "znver4-store" "znver4-agu-reserve") + +;; vectorpath (microcoded) instructions are single issue instructions. +;; So, they occupy all the integer units. +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 + +znver4-ieu2+znver4-ieu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; Floating point unit 4 FP pipes. +(define_cpu_unit "znver4-fpu0" "znver4_fpu") +(define_cpu_unit "znver4-fpu1" "znver4_fpu") +(define_cpu_unit "znver4-fpu2" "znver4_fpu") +(define_cpu_unit "znver4-fpu3" "znver4_fpu") + +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") + +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 + +znver4-fpu2+znver4-fpu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; DIV units +(define_cpu_unit "znver4-idiv" "znver4_idiv") +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") + +;; znver4 has a separate branch unit. +(define_cpu_unit "znver4-bru" "znver4_bru") + +;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the +;; throughput is limited to only one per cycle. 
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store") + +;; Call Instruction +(define_insn_reservation "znver4_call" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "call,callv")) + "znver4-double,znver4-ieu0|znver4-bru,znver4-store") + +;; Push Instruction +(define_insn_reservation "znver4_push" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-store") + +(define_insn_reservation "znver4_push_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Pop instruction +(define_insn_reservation "znver4_pop" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load") + +(define_insn_reservation "znver4_pop_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Leave +(define_insn_reservation "znver4_leave" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "leave")) + "znver4-double,znver4-ieu,znver4-store") + +;; Integer Instructions or General instructions +;; Multiplications +(define_insn_reservation "znver4_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_mem" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI_mem" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr 
"memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +;; Divisions +(define_insn_reservation "znver4_idiv_DI" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_DI_mem" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI_mem" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI_mem" 14 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI_mem" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +;; STR and ISHIFT are microcoded. 
+(define_insn_reservation "znver4_str" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_ishift" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +;; MOV - integer movs +(define_insn_reservation "znver4_imovx_double" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ieu") + +(define_insn_reservation "znver4_imov_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_imov_double_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_direct_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_double_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-load,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_direct_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-load,znver4-ieu,znver4-store") + +;; INTEGER/GENERAL Instructions +(define_insn_reservation "znver4_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_insn_load" 5 + (and (eq_attr "cpu" 
"znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +(define_insn_reservation "znver4_insn2" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_insn2_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_rotate" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_rotate_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_insn_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_insn2_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") + +(define_insn_reservation "znver4_rotate_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store") + +;; Other vector type +(define_insn_reservation "znver4_ieu_vector" 5 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "other,multi")) + "znver4-vector,znver4-ivector") + +;; alu1 instructions +(define_insn_reservation "znver4_alu1_vector" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + 
"znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_alu1_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + "znver4-direct,znver4-ieu") + +;; Branches +(define_insn_reservation "znver4_branch" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu0|znver4-bru") + +(define_insn_reservation "znver4_branch_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-ivector") + +;; LEA instruction with simple addressing +(define_insn_reservation "znver4_lea" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "lea")) + "znver4-direct,znver4-ieu") + +;; Floating Point +;; FP movs +(define_insn_reservation "znver4_fp_cmov" 6 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fcmov")) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_mov_direct" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fmov")) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + 
(eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store") + +;; FSQRT +(define_insn_reservation "znver4_fsqrt" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (and (eq_attr "mode" "XF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*20") + +;; FPSPC instructions +(define_insn_reservation "znver4_fp_spc" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_insn_vector" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (eq_attr "type" "mmxcvt,sselog1,ssemov"))) + "znver4-vector,znver4-fvector") + +;; FABS, FCHS +(define_insn_reservation "znver4_fp_fsgn" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fsgn")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;; FCMP +(define_insn_reservation "znver4_fp_fcmp" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_fcmp_double" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (and (eq_attr "znver1_decode" "double") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fpu2") + +;; FADD, FSUB, FMUL +(define_insn_reservation "znver4_fp_op_mul" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0") + +(define_insn_reservation "znver4_fp_op_mul_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0") + +;; FDIV +(define_insn_reservation "znver4_fp_div" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_div_load" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" 
"load"))) + "znver4-direct,znver4-load,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_idiv_load" 26 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (and (eq_attr "fp_int_src" "true") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fdiv*19") + +;; MMX, SSE, SSEn.n instructions +(define_insn_reservation "znver4_fp_mmx" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "mmx")) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_add_cmp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_mmx_add_cmp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_mmx_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_insn_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mov_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_mmx_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "load"))) +
"znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +;; AVX instructions +(define_insn_reservation "znver4_sse_log" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_log1" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none"))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load"))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_test" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_test_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load")))) + 
"znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_imul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_mov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mov_fp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "store")))) +
"znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add1" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_sse_add1_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-fvector") + +(define_insn_reservation "znver4_sse_iadd" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_iadd_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + 
+(define_insn_reservation "znver4_sse_div_pd" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_pd_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_cmp_avx" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation 
"znver4_sse_cvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_shuf" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_shuf_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +;; AVX512 instructions +(define_insn_reservation "znver4_sse_log_evex" 1 + (and (eq_attr "cpu" "znver4") + 
(and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log1_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_imul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_imul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_mov_evex" 4 + (and (eq_attr
"cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_add_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_div_pd_evex" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and 
(eq_attr "mode" "V16SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_cmp_avx512" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx512_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cvt_evex" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_cvt_evex_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and 
(eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_ishuf_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_ishuf_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_muladd" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_muladd_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +;; AVX512 mask instructions + +(define_insn_reservation "znver4_sse_mskmov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mskmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_msklog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "msklog") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") -- 2.25.1 [-- Attachment #2: 0001-Add-AMD-znver4-instruction-reservations.patch --] [-- Type: application/octet-stream, Size: 40130 bytes --] From d95a5cfcf90dcc50c3e286926acfc478925c6561 Mon Sep 17 00:00:00 2001 From: Tejas Joshi <TejasSanjay.Joshi@amd.com> Date: Wed, 9 Nov 2022 00:10:59 +0530 Subject: [PATCH] Add AMD znver4 instruction reservations This adds znver4 automata units and reservations separately from other znver automata, avoiding the insn-automata.cc size blow-up.
gcc/ChangeLog: * gcc/common/config/i386/i386-common.cc (processor_alias_table): Use CPU_ZNVER4 for znver4. * config/i386/i386.md: Add znver4.md. * config/i386/znver4.md: New. --- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386.md | 1 + gcc/config/i386/znver4.md | 1027 +++++++++++++++++++++++++ 3 files changed, 1029 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/znver4.md diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index f66bdd5a2af..4b01c3540e5 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2113,7 +2113,7 @@ const pta processor_alias_table[] = {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, PTA_ZNVER3, M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8081df76741..c18dfe2af9e 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1312,6 +1312,7 @@ (include "bdver3.md") (include "btver2.md") (include "znver.md") +(include "znver4.md") (include "geode.md") (include "atom.md") (include "slm.md") diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md new file mode 100644 index 00000000000..74b66caa38b --- /dev/null +++ b/gcc/config/i386/znver4.md @@ -0,0 +1,1027 @@ +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. 
+;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. +;; + + +(define_attr "znver4_decode" "direct,vector,double" + (const_string "direct")) + +;; AMD znver4 Scheduling +;; Modeling automatons for zen decoders, integer execution pipes, +;; AGU pipes, branch, floating point execution and fp store units. +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store") + +;; The decoder unit has 4 decoders and all of them can decode fast path +;; and vector type instructions. +(define_cpu_unit "znver4-decode0" "znver4") +(define_cpu_unit "znver4-decode1" "znver4") +(define_cpu_unit "znver4-decode2" "znver4") +(define_cpu_unit "znver4-decode3" "znver4") + +;; Currently blocking all decoders for vector path instructions as +;; they are dispatched separately as a microcode sequence. +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") + +;; Direct instructions can be issued to any of the four decoders. +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") + +;; Fix me: Need to revisit this later to simulate fast path double behavior. +(define_reservation "znver4-double" "znver4-direct") + + +;; Integer unit: 4 ALU pipes.
+(define_cpu_unit "znver4-ieu0" "znver4_ieu") +(define_cpu_unit "znver4-ieu1" "znver4_ieu") +(define_cpu_unit "znver4-ieu2" "znver4_ieu") +(define_cpu_unit "znver4-ieu3" "znver4_ieu") +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") + +;; 3 AGU pipes in znver4 +(define_cpu_unit "znver4-agu0" "znver4_agu") +(define_cpu_unit "znver4-agu1" "znver4_agu") +(define_cpu_unit "znver4-agu2" "znver4_agu") +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2") + +;; Load is 4 cycles. We do not model reservation of load unit. +(define_reservation "znver4-load" "znver4-agu-reserve") +(define_reservation "znver4-store" "znver4-agu-reserve") + +;; vectorpath (microcoded) instructions are single issue instructions. +;; So, they occupy all the integer units. +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 + +znver4-ieu2+znver4-ieu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; Floating point unit 4 FP pipes. +(define_cpu_unit "znver4-fpu0" "znver4_fpu") +(define_cpu_unit "znver4-fpu1" "znver4_fpu") +(define_cpu_unit "znver4-fpu2" "znver4_fpu") +(define_cpu_unit "znver4-fpu3" "znver4_fpu") + +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") + +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 + +znver4-fpu2+znver4-fpu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; DIV units +(define_cpu_unit "znver4-idiv" "znver4_idiv") +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") + +;; znver4 has a separate branch unit. +(define_cpu_unit "znver4-bru" "znver4_bru") + +;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the +;; throughput is limited to only one per cycle. 
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store") + +;; Call Instruction +(define_insn_reservation "znver4_call" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "call,callv")) + "znver4-double,znver4-ieu0|znver4-bru,znver4-store") + +;; Push Instruction +(define_insn_reservation "znver4_push" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-store") + +(define_insn_reservation "znver4_push_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Pop instruction +(define_insn_reservation "znver4_pop" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load") + +(define_insn_reservation "znver4_pop_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Leave +(define_insn_reservation "znver4_leave" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "leave")) + "znver4-double,znver4-ieu,znver4-store") + +;; Integer Instructions or General instructions +;; Multiplications +(define_insn_reservation "znver4_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_mem" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "QI,HI,SI") + (eq_attr "memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +(define_insn_reservation "znver4_imul_DI_mem" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "DI") + (eq_attr 
"memory" "!none")))) + "znver4-direct,znver4-load,znver4-ieu1") + +;; Divisions +(define_insn_reservation "znver4_idiv_DI" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_DI_mem" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI_mem" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI_mem" 14 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI_mem" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "!none")))) + "znver4-double,znver4-load,znver4-idiv*4") + +;; STR and ISHIFT are microcoded. 
+(define_insn_reservation "znver4_str" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_ishift" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "both,store"))) + "znver4-vector,znver4-ivector") + +;; MOV - integer movs +(define_insn_reservation "znver4_imovx_double" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ieu") + +(define_insn_reservation "znver4_imov_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_imov_double_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_direct_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_double_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imovx") + (eq_attr "memory" "store")))) + "znver4-double,znver4-load,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_imov_load_direct_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-load,znver4-ieu,znver4-store") + +;; INTEGER/GENERAL Instructions +(define_insn_reservation "znver4_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_insn_load" 5 + (and (eq_attr "cpu" 
"znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +(define_insn_reservation "znver4_insn2" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_insn2_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_rotate" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_rotate_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_insn_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_insn2_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") + +(define_insn_reservation "znver4_rotate_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store") + +;; Other vector type +(define_insn_reservation "znver4_ieu_vector" 5 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "other,multi")) + "znver4-vector,znver4-ivector") + +;; alu1 instructions +(define_insn_reservation "znver4_alu1_vector" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + 
"znver4-vector,znver4-ivector") + +(define_insn_reservation "znver4_alu1_direct" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "none,unknown")))) + "znver4-direct,znver4-ieu") + +;; Branches +(define_insn_reservation "znver4_branch" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu0|znver4-bru") + +(define_insn_reservation "znver4_branch_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-ivector") + +;; LEA instruction with simple addressing +(define_insn_reservation "znver4_lea" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "lea")) + "znver4-direct,znver4-ieu") + +;; Floating Point +;; FP movs +(define_insn_reservation "znver4_fp_cmov" 6 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fcmov")) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_mov_direct" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fmov")) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1") + +(define_insn_reservation "znver4_fp_mov_direct_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fp-store") + +(define_insn_reservation "znver4_fp_mov_double_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + 
(eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store") + +;; FSQRT +(define_insn_reservation "znver4_fsqrt" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (and (eq_attr "mode" "XF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*20") + +;; FPSPC instructions +(define_insn_reservation "znver4_fp_spc" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_fp_insn_vector" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (eq_attr "type" "mmxcvt,sselog1,ssemov"))) + "znver4-vector,znver4-fvector") + +;; FABS, FCHS +(define_insn_reservation "znver4_fp_fsgn" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fsgn")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;; FCMP +(define_insn_reservation "znver4_fp_fcmp" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_fcmp_double" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (and (eq_attr "znver1_decode" "double") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fpu2") + +;; FADD, FSUB, FMUL +(define_insn_reservation "znver4_fp_op_mul" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0") + +(define_insn_reservation "znver4_fp_op_mul_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0") + +;; FDIV +(define_insn_reservation "znver4_fp_div" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_div_load" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" 
"load"))) + "znver4-direct,znver4-load,znver4-fdiv*15") + +(define_insn_reservation "znver4_fp_idiv_load" 26 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (and (eq_attr "fp_int_src" "true") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fdiv*19") + +;; MMX, SSE, SSEn.n instructions +(define_insn_reservation "znver4_fp_mmx " 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "mmx")) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_add_cmp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_mmx_add_cmp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_mmx_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_insn_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" ",mmxmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mov_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_mmx_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "load"))) + 
"znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +;; AVX instructions +(define_insn_reservation "znver4_sse_log" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_log1" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none"))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load"))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_test" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_test_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load")))) + 
"znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_imul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mov_fp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "store")))) + 
"znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add1" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-vector,znver4-fvector") + +(define_insn_reservation "znver4_sse_add1_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-fvector") + +(define_insn_reservation "znver4_sse_iadd" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_iadd_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mul_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + 
+(define_insn_reservation "znver4_sse_div_pd" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_pd_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_cmp_avx" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation 
"znver4_sse_cvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_shuf" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_shuf_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +;; AVX512 instructions +(define_insn_reservation "znver4_sse_log_evex" 1 + (and (eq_attr "cpu" "znver4") + 
(and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log1_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_imul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_imul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mov_evex" 4 + (and (eq_attr 
"cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_add_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_div_pd_evex" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and 
(eq_attr "mode" "V16SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*7") + +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_cmp_avx512" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx512_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cvt_evex" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_cvt_evex_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and 
(eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_ishuf_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_ishuf_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_muladd" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_muladd_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +;; AVX512 mask instructions + +(define_insn_reservation "znver4_sse_mskmov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mskmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_msklog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "msklog") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") -- 2.25.1
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Alexander Monakov @ 2022-11-21 15:30 UTC To: Joshi, Tejas Sanjay; +Cc: gcc-patches, honza.hubicka, Kumar, Venkataramanan On Mon, 21 Nov 2022, Joshi, Tejas Sanjay wrote: > I have addressed all your comments in the patch attached here. I have also > used znver4-direct for avx512 insns. Thanks. > * This patch increased the insn-automata.cc size from 201502 to 214902. Assuming it's the number of lines of code, I have 102847; perhaps you're measuring without my patches? You can use 'size -A gcc/insn-automata.o' to measure binary size growth. > * Compile time and binary size on my machine remains same. > * Make check and bootstrap build have no issues. > * Spec cpu2017 also don't have any issues with this patch. > > Is this ok for trunk? I cannot approve or reject your patch; that is up to Honza, who I believe was investigating whether combining this with older Zen models makes sense. In the meantime, I see a few more issues that can be easily corrected; please see below. > --- /dev/null > +++ b/gcc/config/i386/znver4.md > +;; FSQRT > +(define_insn_reservation "znver4_fsqrt" 22 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "fpspc") > + (and (eq_attr "mode" "XF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*20") This should be znver4-fdiv*10 (not *20) according to Agner Fog's measurements. > +;; FDIV > +(define_insn_reservation "znver4_fp_div" 15 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "fdiv") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fdiv*15") Use znver4-fdiv*6 instead of *15 here and in the two patterns following this one.
> +(define_insn_reservation "znver4_sse_div_pd" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*7") Agner Fog's measurements indicate fdiv*5 here. > + > +(define_insn_reservation "znver4_sse_div_ps" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*5") Agner Fog's measurements indicate fdiv*3 here. > + > +(define_insn_reservation "znver4_sse_div_pd_load" 20 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*7") fdiv*5? > + > +(define_insn_reservation "znver4_sse_div_ps_load" 17 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*5") fdiv*3? > +(define_insn_reservation "znver4_sse_div_pd_evex" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*7") This should be twice as much as the corresponding SSE/AVX instruction (fdiv*14 or fdiv*10; Agner Fog measured 9 cycles as reciprocal throughput). > + > +(define_insn_reservation "znver4_sse_div_ps_evex" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*5") Likewise (fdiv*6). > +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*7") Likewise. 
> +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*5") Likewise. Alexander
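The four 512-bit divide comments in the review apply one rule: the EVEX (V8DF, V16SF) reservation should hold znver4-fdiv for twice the cycles of the corresponding 256-bit pattern. That reading matches the patch's own treatment of other EVEX insns, which hold an FP pipe for two cycles (fpu0*2 and so on). Worked through with the values quoted in the review, where N is the fdiv repeat count:

$$
\begin{aligned}
N_{512,\mathrm{pd}} &= 2\,N_{256,\mathrm{pd}} = 2 \times 7 = 14 && \text{(256-bit value in the patch)}\\
N_{512,\mathrm{pd}} &= 2\,N_{256,\mathrm{pd}} = 2 \times 5 = 10 && \text{(256-bit value suggested in the review)}\\
N_{512,\mathrm{ps}} &= 2\,N_{256,\mathrm{ps}} = 2 \times 3 = 6 &&
\end{aligned}
$$

The 9-cycle reciprocal throughput Agner Fog measured for the V8DF case lies closer to 10 than to 14.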
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers From: Joshi, Tejas Sanjay @ 2022-12-01 11:28 UTC To: Alexander Monakov, gcc-patches; +Cc: honza.hubicka, Kumar, Venkataramanan [-- Attachment #1: Type: text/plain, Size: 45433 bytes --] Hi, > > --- /dev/null > > +++ b/gcc/config/i386/znver4.md > > +;; FSQRT > > +(define_insn_reservation "znver4_fsqrt" 22 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "fpspc") > > + (and (eq_attr "mode" "XF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fdiv*20") > > This should be znver4-fdiv*10 (not *20) according to Agner Fog's > measurements. > > > +;; FDIV > > +(define_insn_reservation "znver4_fp_div" 15 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "fdiv") > > + (eq_attr "memory" "none"))) > > + "znver4-direct,znver4-fdiv*15") > > znver4-fdiv*6 instead of *15 here and in two patterns following this one. > > > +(define_insn_reservation "znver4_sse_div_pd" 13 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fdiv*7") > > Agner Fog's measurements indicate fdiv*5 here. > > > + > > +(define_insn_reservation "znver4_sse_div_ps" 10 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fdiv*5") > > Agner Fog's measurements indicate fdiv*3 here.
> > > + > > +(define_insn_reservation "znver4_sse_div_pd_load" 20 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > > + (eq_attr "memory" "load")))) > > + "znver4-direct,znver4-load,znver4-fdiv*7") > > fdiv*5? > > > + > > +(define_insn_reservation "znver4_sse_div_ps_load" 17 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > > + (eq_attr "memory" "load")))) > > + "znver4-direct,znver4-load,znver4-fdiv*5") > > fdiv*3? > > > +(define_insn_reservation "znver4_sse_div_pd_evex" 13 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V8DF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fdiv*7") > > This should be twice as much as the corresponding SSE/AVX instruction > (fdiv*14 or fdiv*10; Agner Fog measured 9 cycles as reciprocal throughput). > > > + > > +(define_insn_reservation "znver4_sse_div_ps_evex" 10 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V16SF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fdiv*5") > > Likewise (fdiv*6). > > > +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V8DF") > > + (eq_attr "memory" "load")))) > > + "znver4-direct,znver4-load,znver4-fdiv*7") > > Likewise. > > > +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssediv") > > + (and (eq_attr "mode" "V16SF") > > + (eq_attr "memory" "load")))) > > + "znver4-direct,znver4-load,znver4-fdiv*5") > > Likewise. I have addressed all your comments in this revised patch, PFA and inlined below. Is it ok for trunk? 
Thanks and Regards,
Tejas

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (processor_alias_table):
	Use CPU_ZNVER4 for znver4.
	* config/i386/i386.md: Add znver4.md.
	* config/i386/znver4.md: New.

---
 gcc/common/config/i386/i386-common.cc |    2 +-
 gcc/config/i386/i386.md               |    1 +
 gcc/config/i386/znver4.md             | 1027 +++++++++++++++++++++++++
 3 files changed, 1029 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/i386/znver4.md

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 6ce2a588adc..6d941642911 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2207,7 +2207,7 @@ const pta processor_alias_table[] =
   {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3,
     PTA_ZNVER3,
     M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2},
-  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3,
+  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4,
     PTA_ZNVER4,
     M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 01faa911b77..ebb4eec1961 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1318,6 +1318,7 @@
 (include "bdver3.md")
 (include "btver2.md")
 (include "znver.md")
+(include "znver4.md")
 (include "geode.md")
 (include "atom.md")
 (include "slm.md")
diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md
new file mode 100644
index 00000000000..9d52dc517f5
--- /dev/null
+++ b/gcc/config/i386/znver4.md
@@ -0,0 +1,1027 @@
+;; Copyright (C) 2012-2022 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+;;
+
+
+(define_attr "znver4_decode" "direct,vector,double"
+  (const_string "direct"))
+
+;; AMD znver4 Scheduling
+;; Modeling automatons for zen decoders, integer execution pipes,
+;; AGU pipes, branch, floating point execution and fp store units.
+(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store")
+
+;; Decoders unit has 4 decoders and all of them can decode fast path
+;; and vector type instructions.
+(define_cpu_unit "znver4-decode0" "znver4")
+(define_cpu_unit "znver4-decode1" "znver4")
+(define_cpu_unit "znver4-decode2" "znver4")
+(define_cpu_unit "znver4-decode3" "znver4")
+
+;; Currently blocking all decoders for vector path instructions as
+;; they are dispatched separately as microcode sequence.
+(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3")
+
+;; Direct instructions can be issued to any of the four decoders.
+(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3")
+
+;; Fix me: Need to revisit this later to simulate fast path double behavior.
+(define_reservation "znver4-double" "znver4-direct")
+
+
+;; Integer unit 4 ALU pipes.
+(define_cpu_unit "znver4-ieu0" "znver4_ieu")
+(define_cpu_unit "znver4-ieu1" "znver4_ieu")
+(define_cpu_unit "znver4-ieu2" "znver4_ieu")
+(define_cpu_unit "znver4-ieu3" "znver4_ieu")
+(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3")
+
+;; 3 AGU pipes in znver4
+(define_cpu_unit "znver4-agu0" "znver4_agu")
+(define_cpu_unit "znver4-agu1" "znver4_agu")
+(define_cpu_unit "znver4-agu2" "znver4_agu")
+(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2")
+
+;; Load is 4 cycles. We do not model reservation of load unit.
+(define_reservation "znver4-load" "znver4-agu-reserve")
+(define_reservation "znver4-store" "znver4-agu-reserve")
+
+;; vectorpath (microcoded) instructions are single issue instructions.
+;; So, they occupy all the integer units.
+(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1
+                                      +znver4-ieu2+znver4-ieu3
+                                      +znver4-agu0+znver4-agu1+znver4-agu2")
+
+;; Floating point unit 4 FP pipes.
+(define_cpu_unit "znver4-fpu0" "znver4_fpu")
+(define_cpu_unit "znver4-fpu1" "znver4_fpu")
+(define_cpu_unit "znver4-fpu2" "znver4_fpu")
+(define_cpu_unit "znver4-fpu3" "znver4_fpu")
+
+(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3")
+
+(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1
+                                      +znver4-fpu2+znver4-fpu3
+                                      +znver4-agu0+znver4-agu1+znver4-agu2")
+
+;; DIV units
+(define_cpu_unit "znver4-idiv" "znver4_idiv")
+(define_cpu_unit "znver4-fdiv" "znver4_fdiv")
+
+;; znver4 has a separate branch unit.
+(define_cpu_unit "znver4-bru" "znver4_bru")
+
+;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the
+;; throughput is limited to only one per cycle.
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store")
+
+;; Call Instruction
+(define_insn_reservation "znver4_call" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "call,callv"))
+  "znver4-double,znver4-ieu0|znver4-bru,znver4-store")
+
+;; Push Instruction
+(define_insn_reservation "znver4_push" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "push")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-store")
+
+(define_insn_reservation "znver4_push_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "push")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-store")
+
+;; Pop instruction
+(define_insn_reservation "znver4_pop" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "pop")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load")
+
+(define_insn_reservation "znver4_pop_mem" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "pop")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-store")
+
+;; Leave
+(define_insn_reservation "znver4_leave" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "leave"))
+  "znver4-double,znver4-ieu,znver4-store")
+
+;; Integer Instructions or General instructions
+;; Multiplications
+(define_insn_reservation "znver4_imul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "QI,HI,SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_DI" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_mem" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "QI,HI,SI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-direct,znver4-load,znver4-ieu1")
+
+(define_insn_reservation "znver4_imul_DI_mem" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imul")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-direct,znver4-load,znver4-ieu1")
+
+;; Divisions
+(define_insn_reservation "znver4_idiv_DI" 18
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*10")
+
+(define_insn_reservation "znver4_idiv_SI" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*6")
+
+(define_insn_reservation "znver4_idiv_HI" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "HI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_QI" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "QI")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_DI_mem" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "DI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*10")
+
+(define_insn_reservation "znver4_idiv_SI_mem" 16
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*6")
+
+(define_insn_reservation "znver4_idiv_HI_mem" 14
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "HI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*4")
+
+(define_insn_reservation "znver4_idiv_QI_mem" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "idiv")
+            (and (eq_attr "mode" "QI")
+                 (eq_attr "memory" "!none"))))
+  "znver4-double,znver4-load,znver4-idiv*4")
+
+;; STR and ISHIFT are microcoded.
+(define_insn_reservation "znver4_str" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "str")
+            (eq_attr "memory" "both,store")))
+  "znver4-vector,znver4-ivector")
+
+(define_insn_reservation "znver4_ishift" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ishift")
+            (eq_attr "memory" "both,store")))
+  "znver4-vector,znver4-ivector")
+
+;; MOV - integer movs
+(define_insn_reservation "znver4_imovx_double" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-ieu")
+
+(define_insn_reservation "znver4_imov_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-ieu")
+
+(define_insn_reservation "znver4_imov_double_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "store"))))
+  "znver4-double,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_direct_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_load_double_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "imovx")
+                 (eq_attr "memory" "both"))))
+  "znver4-double,znver4-load,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_imov_load_direct_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "imov")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-ieu,znver4-store")
+
+;; INTEGER/GENERAL Instructions
+(define_insn_reservation "znver4_insn" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu")
+
+(define_insn_reservation "znver4_insn_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu")
+
+(define_insn_reservation "znver4_insn2" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu0|znver4-ieu3")
+
+(define_insn_reservation "znver4_insn2_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3")
+
+(define_insn_reservation "znver4_rotate" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "none,unknown")))
+  "znver4-direct,znver4-ieu1|znver4-ieu2")
+
+(define_insn_reservation "znver4_rotate_load" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2")
+
+(define_insn_reservation "znver4_insn_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu,znver4-store")
+
+(define_insn_reservation "znver4_insn2_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "icmov,setcc")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store")
+
+(define_insn_reservation "znver4_rotate_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "rotate")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store")
+
+;; Other vector type
+(define_insn_reservation "znver4_ieu_vector" 5
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "other,multi"))
+  "znver4-vector,znver4-ivector")
+
+;; alu1 instructions
+(define_insn_reservation "znver4_alu1_vector" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "vector")
+            (and (eq_attr "type" "alu1")
+                 (eq_attr "memory" "none,unknown"))))
+  "znver4-vector,znver4-ivector")
+
+(define_insn_reservation "znver4_alu1_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "alu1")
+                 (eq_attr "memory" "none,unknown"))))
+  "znver4-direct,znver4-ieu")
+
+;; Branches
+(define_insn_reservation "znver4_branch" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ibr")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-ieu0|znver4-bru")
+
+(define_insn_reservation "znver4_branch_mem" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ibr")
+            (eq_attr "memory" "load")))
+  "znver4-vector,znver4-ivector")
+
+;; LEA instruction with simple addressing
+(define_insn_reservation "znver4_lea" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "lea"))
+  "znver4-direct,znver4-ieu")
+
+;; Floating Point
+;; FP movs
+(define_insn_reservation "znver4_fp_cmov" 6
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fcmov"))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_fp_mov_direct" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fmov"))
+  "znver4-direct,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_mov_direct_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_mov_direct_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "direct")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu1,znver4-fp-store")
+
+(define_insn_reservation "znver4_fp_mov_double" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1,znver4-fp-store")
+
+(define_insn_reservation "znver4_fp_mov_double_load" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "double")
+            (and (eq_attr "type" "fmov")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store")
+
+;; FSQRT
+(define_insn_reservation "znver4_fsqrt" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fpspc")
+            (and (eq_attr "mode" "XF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*10")
+
+;; FPSPC instructions
+(define_insn_reservation "znver4_fp_spc" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fpspc")
+            (eq_attr "memory" "none")))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_fp_insn_vector" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "znver1_decode" "vector")
+            (eq_attr "type" "mmxcvt,sselog1,ssemov")))
+  "znver4-vector,znver4-fvector")
+
+;; FABS, FCHS
+(define_insn_reservation "znver4_fp_fsgn" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "fsgn"))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+;; FCMP
+(define_insn_reservation "znver4_fp_fcmp" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fcmp")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu1")
+
+(define_insn_reservation "znver4_fp_fcmp_double" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fcmp")
+            (and (eq_attr "znver1_decode" "double")
+                 (eq_attr "memory" "none"))))
+  "znver4-double,znver4-fpu1,znver4-fpu2")
+
+;; FADD, FSUB, FMUL
+(define_insn_reservation "znver4_fp_op_mul" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fop,fmul")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0")
+
+(define_insn_reservation "znver4_fp_op_mul_load" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fop,fmul")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0")
+
+;; FDIV
+(define_insn_reservation "znver4_fp_div" 15
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fdiv*6")
+
+(define_insn_reservation "znver4_fp_div_load" 22
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fdiv*6")
+
+(define_insn_reservation "znver4_fp_idiv_load" 26
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "fdiv")
+            (and (eq_attr "fp_int_src" "true")
+                 (eq_attr "memory" "load"))))
+  "znver4-double,znver4-load,znver4-fdiv*6")
+
+;; MMX, SSE, SSEn.n instructions
+(define_insn_reservation "znver4_fp_mmx" 1
+  (and (eq_attr "cpu" "znver4")
+       (eq_attr "type" "mmx"))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_add_cmp" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxadd,mmxcmp")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_mmx_add_cmp_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxadd,mmxcmp")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_mmx_insn" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_insn_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_mmx_mov" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fp-store")
+
+(define_insn_reservation "znver4_mmx_mov_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmov")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fp-store")
+
+(define_insn_reservation "znver4_mmx_mul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmul")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_mmx_mul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mmxmul")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3")
+
+;; AVX instructions
+(define_insn_reservation "znver4_sse_log" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_log_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_log1" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog1")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_log1_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog1")
+            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_comi" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "none")))
+  "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_comi_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "load")))
+  "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_test" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "prefix_extra" "1")
+            (and (eq_attr "type" "ssecomi")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_test_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "prefix_extra" "1")
+            (and (eq_attr "type" "ssecomi")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_imul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_imul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_mov" 2
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_mov_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_mov_fp" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mov_fp_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mov_fp_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_add" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add1" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-vector,znver4-fvector")
+
+(define_insn_reservation "znver4_sse_add1_load" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-vector,znver4-load,znver4-fvector")
+
+(define_insn_reservation "znver4_sse_iadd" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_iadd_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_mul_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_div_pd" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_ps" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*3") + +(define_insn_reservation "znver4_sse_div_pd_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_ps_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*3") + +(define_insn_reservation "znver4_sse_cmp_avx" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx2_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V8SF,V4DF,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation 
"znver4_sse_cvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_shuf" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_shuf_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +;; AVX512 instructions +(define_insn_reservation "znver4_sse_log_evex" 1 + (and (eq_attr "cpu" "znver4") + 
(and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log1_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_imul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_imul_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mov_evex" 4 + (and (eq_attr 
"cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_add_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_div_pd_evex" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*9") + +(define_insn_reservation "znver4_sse_div_ps_evex" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and 
(eq_attr "mode" "V16SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*6") + +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*9") + +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*6") + +(define_insn_reservation "znver4_sse_cmp_avx512" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx512_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp,ssecomi") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cvt_evex" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_cvt_evex_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and 
(eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_ishuf_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_ishuf_evex_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_muladd" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_muladd_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +;; AVX512 mask instructions + +(define_insn_reservation "znver4_sse_mskmov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mskmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_msklog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "msklog") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2")
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Alexander Monakov @ 2022-12-01 19:01 UTC
To: Joshi, Tejas Sanjay; +Cc: gcc-patches, honza.hubicka, Kumar, Venkataramanan

On Thu, 1 Dec 2022, Joshi, Tejas Sanjay wrote:

> I have addressed all your comments in this revised patch, PFA and inlined below.

Thank you.

Honza, please let me know if any further input is needed from my side.

For reference, here's how insn-automata.o table sizes look with this patch (top 17, in bytes):

   20068 r bdver1_fp_check
   20068 r bdver1_fp_transitions
   26208 r slm_min_issue_delay
   27244 r bdver1_fp_min_issue_delay
   28518 r glm_check
   28518 r glm_transitions
   33345 r znver4_fpu_min_issue_delay
   33690 r geode_min_issue_delay
   46980 r bdver3_fp_min_issue_delay
   49428 r glm_min_issue_delay
   53730 r btver2_fp_min_issue_delay
   53760 r znver1_fp_transitions
   93960 r bdver3_fp_transitions
  106102 r lujiazui_core_check
  106102 r lujiazui_core_transitions
  133380 r znver4_fpu_transitions
  196123 r lujiazui_core_min_issue_delay

There is a plan to further reduce Lujiazui and b[td]verX table sizes by properly modeling division units like we did for znver.md (PR 87832).

Alexander
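Why giving the division units their own automaton shrinks these tables can be illustrated with a toy model. The Python below is a hypothetical sketch, not how genautomata actually builds or minimizes its automata: it counts the reachable states of a simple hazard recognizer in which four single-cycle ALU pipes and one divider held for D cycles (D = 9, mirroring "znver4-fdiv*9") either share one automaton or live in two separate ones. Shared, the state counts multiply; split, they only add.

```python
from collections import deque


def count_states(classes, window):
    """Count reachable states of a toy pipeline-hazard recognizer.

    classes: one reservation per instruction class; a reservation is a list
             of unit-name frozensets, one per cycle starting at issue.
    window:  number of cycles tracked; must cover the longest reservation.
    """
    assert all(len(res) <= window for res in classes)
    empty = frozenset()
    start = (empty,) * window
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        successors = []
        # Issue any class whose units are free in every cycle it needs.
        for res in classes:
            if all(not (units & state[i]) for i, units in enumerate(res)):
                successors.append(tuple(
                    state[i] | (res[i] if i < len(res) else empty)
                    for i in range(window)))
        # Advance one cycle: drop cycle 0 and append an empty cycle.
        successors.append(state[1:] + (empty,))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


D = 9  # divider busy cycles, mirroring "znver4-fdiv*9"
alus = [[frozenset({f"ieu{k}"})] for k in range(4)]  # four 1-cycle pipes
div = [[frozenset({"fdiv"})] * D]                     # one D-cycle divider

separate = count_states(alus, 1) + count_states(div, D)
combined = count_states(alus + div, D)
print(separate, combined)  # 26 160
```

The real generator applies minimization and other reductions, so these counts only show the multiplicative trend, not the byte sizes listed above.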
* Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Jan Hubička @ 2022-12-12 21:41 UTC
To: Joshi, Tejas Sanjay; +Cc: Alexander Monakov, gcc-patches, Kumar, Venkataramanan

> I have addressed all your comments in this revised patch, PFA and inlined > below. > > Is it ok for trunk? > > Thanks and Regards, > Tejas > > gcc/ChangeLog: > > * gcc/common/config/i386/i386-common.cc (processor_alias_table): > Use CPU_ZNVER4 for znver4. > * config/i386/i386.md: Add znver4.md. > * config/i386/znver4.md: New. > Hi, I went through the patch and compared with Agner's table and have a few comments below. > > --- > gcc/common/config/i386/i386-common.cc | 2 +- > gcc/config/i386/i386.md | 1 + > gcc/config/i386/znver4.md | 1027 +++++++++++++++++++++++++ > 3 files changed, 1029 insertions(+), 1 deletion(-) > create mode 100644 gcc/config/i386/znver4.md > > diff --git a/gcc/common/config/i386/i386-common.cc > b/gcc/common/config/i386/i386-common.cc > index 6ce2a588adc..6d941642911 100644 > --- a/gcc/common/config/i386/i386-common.cc > +++ b/gcc/common/config/i386/i386-common.cc > @@ -2207,7 +2207,7 @@ const pta processor_alias_table[] = > {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, > PTA_ZNVER3, > M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, > - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, > + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, > PTA_ZNVER4, > M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, > {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 01faa911b77..ebb4eec1961 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -1318,6 +1318,7 @@ > (include "bdver3.md") > (include "btver2.md") > (include "znver.md") >
+(include "znver4.md") > (include "geode.md") > (include "atom.md") > (include "slm.md") > diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md > new file mode 100644 > index 00000000000..9d52dc517f5 > --- /dev/null > +++ b/gcc/config/i386/znver4.md > @@ -0,0 +1,1027 @@ > +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. > +;; > +;; This file is part of GCC. > +;; > +;; GCC is free software; you can redistribute it and/or modify > +;; it under the terms of the GNU General Public License as published by > +;; the Free Software Foundation; either version 3, or (at your option) > +;; any later version. > +;; > +;; GCC is distributed in the hope that it will be useful, > +;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +;; GNU General Public License for more details. > +;; > +;; You should have received a copy of the GNU General Public License > +;; along with GCC; see the file COPYING3. If not see > +;; <http://www.gnu.org/licenses/>. > +;; > + > + > +(define_attr "znver4_decode" "direct,vector,double" > + (const_string "direct")) > + > +;; AMD znver4 Scheduling > +;; Modeling automatons for zen decoders, integer execution pipes, > +;; AGU pipes, branch, floating point execution and fp store units. > +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, > znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store") > + > +;; Decoders unit has 4 decoders and all of them can decode fast path > +;; and vector type instructions. > +(define_cpu_unit "znver4-decode0" "znver4") > +(define_cpu_unit "znver4-decode1" "znver4") > +(define_cpu_unit "znver4-decode2" "znver4") > +(define_cpu_unit "znver4-decode3" "znver4") > + > +;; Currently blocking all decoders for vector path instructions as > +;; they are dispatched separetely as microcode sequence. 
> +(define_reservation "znver4-vector" > "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") > + > +;; Direct instructions can be issued to any of the four decoders. > +(define_reservation "znver4-direct" > "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") > + > +;; Fix me: Need to revisit this later to simulate fast path double > behavior. > +(define_reservation "znver4-double" "znver4-direct") > + > + > +;; Integer unit 4 ALU pipes. > +(define_cpu_unit "znver4-ieu0" "znver4_ieu") > +(define_cpu_unit "znver4-ieu1" "znver4_ieu") > +(define_cpu_unit "znver4-ieu2" "znver4_ieu") > +(define_cpu_unit "znver4-ieu3" "znver4_ieu") > +(define_reservation "znver4-ieu" > "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") > + > +;; 3 AGU pipes in znver4 > +(define_cpu_unit "znver4-agu0" "znver4_agu") > +(define_cpu_unit "znver4-agu1" "znver4_agu") > +(define_cpu_unit "znver4-agu2" "znver4_agu") > +(define_reservation "znver4-agu-reserve" > "znver4-agu0|znver4-agu1|znver4-agu2") > + > +;; Load is 4 cycles. We do not model reservation of load unit. > +(define_reservation "znver4-load" "znver4-agu-reserve") > +(define_reservation "znver4-store" "znver4-agu-reserve") > + > +;; vectorpath (microcoded) instructions are single issue instructions. > +;; So, they occupy all the integer units. > +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 > + +znver4-ieu2+znver4-ieu3 > + > +znver4-agu0+znver4-agu1+znver4-agu2") > + > +;; Floating point unit 4 FP pipes. 
> +(define_cpu_unit "znver4-fpu0" "znver4_fpu") > +(define_cpu_unit "znver4-fpu1" "znver4_fpu") > +(define_cpu_unit "znver4-fpu2" "znver4_fpu") > +(define_cpu_unit "znver4-fpu3" "znver4_fpu") > + > +(define_reservation "znver4-fpu" > "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") > + > +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 > + +znver4-fpu2+znver4-fpu3 > + > +znver4-agu0+znver4-agu1+znver4-agu2") > + > +;; DIV units > +(define_cpu_unit "znver4-idiv" "znver4_idiv") > +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") > + > +;; znver4 has a separate branch unit. > +(define_cpu_unit "znver4-bru" "znver4_bru") > So this unit is new since the znver2 model. The rest of the stuff above is the same, right? > + > +;; Separate fp store and fp-to-int store. Although there are 2 store > pipes, the > +;; throughput is limited to only one per cycle. > +(define_cpu_unit "znver4-fp-store" "znver4_fp_store") > + > +;; Call Instruction > +(define_insn_reservation "znver4_call" 1 > + (and (eq_attr "cpu" "znver4") > + (eq_attr "type" "call,callv")) > + > "znver4-double,znver4-ieu0|znver4-bru,znver4-store") > Perhaps this should be znver4-ieu0+znver4-bru to model the fact that call uses the branch unit? Also, using | for things from different units does not really work as expected, since with independent automatons we will end up choosing not to reserve anything. > + > +;; Push Instruction > +(define_insn_reservation "znver4_push" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "push") > + (eq_attr "memory" "store"))) > + "znver4-direct,znver4-store") > + > +(define_insn_reservation "znver4_push_load" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "push") > + (eq_attr "memory" "both"))) > + "znver4-direct,znver4-load,znver4-store") > Here we have 5 cycles instead of 4 for push. However, reservation-wise we do cycle 0 - direct, cycle 1 - load, cycle 2 - store, and nothing in the remaining two cycles.
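The cycle-by-cycle reading of znver4_push_load in the comment above can be reproduced mechanically. The helper below is a hypothetical sketch, not part of GCC: it counts the cycles spanned by a parenthesis-free reservation string, assuming the precedence of the define_insn_reservation syntax (',' between cycles binds loosest, then '|', then '+', then '*N'), and compares the result with the declared latency. For '|' it takes the longest alternative as an upper bound.

```python
def reservation_cycles(regexp: str) -> int:
    """Cycles spanned by a parenthesis-free reservation string.

    ',' (next cycle) binds loosest, then '|' (alternative), then '+'
    (same cycle), then '*N' (repeat for N cycles). For '|' the longest
    alternative is used as an upper bound.
    """
    if "(" in regexp or ")" in regexp:
        raise ValueError("parentheses are not handled by this sketch")

    def repeat(term: str) -> int:
        count = term.partition("*")[2].strip()
        return int(count) if count else 1

    def allof(term: str) -> int:
        return max(repeat(t) for t in term.split("+"))

    def oneof(term: str) -> int:
        return max(allof(t) for t in term.split("|"))

    return sum(oneof(t) for t in regexp.split(","))


def idle_tail(latency: int, regexp: str) -> int:
    """Cycles of declared latency left after the last reserved cycle."""
    return max(0, latency - reservation_cycles(regexp))


# znver4_push_load as posted: latency 5.
push_load = "znver4-direct,znver4-load,znver4-store"
print(reservation_cycles(push_load), idle_tail(5, push_load))  # 3 2
```

The two idle cycles are the "nothing in the remaining two cycles" noted above.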
> + > +;; Pop instruction > +(define_insn_reservation "znver4_pop" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "pop") > + (eq_attr "memory" "load"))) > + "znver4-direct,znver4-load") > + > +(define_insn_reservation "znver4_pop_mem" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "pop") > + (eq_attr "memory" "both"))) > + "znver4-direct,znver4-load,znver4-store") > Similar situation here. I remember that at znver1 time I had a problem with defining load as a 4-cycle operation since the model got too large. This is probably an artifact of that. I wonder if, with the separation of division from the main model, this still causes trouble... > + > +;; Leave > +(define_insn_reservation "znver4_leave" 1 > + (and (eq_attr "cpu" "znver4") > + (eq_attr "type" "leave")) > + "znver4-double,znver4-ieu,znver4-store") > + > +;; Integer Instructions or General instructions > +;; Multiplications > +(define_insn_reservation "znver4_imul" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "imul") > + (and (eq_attr "mode" "QI,HI,SI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-ieu1") > + > +(define_insn_reservation "znver4_imul_DI" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "imul") > + (and (eq_attr "mode" "DI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-ieu1") > Note that Agner Fog's tables claim znver4 imul is still 3 cycles even for 64-bit.
+ > > +(define_insn_reservation "znver4_imov_direct_store" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "imov") > + (eq_attr "memory" "store"))) > + "znver4-direct,znver4-ieu,znver4-store") > + > +(define_insn_reservation "znver4_imov_load_double_store" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "double") > + (and (eq_attr "type" "imovx") > + (eq_attr "memory" "store")))) > + > "znver4-double,znver4-load,znver4-ieu,znver4-store") > + > +(define_insn_reservation "znver4_imov_load_direct_store" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "imov") > + (eq_attr "memory" "store"))) > + > "znver4-direct,znver4-load,znver4-ieu,znver4-store") > I wonder why znver4-load is here when memory is "store"? > + > +;; INTEGER/GENERAL Instructions > +(define_insn_reservation "znver4_insn2_load" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "icmov,setcc") > + (eq_attr "memory" "load"))) > + > "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") > + > + > +(define_insn_reservation "znver4_insn2_store" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "icmov,setcc") > + (eq_attr "memory" "load"))) > + > "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") > This looks like a bug to me that was cut&pasted a few times. Stores should test memory for "store"; otherwise it is identical to the reservation above. > + > +;; Other vector type > +(define_insn_reservation "znver4_ieu_vector" 5 > + (and (eq_attr "cpu" "znver4") > + (eq_attr "type" "other,multi")) > + "znver4-vector,znver4-ivector") > The znver model also tests for str. Store and both are handled earlier, but what about loads?
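Copy-paste slips like znver4_insn2_store are easy to catch once the conditions are written down as data. The sketch below is a hypothetical helper, not a GCC tool, and it hand-transcribes the two conditions instead of parsing the .md file: each condition maps an attribute to the set of values its eq_attr accepts, and two reservations overlap when every attribute they both test has at least one value in common.

```python
from itertools import combinations


def cond(**attrs):
    """Build a condition from eq_attr-style value lists, e.g. type="icmov,setcc"."""
    return {name: frozenset(values.split(",")) for name, values in attrs.items()}


def overlaps(a, b):
    """True if some insn could satisfy both conditions.

    An attribute tested by only one condition does not restrict the other,
    so only attributes tested by both need intersecting value sets.
    """
    return all(a[name] & b[name] for name in a.keys() & b.keys())


def find_overlaps(reservations):
    """Return pairs of reservation names whose conditions can match the same insn."""
    return [(n1, n2)
            for (n1, c1), (n2, c2) in combinations(reservations.items(), 2)
            if overlaps(c1, c2)]


as_posted = {
    "znver4_insn2_load": cond(cpu="znver4", type="icmov,setcc", memory="load"),
    "znver4_insn2_store": cond(cpu="znver4", type="icmov,setcc", memory="load"),
}
fixed = dict(as_posted,
             znver4_insn2_store=cond(cpu="znver4", type="icmov,setcc", memory="store"))

print(find_overlaps(as_posted))  # [('znver4_insn2_load', 'znver4_insn2_store')]
print(find_overlaps(fixed))      # []
```

A real checker would also have to understand nested and/not expressions and attributes such as znver1_decode; this only shows the idea.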
> + > +;; alu1 instructions > +(define_insn_reservation "znver4_alu1_vector" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "vector") > + (and (eq_attr "type" "alu1") > + (eq_attr "memory" > "none,unknown")))) > + "znver4-vector,znver4-ivector") > + > +(define_insn_reservation "znver4_alu1_direct" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "direct") > + (and (eq_attr "type" "alu1") > + (eq_attr "memory" > "none,unknown")))) > + "znver4-direct,znver4-ieu") > + > +;; Branches > +(define_insn_reservation "znver4_branch" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ibr") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-ieu0|znver4-bru") > Probably + instead of | > + > +(define_insn_reservation "znver4_branch_mem" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ibr") > + (eq_attr "memory" "load"))) > + "znver4-vector,znver4-ivector") > No bru use here? > + > +;; LEA instruction with simple addressing > +(define_insn_reservation "znver4_lea" 1 > + (and (eq_attr "cpu" "znver4") > + (eq_attr "type" "lea")) > + "znver4-direct,znver4-ieu") > + > +;; Floating Point > +;; FP movs > +(define_insn_reservation "znver4_fp_cmov" 6 > + (and (eq_attr "cpu" "znver4") > + (eq_attr "type" "fcmov")) > + "znver4-vector,znver4-fvector") > Agner Fog claims 4 cycles here.
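The difference between "znver4-ieu0|znver4-bru" and "znver4-ieu0+znver4-bru" in the call and branch reservations can be shown with a flat, single-cycle check. This is a hypothetical sketch that treats all units as if they lived in one automaton, so it captures only the intended meaning of '|' (any one alternative) versus '+' (all units at once). It deliberately does not model what genautomata does when the alternatives belong to different automata (here znver4_ieu and znver4_bru), which is the subtler problem raised in the review.

```python
from itertools import product


def can_issue_together(reservations):
    """Check whether single-cycle reservations can all be granted in one cycle.

    Each reservation is a parenthesis-free string: '|' separates alternatives
    and '+' joins units that must all be free at the same time.
    """
    options = [[frozenset(u.strip() for u in alt.split("+")) for alt in r.split("|")]
               for r in reservations]
    for choice in product(*options):
        # The chosen alternatives fit if no unit is claimed twice.
        if sum(len(units) for units in choice) == len(frozenset().union(*choice)):
            return True
    return False


# Reserving ieu0 OR bru leaves bru free for another branch in the same cycle ...
print(can_issue_together(["znver4-ieu0|znver4-bru", "znver4-bru"]))  # True
# ... while ieu0 AND bru keeps the branch unit busy for that cycle.
print(can_issue_together(["znver4-ieu0+znver4-bru", "znver4-bru"]))  # False
```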
> +(define_insn_reservation "znver4_fp_mov_direct_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "direct") > + (and (eq_attr "type" "fmov") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1") > 7 cycles in Agner Fog's manual > + > +(define_insn_reservation "znver4_fp_mov_direct_store" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "direct") > + (and (eq_attr "type" "fmov") > + (eq_attr "memory" "store")))) > + "znver4-direct,znver4-fpu1,znver4-fp-store") > 6 by Agner Fog > + > +(define_insn_reservation "znver4_fp_mov_double" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "double") > + (and (eq_attr "type" "fmov") > + (eq_attr "memory" "none")))) > + "znver4-double,znver4-fpu1,znver4-fp-store") > I wonder what this matches. > + > +(define_insn_reservation "znver4_fp_mov_double_load" 12 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "znver1_decode" "double") > + (and (eq_attr "type" "fmov") > + (eq_attr "memory" "load")))) > + > "znver4-double,znver4-load,znver4-fpu1,znver4-fp-store") > It seems that fild is modeled as fmov and double decode, but Agner Fog claims it is single decode with latency 13. 
> +;; FADD, FSUB, FMUL > +(define_insn_reservation "znver4_fp_op_mul" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "fop,fmul") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu0") > 7 by Agner Fog > + > +;; AVX instructions > +(define_insn_reservation "znver4_sse_log" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" > "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu") > + > +(define_insn_reservation "znver4_sse_log_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" > "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu") > Agner Fog lists SSE load as 5 cycles, so I wonder if it is right for the operation to be 8 instead of 6? > + > +(define_insn_reservation "znver4_sse_log1" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog1") > + (and (eq_attr "mode" > "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") > So logical operations take the store unit even if they are not storing?
> + > +(define_insn_reservation "znver4_sse_log1_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog1") > + (and (eq_attr "mode" > "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_comi" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "none"))) > + > "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_comi_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "load"))) > + > "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_test" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "prefix_extra" "1") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1|znver4-fpu2") > + > +(define_insn_reservation "znver4_sse_test_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "prefix_extra" "1") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") > + > +(define_insn_reservation "znver4_sse_imul" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" > "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0|znver4-fpu3") > + > +(define_insn_reservation "znver4_sse_imul_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" > "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_mov" 2 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" > "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "none")))) > +
"znver4-direct,znver4-fpu1|znver4-fpu2") > I think this is copied from the znver1 table. It should be 1 cycle, and often 0 if renaming happens? > + > +(define_insn_reservation "znver4_sse_mov_load" 9 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" > "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") > 5 cycles by Agner Fog > + > +(define_insn_reservation "znver4_sse_mov_store" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" > "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "store")))) > + > "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") > We model other stores as a 1-cycle operation; why do you use 5 here? Also, it seems that a normal store is 1 op, so why does it need the fpu? > +(define_insn_reservation "znver4_sse_add1" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd1") > + (and (eq_attr "mode" > "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "none")))) > + "znver4-vector,znver4-fvector") > This seems to match hadd, which is listed as 4 cycles by Agner Fog. Why is it vector decode? > + > +(define_insn_reservation "znver4_sse_add1_load" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd1") > + (and (eq_attr "mode" > "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "load")))) > + "znver4-vector,znver4-load,znver4-fvector") > Similarly here.
> +(define_insn_reservation "znver4_sse_cmp_avx" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp,ssecomi") > + (and (eq_attr "mode" > "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_cmp_avx_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp,ssecomi") > + (and (eq_attr "mode" > "V4SF,V2DF,V2SF,V1DF,SF,QI,HI,SI,DI,TI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_cmp_avx2" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp,ssecomi") > + (and (eq_attr "mode" "V8SF,V4DF,OI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0|znver4-fpu1") > Is there really a difference between 128-bit and 256-bit here? +;; AVX512 instructions > > +(define_insn_reservation "znver4_sse_log_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > I think the instruction does two operations in parallel, so (znver4-fpu0+znver4-fpu1)|(znver4-fpu1+znver4-fpu2)|(znver4-fpu0+znver4-fpu2) > + > +(define_insn_reservation "znver4_sse_log_evex_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_log1_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog1") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_log1_evex_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and
(eq_attr "type" "sselog1") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_mul_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_mul_evex_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_imul_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_imul_evex_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_mov_evex" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_mov_evex_load" 11 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_mov_evex_store" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "store")))) > + > "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation 
"znver4_sse_add_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_add_evex_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_iadd_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_iadd_evex_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_div_pd_evex" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*9") > + > +(define_insn_reservation "znver4_sse_div_ps_evex" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*6") > + > +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*9") > + > +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*6") > + > 
+(define_insn_reservation "znver4_sse_cmp_avx512" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp,ssecomi") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx512_load" 12 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp,ssecomi") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cvt_evex" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_cvt_evex_load" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_shuf_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + > "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_shuf_evex_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_ishuf_evex" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_ishuf_evex_load" 11 > + (and (eq_attr
"cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + > "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_muladd" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemuladd") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_muladd_load" 11 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (eq_attr "memory" "load"))) > + > "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +;; AVX512 mask instructions > + > +(define_insn_reservation "znver4_sse_mskmov" 2 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "mskmov") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_msklog" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "msklog") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") > Honza
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Joshi, Tejas Sanjay @ 2022-12-22 17:34 UTC
To: Jan Hubička, gcc-patches; +Cc: Alexander Monakov, Kumar, Venkataramanan

Hello,

I have addressed all your comments in this revision of the patch; please find it attached and inlined.
* I have updated all the latencies with Agner's measurements.
* Incorrect pipelines and loads/stores have been addressed.
* The double-pumped AVX-512 insns take one cycle for one 256-bit half and the next cycle for the remaining 256-bit half in the same pipeline, thus pipe*2.

Is this ok for trunk?

Thanks and Regards,
Tejas

gcc/ChangeLog:

	* gcc/common/config/i386/i386-common.cc (processor_alias_table):
	Use CPU_ZNVER4 for znver4.
	* config/i386/i386.md: Add znver4.md.
	* config/i386/znver4.md: New.
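The two spellings discussed for the 512-bit reservations are not interchangeable for the scheduler: "znver4-fpu0*2|..." (as explained above) holds one pipe for two consecutive cycles, while the review's "(znver4-fpu0+znver4-fpu1)|..." holds two pipes for a single cycle. A hypothetical greedy count, ignoring decoders, loads and everything else in the reservations, shows how differently they limit back-to-back 512-bit ops over ten cycles. Which one matches the hardware is a separate question that only measurements can settle.

```python
from collections import defaultdict


def greedy_issue_count(alternatives, cycles):
    """Greedily start as many ops per cycle as the unit reservations allow.

    alternatives: each alternative is a list of unit frozensets, one per
                  consecutive cycle starting at issue.
    Returns how many ops start within the first `cycles` cycles.
    """
    busy = defaultdict(set)  # cycle -> units already reserved in that cycle
    issued = 0
    for t in range(cycles):
        placed = True
        while placed:
            placed = False
            for res in alternatives:
                if all(not (units & busy[t + i]) for i, units in enumerate(res)):
                    for i, units in enumerate(res):
                        busy[t + i] |= units
                    issued += 1
                    placed = True
                    break
    return issued


# As posted: "znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2"
double_pumped = [[frozenset({f"fpu{k}"})] * 2 for k in range(4)]
# Review suggestion:
# "(znver4-fpu0+znver4-fpu1)|(znver4-fpu1+znver4-fpu2)|(znver4-fpu0+znver4-fpu2)"
paired = [[frozenset({"fpu0", "fpu1"})],
          [frozenset({"fpu1", "fpu2"})],
          [frozenset({"fpu0", "fpu2"})]]

print(greedy_issue_count(double_pumped, 10))  # 20
print(greedy_issue_count(paired, 10))         # 10
```

With pipe*2, four ops start every other cycle; with the pairs drawn from three pipes, any two pairs share a pipe, so only one op starts per cycle.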
Change-Id: Iea39c1c01d4992cf7ac476bd6de65887910bbcbe --- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386.md | 1 + gcc/config/i386/znver4.md | 1068 +++++++++++++++++++++++++ 3 files changed, 1070 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/znver4.md diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 660a977b68b..c7adea57683 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2215,7 +2215,7 @@ const pta processor_alias_table[] = {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, PTA_ZNVER3, M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 9451883396c..3a88f16a21a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1319,6 +1319,7 @@ (include "bdver3.md") (include "btver2.md") (include "znver.md") +(include "znver4.md") (include "geode.md") (include "atom.md") (include "slm.md") diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md new file mode 100644 index 00000000000..d0b239822a8 --- /dev/null +++ b/gcc/config/i386/znver4.md @@ -0,0 +1,1068 @@ +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. 
+;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. +;; + + +(define_attr "znver4_decode" "direct,vector,double" + (const_string "direct")) + +;; AMD znver4 Scheduling +;; Modeling automatons for zen decoders, integer execution pipes, +;; AGU pipes, branch, floating point execution and fp store units. +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_fpu, znver4_fp_store") + +;; The decoder unit has 4 decoders and all of them can decode fast path +;; and vector type instructions. +(define_cpu_unit "znver4-decode0" "znver4") +(define_cpu_unit "znver4-decode1" "znver4") +(define_cpu_unit "znver4-decode2" "znver4") +(define_cpu_unit "znver4-decode3" "znver4") + +;; Currently blocking all decoders for vector path instructions as +;; they are dispatched separately as a microcode sequence. +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") + +;; Direct instructions can be issued to any of the four decoders. +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") + +;; Fix me: Need to revisit this later to simulate fast path double behavior. +(define_reservation "znver4-double" "znver4-direct") + + +;; Integer unit 4 ALU pipes. +(define_cpu_unit "znver4-ieu0" "znver4_ieu") +(define_cpu_unit "znver4-ieu1" "znver4_ieu") +(define_cpu_unit "znver4-ieu2" "znver4_ieu") +(define_cpu_unit "znver4-ieu3" "znver4_ieu") +;; Znver4 has an additional branch unit.
+(define_cpu_unit "znver4-bru0" "znver4_ieu") +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") + +;; 3 AGU pipes in znver4 +(define_cpu_unit "znver4-agu0" "znver4_agu") +(define_cpu_unit "znver4-agu1" "znver4_agu") +(define_cpu_unit "znver4-agu2" "znver4_agu") +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2") + +;; Load is 4 cycles. We do not model reservation of load unit. +(define_reservation "znver4-load" "znver4-agu-reserve") +(define_reservation "znver4-store" "znver4-agu-reserve") + +;; vectorpath (microcoded) instructions are single issue instructions. +;; So, they occupy all the integer units. +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 + +znver4-ieu2+znver4-ieu3+znver4-bru0 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; Floating point unit 4 FP pipes. +(define_cpu_unit "znver4-fpu0" "znver4_fpu") +(define_cpu_unit "znver4-fpu1" "znver4_fpu") +(define_cpu_unit "znver4-fpu2" "znver4_fpu") +(define_cpu_unit "znver4-fpu3" "znver4_fpu") + +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") + +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 + +znver4-fpu2+znver4-fpu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; DIV units +(define_cpu_unit "znver4-idiv" "znver4_idiv") +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") + +;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the +;; throughput is limited to only one per cycle. 
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store") + + +;; Integer Instructions +;; Move instructions +;; XCHG +(define_insn_reservation "znver4_imov_double" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imov") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ieu") + +(define_insn_reservation "znver4_imov_double_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imov") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-ieu") + +;; imov, imovx +(define_insn_reservation "znver4_imov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov,imovx") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_imov_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov,imovx") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +;; Push Instruction +(define_insn_reservation "znver4_push" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-store") + +(define_insn_reservation "znver4_push_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Pop instruction +(define_insn_reservation "znver4_pop" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load") + +(define_insn_reservation "znver4_pop_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Integer Instructions or General instructions +;; Multiplications +(define_insn_reservation "znver4_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_load" 7 + (and (eq_attr "cpu" 
"znver4") + (and (eq_attr "type" "imul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1") + +;; Divisions +(define_insn_reservation "znver4_idiv_DI" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_DI_load" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI_load" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI_load" 14 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*4") + +;; INTEGER/GENERAL Instructions +(define_insn_reservation "znver4_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" 
"alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_insn_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +(define_insn_reservation "znver4_insn2" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_insn2_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_rotate" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_rotate_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_insn_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_insn2_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") + +(define_insn_reservation "znver4_rotate_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store") + +;; alu1 instructions +(define_insn_reservation "znver4_alu1_vector" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" 
"none,unknown")))) + "znver4-vector,znver4-ivector*3") + +(define_insn_reservation "znver4_alu1_vector_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-ivector*3") + +;; Call Instruction +(define_insn_reservation "znver4_call" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "call,callv")) + "znver4-double,znver4-ieu0|znver4-bru0,znver4-store") + +;; Branches +(define_insn_reservation "znver4_branch" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu0|znver4-bru0") + +(define_insn_reservation "znver4_branch_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-bru0") + +(define_insn_reservation "znver4_branch_vector" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none,unknown"))) + "znver4-vector,znver4-ivector*2") + +(define_insn_reservation "znver4_branch_vector_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*2") + +;; LEA instruction with simple addressing +(define_insn_reservation "znver4_lea" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "lea")) + "znver4-direct,znver4-ieu") + +;; Leave +(define_insn_reservation "znver4_leave" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "leave")) + "znver4-double,znver4-ieu,znver4-store") + +;; STR and ISHIFT are microcoded. 
+(define_insn_reservation "znver4_str" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-ivector*3") + +(define_insn_reservation "znver4_str_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*3") + +(define_insn_reservation "znver4_ishift" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-ivector*2") + +(define_insn_reservation "znver4_ishift_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*2") + +;; Other vector type +(define_insn_reservation "znver4_ieu_vector" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "other,multi") + (eq_attr "memory" "none,unknown"))) + "znver4-vector,znver4-ivector*5") + +(define_insn_reservation "znver4_ieu_vector_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "other,multi") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*5") + +;; Floating Point +;; FP movs +(define_insn_reservation "znver4_fp_cmov" 4 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fcmov")) + "znver4-vector,znver4-fvector*3") + +(define_insn_reservation "znver4_fp_mov_direct" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fmov")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;;FLD +(define_insn_reservation "znver4_fp_mov_direct_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +;;FST +(define_insn_reservation "znver4_fp_mov_direct_store" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu0|znver4-fpu1,znver4-fp-store") + 
+;;FILD +(define_insn_reservation "znver4_fp_mov_double_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1") + +;;FIST +(define_insn_reservation "znver4_fp_mov_double_store" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu1,znver4-fp-store") + +;; FSQRT +(define_insn_reservation "znver4_fsqrt" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (and (eq_attr "mode" "XF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*10") + +;; FPSPC instructions +(define_insn_reservation "znver4_fp_spc" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-fvector*6") + +(define_insn_reservation "znver4_fp_insn_vector" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (eq_attr "type" "mmxcvt,sselog1,ssemov"))) + "znver4-vector,znver4-fvector*6") + +;; FADD, FSUB, FMUL +(define_insn_reservation "znver4_fp_op_mul" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0") + +(define_insn_reservation "znver4_fp_op_mul_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0") + +;; FDIV +(define_insn_reservation "znver4_fp_div" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fdiv*6") + +(define_insn_reservation "znver4_fp_div_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fdiv*6") + +(define_insn_reservation "znver4_fp_idiv_load" 24 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + 
(and (eq_attr "fp_int_src" "true") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fdiv*6") + +;; FABS, FCHS +(define_insn_reservation "znver4_fp_fsgn" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fsgn")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;; FCMP +(define_insn_reservation "znver4_fp_fcmp" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_fcmp_double" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (and (eq_attr "znver1_decode" "double") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fpu2") + +;; MMX, SSE, SSEn.n instructions +(define_insn_reservation "znver4_fp_mmx" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "mmx")) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_add_cmp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_mmx_add_cmp_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_mmx_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_insn_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mov_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr
"memory" "both"))) + "znver4-direct,znver4-load,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_mmx_mul_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +;; AVX instructions +(define_insn_reservation "znver4_sse_log" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_log1" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "both")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "store"))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "both"))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_test" 1 + (and 
(eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_test_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_imul_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mov_fp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and 
(eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_add1" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-vector,znver4-fvector*2") + +(define_insn_reservation "znver4_sse_add1_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd1") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-fvector*2") + +(define_insn_reservation "znver4_sse_iadd" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_iadd_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" 
"V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mul_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_div_pd" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_ps" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*3") + +(define_insn_reservation "znver4_sse_div_pd_load" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF,V2DF,V1DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*5") + +(define_insn_reservation "znver4_sse_div_ps_load" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*3") + +(define_insn_reservation "znver4_sse_cmp_avx" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "prefix" "vex") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_cmp_avx_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "prefix" "vex") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_comi_avx" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-fpu2+znver4-fpu3,znver4-fp-store") + +(define_insn_reservation 
"znver4_sse_comi_avx_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-fpu2+znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_cvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_cvt_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2|znver4-fpu3") + +(define_insn_reservation "znver4_sse_icvt_store" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_shuf" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_shuf_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_ishuf" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_ishuf_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + 
(and (eq_attr "mode" "OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +;; AVX512 instructions +(define_insn_reservation "znver4_sse_log_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log_evex_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_log1_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_evex_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V16SF,V8DF,XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mul_evex_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemul") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_imul_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_imul_evex_load" 9 + (and (eq_attr "cpu" "znver4") 
+ (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_mov_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_mov_evex_store" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_add_evex" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_add_evex_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseadd") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_iadd_evex_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseiadd") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_div_pd_evex" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and 
(eq_attr "mode" "V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*9") + +(define_insn_reservation "znver4_sse_div_ps_evex" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*6") + +(define_insn_reservation "znver4_sse_div_pd_evex_load" 19 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*9") + +(define_insn_reservation "znver4_sse_div_ps_evex_load" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V16SF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fdiv*6") + +(define_insn_reservation "znver4_sse_cmp_avx128" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "none"))))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx128_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "load"))))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx256" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V8SF,V4DF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "none"))))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx256_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V8SF,V4DF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "load"))))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx512" 5 + (and (eq_attr "cpu" "znver4") + (and 
(eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V16SF,V8DF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "none"))))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cmp_avx512_load" 11 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "mode" "V16SF,V8DF") + (and (eq_attr "prefix" "evex") + (eq_attr "memory" "load"))))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_cvt_evex" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_cvt_evex_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_shuf_evex_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "V16SF,V8DF") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") + +(define_insn_reservation "znver4_sse_ishuf_evex" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_ishuf_evex_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseshuf") + (and (eq_attr "mode" "XI") + (eq_attr "memory" "load")))) + 
"znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") + +(define_insn_reservation "znver4_sse_muladd" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_muladd_load" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemuladd") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") + +;; AVX512 mask instructions + +(define_insn_reservation "znver4_sse_mskmov" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mskmov") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") + +(define_insn_reservation "znver4_sse_msklog" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "msklog") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") -- [-- Attachment #2: 0002-Add-AMD-znver4-instruction-reservations.patch --] [-- Type: application/octet-stream, Size: 41453 bytes --] From d19c9f9e7171bf016fd05f3ccaa95795410066cf Mon Sep 17 00:00:00 2001 From: Tejas Joshi <TejasSanjay.Joshi@amd.com> Date: Wed, 9 Nov 2022 00:10:59 +0530 Subject: [PATCH] Add AMD znver4 instruction reservations This adds znver4 automata units and reservations separately from other znver automata, avoiding the insn-automata.cc size blow-up. gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_alias_table): Use CPU_ZNVER4 for znver4. * config/i386/i386.md: Add znver4.md. * config/i386/znver4.md: New.
Change-Id: Iea39c1c01d4992cf7ac476bd6de65887910bbcbe --- gcc/common/config/i386/i386-common.cc | 2 +- gcc/config/i386/i386.md | 1 + gcc/config/i386/znver4.md | 1068 +++++++++++++++++++++++++ 3 files changed, 1070 insertions(+), 1 deletion(-) create mode 100644 gcc/config/i386/znver4.md diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 660a977b68b..c7adea57683 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -2215,7 +2215,7 @@ const pta processor_alias_table[] = {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3, PTA_ZNVER3, M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2}, - {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3, + {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}, {"btver1", PROCESSOR_BTVER1, CPU_GENERIC, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 9451883396c..3a88f16a21a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1319,6 +1319,7 @@ (include "bdver3.md") (include "btver2.md") (include "znver.md") +(include "znver4.md") (include "geode.md") (include "atom.md") (include "slm.md") diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md new file mode 100644 index 00000000000..d0b239822a8 --- /dev/null +++ b/gcc/config/i386/znver4.md @@ -0,0 +1,1068 @@ +;; Copyright (C) 2012-2022 Free Software Foundation, Inc. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. 
+;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. +;; + + +(define_attr "znver4_decode" "direct,vector,double" + (const_string "direct")) + +;; AMD znver4 Scheduling +;; Modeling automata for decoders, integer/branch pipes, AGU pipes, +;; integer and FP dividers, FP execution pipes and the FP store unit. +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_fpu, znver4_fp_store") + +;; The decoder unit has 4 decoders, all of which can decode fast-path +;; and vector-path instructions. +(define_cpu_unit "znver4-decode0" "znver4") +(define_cpu_unit "znver4-decode1" "znver4") +(define_cpu_unit "znver4-decode2" "znver4") +(define_cpu_unit "znver4-decode3" "znver4") + +;; All decoders are currently blocked for vector-path instructions, as +;; they are dispatched separately as a microcode sequence. +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3") + +;; Direct instructions can be issued to any of the four decoders. +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3") + +;; FIXME: Revisit this later to model fast-path double behavior. +(define_reservation "znver4-double" "znver4-direct") + + +;; The integer unit has 4 ALU pipes. +(define_cpu_unit "znver4-ieu0" "znver4_ieu") +(define_cpu_unit "znver4-ieu1" "znver4_ieu") +(define_cpu_unit "znver4-ieu2" "znver4_ieu") +(define_cpu_unit "znver4-ieu3" "znver4_ieu") +;; Znver4 has an additional branch unit.
+(define_cpu_unit "znver4-bru0" "znver4_ieu") +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3") + +;; znver4 has 3 AGU pipes. +(define_cpu_unit "znver4-agu0" "znver4_agu") +(define_cpu_unit "znver4-agu1" "znver4_agu") +(define_cpu_unit "znver4-agu2" "znver4_agu") +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2") + +;; Load latency is 4 cycles. Reservation of the load unit is not modeled. +(define_reservation "znver4-load" "znver4-agu-reserve") +(define_reservation "znver4-store" "znver4-agu-reserve") + +;; Vector-path (microcoded) instructions are single-issue, so they +;; occupy all the integer units. +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1 + +znver4-ieu2+znver4-ieu3+znver4-bru0 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; The floating-point unit has 4 FP pipes. +(define_cpu_unit "znver4-fpu0" "znver4_fpu") +(define_cpu_unit "znver4-fpu1" "znver4_fpu") +(define_cpu_unit "znver4-fpu2" "znver4_fpu") +(define_cpu_unit "znver4-fpu3" "znver4_fpu") + +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3") + +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1 + +znver4-fpu2+znver4-fpu3 + +znver4-agu0+znver4-agu1+znver4-agu2") + +;; DIV units +(define_cpu_unit "znver4-idiv" "znver4_idiv") +(define_cpu_unit "znver4-fdiv" "znver4_fdiv") + +;; Separate FP store and FP-to-int store. Although there are 2 store pipes, +;; throughput is limited to one store per cycle.
+(define_cpu_unit "znver4-fp-store" "znver4_fp_store") + + +;; Integer Instructions +;; Move instructions +;; XCHG +(define_insn_reservation "znver4_imov_double" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imov") + (eq_attr "memory" "none")))) + "znver4-double,znver4-ieu") + +(define_insn_reservation "znver4_imov_double_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "imov") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-ieu") + +;; imov, imovx +(define_insn_reservation "znver4_imov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov,imovx") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_imov_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imov,imovx") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +;; Push Instruction +(define_insn_reservation "znver4_push" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-store") + +(define_insn_reservation "znver4_push_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "push") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Pop instruction +(define_insn_reservation "znver4_pop" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load") + +(define_insn_reservation "znver4_pop_mem" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "pop") + (eq_attr "memory" "both"))) + "znver4-direct,znver4-load,znver4-store") + +;; Integer Instructions or General instructions +;; Multiplications +(define_insn_reservation "znver4_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "imul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu1") + +(define_insn_reservation "znver4_imul_load" 7 + (and (eq_attr "cpu" 
"znver4") + (and (eq_attr "type" "imul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1") + +;; Divisions +(define_insn_reservation "znver4_idiv_DI" 18 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI" 10 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "none")))) + "znver4-double,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_DI_load" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "DI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*10") + +(define_insn_reservation "znver4_idiv_SI_load" 16 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "SI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*6") + +(define_insn_reservation "znver4_idiv_HI_load" 14 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*4") + +(define_insn_reservation "znver4_idiv_QI_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "idiv") + (and (eq_attr "mode" "QI") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-idiv*4") + +;; INTEGER/GENERAL Instructions +(define_insn_reservation "znver4_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" 
"alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu") + +(define_insn_reservation "znver4_insn_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu") + +(define_insn_reservation "znver4_insn2" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_insn2_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3") + +(define_insn_reservation "znver4_rotate" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "none,unknown"))) + "znver4-direct,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_rotate_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2") + +(define_insn_reservation "znver4_insn_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu,znver4-store") + +(define_insn_reservation "znver4_insn2_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "icmov,setcc") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store") + +(define_insn_reservation "znver4_rotate_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "rotate") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store") + +;; alu1 instructions +(define_insn_reservation "znver4_alu1_vector" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" 
"none,unknown")))) + "znver4-vector,znver4-ivector*3") + +(define_insn_reservation "znver4_alu1_vector_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (and (eq_attr "type" "alu1") + (eq_attr "memory" "load")))) + "znver4-vector,znver4-load,znver4-ivector*3") + +;; Call Instruction +(define_insn_reservation "znver4_call" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "call,callv")) + "znver4-double,znver4-ieu0|znver4-bru0,znver4-store") + +;; Branches +(define_insn_reservation "znver4_branch" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-ieu0|znver4-bru0") + +(define_insn_reservation "znver4_branch_load" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-ieu0|znver4-bru0") + +(define_insn_reservation "znver4_branch_vector" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "none,unknown"))) + "znver4-vector,znver4-ivector*2") + +(define_insn_reservation "znver4_branch_vector_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ibr") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*2") + +;; LEA instruction with simple addressing +(define_insn_reservation "znver4_lea" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "lea")) + "znver4-direct,znver4-ieu") + +;; Leave +(define_insn_reservation "znver4_leave" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "leave")) + "znver4-double,znver4-ieu,znver4-store") + +;; STR and ISHIFT are microcoded. 
+(define_insn_reservation "znver4_str" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-ivector*3") + +(define_insn_reservation "znver4_str_load" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "str") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*3") + +(define_insn_reservation "znver4_ishift" 2 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-ivector*2") + +(define_insn_reservation "znver4_ishift_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ishift") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*2") + +;; Other vector type +(define_insn_reservation "znver4_ieu_vector" 5 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "other,multi") + (eq_attr "memory" "none,unknown"))) + "znver4-vector,znver4-ivector*5") + +(define_insn_reservation "znver4_ieu_vector_load" 9 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "other,multi") + (eq_attr "memory" "load"))) + "znver4-vector,znver4-load,znver4-ivector*5") + +;; Floating Point +;; FP movs +(define_insn_reservation "znver4_fp_cmov" 4 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fcmov")) + "znver4-vector,znver4-fvector*3") + +(define_insn_reservation "znver4_fp_mov_direct" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fmov")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;;FLD +(define_insn_reservation "znver4_fp_mov_direct_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +;;FST +(define_insn_reservation "znver4_fp_mov_direct_store" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "direct") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu0|znver4-fpu1,znver4-fp-store") + 
+;;FILD +(define_insn_reservation "znver4_fp_mov_double_load" 13 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1") + +;;FIST +(define_insn_reservation "znver4_fp_mov_double_store" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "double") + (and (eq_attr "type" "fmov") + (eq_attr "memory" "store")))) + "znver4-double,znver4-fpu1,znver4-fp-store") + +;; FSQRT +(define_insn_reservation "znver4_fsqrt" 22 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (and (eq_attr "mode" "XF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fdiv*10") + +;; FPSPC instructions +(define_insn_reservation "znver4_fp_spc" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fpspc") + (eq_attr "memory" "none"))) + "znver4-vector,znver4-fvector*6") + +(define_insn_reservation "znver4_fp_insn_vector" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "znver1_decode" "vector") + (eq_attr "type" "mmxcvt,sselog1,ssemov"))) + "znver4-vector,znver4-fvector*6") + +;; FADD, FSUB, FMUL +(define_insn_reservation "znver4_fp_op_mul" 7 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0") + +(define_insn_reservation "znver4_fp_op_mul_load" 12 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fop,fmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0") + +;; FDIV +(define_insn_reservation "znver4_fp_div" 15 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fdiv*6") + +(define_insn_reservation "znver4_fp_div_load" 20 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fdiv*6") + +(define_insn_reservation "znver4_fp_idiv_load" 24 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fdiv") + 
(and (eq_attr "fp_int_src" "true") + (eq_attr "memory" "load")))) + "znver4-double,znver4-load,znver4-fdiv*6") + +;; FABS, FCHS +(define_insn_reservation "znver4_fp_fsgn" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "fsgn")) + "znver4-direct,znver4-fpu0|znver4-fpu1") + +;; FCMP +(define_insn_reservation "znver4_fp_fcmp" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1") + +(define_insn_reservation "znver4_fp_fcmp_double" 4 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "fcmp") + (and (eq_attr "znver1_decode" "double") + (eq_attr "memory" "none")))) + "znver4-double,znver4-fpu1,znver4-fpu2") + +;; MMX, SSE, SSEn.n instructions +(define_insn_reservation "znver4_fp_mmx" 1 + (and (eq_attr "cpu" "znver4") + (eq_attr "type" "mmx")) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_add_cmp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_mmx_add_cmp_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxadd,mmxcmp") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_mmx_insn" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_insn_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_mmx_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "store"))) + "znver4-direct,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mov_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmov") + (eq_attr
"memory" "both"))) + "znver4-direct,znver4-load,znver4-fp-store") + +(define_insn_reservation "znver4_mmx_mul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "none"))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_mmx_mul_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "mmxmul") + (eq_attr "memory" "load"))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3") + +;; AVX instructions +(define_insn_reservation "znver4_sse_log" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_log_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu") + +(define_insn_reservation "znver4_sse_log1" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_log1_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sselog1") + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "both")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "store"))) + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_comi_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "both"))) + "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store") + +(define_insn_reservation "znver4_sse_test" 1 + (and 
(eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_test_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "prefix_extra" "1") + (and (eq_attr "type" "ssecomi") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_imul" 3 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu0|znver4-fpu3") + +(define_insn_reservation "znver4_sse_imul_load" 8 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "sseimul") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") + +(define_insn_reservation "znver4_sse_mov" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "load")))) + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") + +(define_insn_reservation "znver4_sse_mov_store" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") + (eq_attr "memory" "store")))) + "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store") + +(define_insn_reservation "znver4_sse_mov_fp" 1 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") + (eq_attr "memory" "none")))) + "znver4-direct,znver4-fpu") + +(define_insn_reservation "znver4_sse_mov_fp_load" 6 + (and (eq_attr "cpu" "znver4") + (and (eq_attr "type" "ssemov") + (and 
(eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mov_fp_store" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_add" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_add1" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-vector,znver4-fvector*2")
+
+(define_insn_reservation "znver4_sse_add1_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd1")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-vector,znver4-load,znver4-fvector*2")
+
+(define_insn_reservation "znver4_sse_iadd" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_iadd_load" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_mul" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_mul_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_div_pd" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V4DF,V2DF,V1DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*5")
+
+(define_insn_reservation "znver4_sse_div_ps" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*3")
+
+(define_insn_reservation "znver4_sse_div_pd_load" 18
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V4DF,V2DF,V1DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fdiv*5")
+
+(define_insn_reservation "znver4_sse_div_ps_load" 15
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fdiv*3")
+
+(define_insn_reservation "znver4_sse_cmp_avx" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "prefix" "vex")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_cmp_avx_load" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "prefix" "vex")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
+
+(define_insn_reservation "znver4_sse_comi_avx" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "store")))
+  "znver4-direct,znver4-fpu2+znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_comi_avx_load" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecomi")
+            (eq_attr "memory" "both")))
+  "znver4-direct,znver4-load,znver4-fpu2+znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_cvt" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_cvt_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_icvt" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2|znver4-fpu3")
+
+(define_insn_reservation "znver4_sse_icvt_store" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "SI")
+                 (eq_attr "memory" "store"))))
+  "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_shuf" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_shuf_load" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu")
+
+(define_insn_reservation "znver4_sse_ishuf" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1|znver4-fpu2")
+
+(define_insn_reservation "znver4_sse_ishuf_load" 8
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "OI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
+
+;; AVX512 instructions
+(define_insn_reservation "znver4_sse_log_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_log_evex_load" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_log1_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog1")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_log1_evex_load" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sselog1")
+            (and (eq_attr "mode" "V16SF,V8DF,XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_mul_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_mul_evex_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemul")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_imul_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_imul_evex_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseimul")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_mov_evex" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1*2|znver4-fpu2*2")
+
+(define_insn_reservation "znver4_sse_mov_evex_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2")
+
+(define_insn_reservation "znver4_sse_mov_evex_store" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemov")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "store"))))
+  "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store")
+
+(define_insn_reservation "znver4_sse_add_evex" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_add_evex_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseadd")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_iadd_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_iadd_evex_load" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseiadd")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_div_pd_evex" 13
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*9")
+
+(define_insn_reservation "znver4_sse_div_ps_evex" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V16SF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fdiv*6")
+
+(define_insn_reservation "znver4_sse_div_pd_evex_load" 19
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fdiv*9")
+
+(define_insn_reservation "znver4_sse_div_ps_evex_load" 16
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssediv")
+            (and (eq_attr "mode" "V16SF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fdiv*6")
+
+(define_insn_reservation "znver4_sse_cmp_avx128" 3
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "none")))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cmp_avx128_load" 9
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "load")))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cmp_avx256" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V8SF,V4DF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "none")))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cmp_avx256_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V8SF,V4DF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "load")))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cmp_avx512" 5
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "none")))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cmp_avx512_load" 11
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecmp")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (and (eq_attr "prefix" "evex")
+                      (eq_attr "memory" "load")))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_cvt_evex" 6
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_cvt_evex_load" 12
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssecvt")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_shuf_evex" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_shuf_evex_load" 7
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "V16SF,V8DF")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2")
+
+(define_insn_reservation "znver4_sse_ishuf_evex" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "none"))))
+  "znver4-direct,znver4-fpu1*2|znver4-fpu2*2")
+
+(define_insn_reservation "znver4_sse_ishuf_evex_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (and (eq_attr "mode" "XI")
+                 (eq_attr "memory" "load"))))
+  "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2")
+
+(define_insn_reservation "znver4_sse_muladd" 4
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "ssemuladd")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_muladd_load" 10
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "sseshuf")
+            (eq_attr "memory" "load")))
+  "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2")
+
+;; AVX512 mask instructions
+
+(define_insn_reservation "znver4_sse_mskmov" 2
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "mskmov")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu0*2|znver4-fpu1*2")
+
+(define_insn_reservation "znver4_sse_msklog" 1
+  (and (eq_attr "cpu" "znver4")
+       (and (eq_attr "type" "msklog")
+            (eq_attr "memory" "none")))
+  "znver4-direct,znver4-fpu2*2|znver4-fpu3*2")
-- 
2.25.1
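The reservation strings in the patch above use GCC's pipeline-description syntax. In that syntax, "," starts the next cycle, "|" separates alternatives, "+" reserves units in the same cycle, and "*N" repeats a reservation for N consecutive cycles. "," binds loosest and "*" binds tightest.

The Python sketch below is a hypothetical model, not GCC's genautomata implementation, and it makes three assumptions:
- Only those four operators occur (no parentheses).
- Named reservations such as "znver4-direct" are treated as opaque single units.
- "|" is resolved by trying alternatives in the order written.

It expands a string into per-cycle unit sets and issues instructions greedily into a table. The example shows that a double-pumped 512-bit multiply written as "znver4-fpu0*2|znver4-fpu1*2" still averages one issue per cycle across two pipes. By contrast, "znver4-fdiv*5" blocks the single divider for five cycles.

```python
from itertools import product

# Hypothetical model of a GCC DFA reservation string. Assumptions: only the
# operators ',', '|', '+' and '*N' occur (no parentheses, no "nothing"), and
# named reservations such as "znver4-direct" are treated as single units.
# Precedence: ',' binds loosest, then '|', then '+', then '*N'.
# An alternative is a tuple of per-cycle frozensets of unit names.


def _repeat(term):
    """'unit*N' occupies the same unit for N consecutive cycles."""
    name, _, count = term.partition("*")
    cycles = int(count) if count else 1
    return [tuple(frozenset([name.strip()]) for _ in range(cycles))]


def _allof(term):
    """'a+b' reserves both parts starting in the same cycle."""
    result = [()]
    for part in term.split("+"):
        merged = []
        for left, right in product(result, _repeat(part)):
            width = max(len(left), len(right))
            merged.append(tuple(
                (left[i] if i < len(left) else frozenset())
                | (right[i] if i < len(right) else frozenset())
                for i in range(width)))
        result = merged
    return result


def _oneof(term):
    """'a|b' means either a or b may be chosen."""
    return [alt for part in term.split("|") for alt in _allof(part)]


def expand(reservation):
    """'a,b' reserves b starting in the cycle after a ends."""
    alternatives = [()]
    for part in reservation.split(","):
        alternatives = [a + b for a, b in product(alternatives, _oneof(part))]
    return alternatives


class ReservationTable:
    """Tracks which units are busy in which absolute cycle."""

    def __init__(self):
        self.busy = {}  # cycle -> set of unit names

    def _fits(self, alt, start):
        return all(not units & self.busy.get(start + i, set())
                   for i, units in enumerate(alt))

    def issue(self, reservation, earliest=0):
        """Issue at the first cycle >= earliest where some alternative fits,
        trying alternatives in the order written; return that cycle."""
        alternatives = expand(reservation)
        start = earliest
        while True:
            for alt in alternatives:
                if self._fits(alt, start):
                    for i, units in enumerate(alt):
                        self.busy.setdefault(start + i, set()).update(units)
                    return start
            start += 1


# Double-pumped 512-bit multiply from znver4_sse_mul_evex: two pipes,
# each held for two cycles.
mul512 = ReservationTable()
print([mul512.issue("znver4-fpu0*2|znver4-fpu1*2") for _ in range(3)])

# Non-pipelined divider from znver4_sse_div_pd: held for five cycles.
div = ReservationTable()
print([div.issue("znver4-fdiv*5") for _ in range(2)])
```

Running it prints [0, 0, 2] and then [0, 5]. The third multiply waits until cycle 2 for a free pipe, and the second division waits for the divider to become free. The model ignores latencies (the number after the reservation name in define_insn_reservation), which GCC's scheduler also takes into account.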
* Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
  2022-12-22 17:34   ` Joshi, Tejas Sanjay
@ 2023-01-03 14:36     ` Jan Hubicka
  2023-01-03 14:52       ` Alexander Monakov
  2023-01-05  5:52       ` Joshi, Tejas Sanjay
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Hubicka @ 2023-01-03 14:36 UTC
To: Joshi, Tejas Sanjay; +Cc: gcc-patches, Alexander Monakov, Kumar, Venkataramanan

> [Public]
>
> Hello,
>
> I have addressed all your comments in this revision of the patch; please find it attached and inlined.
>
> * I have updated all the latencies with Agner's measurements.
> * Incorrect pipeline, load and store reservations have been addressed.
> * The double-pumped AVX-512 insns take one cycle for the first 256-bit half and the next cycle for the remaining 256-bit half, in the same pipeline; hence pipe*2.
>
> Is this ok for trunk?
>
> Thanks and Regards,
> Tejas
>
> gcc/ChangeLog:
>
> 	* gcc/common/config/i386/i386-common.cc (processor_alias_table):
> 	Use CPU_ZNVER4 for znver4.
> 	* config/i386/i386.md: Add znver4.md.
> 	* config/i386/znver4.md: New.

OK, thanks!
Honza
>
> Change-Id: Iea39c1c01d4992cf7ac476bd6de65887910bbcbe
> ---
>  gcc/common/config/i386/i386-common.cc |    2 +-
>  gcc/config/i386/i386.md               |    1 +
>  gcc/config/i386/znver4.md             | 1068 +++++++++++++++++++++++++
>  3 files changed, 1070 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/i386/znver4.md
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 660a977b68b..c7adea57683 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2215,7 +2215,7 @@ const pta processor_alias_table[] =
>    {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3,
>      PTA_ZNVER3,
>      M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2},
> -  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3,
> +  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4,
>      PTA_ZNVER4,
>      M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F},
>    {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 9451883396c..3a88f16a21a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1319,6 +1319,7 @@
>  (include "bdver3.md")
>  (include "btver2.md")
>  (include "znver.md")
> +(include "znver4.md")
>  (include "geode.md")
>  (include "atom.md")
>  (include "slm.md")
> diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md
> new file mode 100644
> index 00000000000..d0b239822a8
> --- /dev/null
> +++ b/gcc/config/i386/znver4.md
> @@ -0,0 +1,1068 @@
> +;; Copyright (C) 2012-2022 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +;; GNU General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3. If not see
> +;; <http://www.gnu.org/licenses/>.
> +;;
> +
> +
> +(define_attr "znver4_decode" "direct,vector,double"
> +  (const_string "direct"))
> +
> +;; AMD znver4 Scheduling
> +;; Modeling automatons for zen decoders, integer execution pipes,
> +;; AGU pipes, branch, floating point execution and fp store units.
> +(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, znver4_agu, znver4_fpu, znver4_fp_store")
> +
> +;; Decoders unit has 4 decoders and all of them can decode fast path
> +;; and vector type instructions.
> +(define_cpu_unit "znver4-decode0" "znver4")
> +(define_cpu_unit "znver4-decode1" "znver4")
> +(define_cpu_unit "znver4-decode2" "znver4")
> +(define_cpu_unit "znver4-decode3" "znver4")
> +
> +;; Currently blocking all decoders for vector path instructions as
> +;; they are dispatched separetely as microcode sequence.
> +(define_reservation "znver4-vector" "znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3")
> +
> +;; Direct instructions can be issued to any of the four decoders.
> +(define_reservation "znver4-direct" "znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3")
> +
> +;; Fix me: Need to revisit this later to simulate fast path double behavior.
> +(define_reservation "znver4-double" "znver4-direct")
> +
> +
> +;; Integer unit 4 ALU pipes.
> +(define_cpu_unit "znver4-ieu0" "znver4_ieu")
> +(define_cpu_unit "znver4-ieu1" "znver4_ieu")
> +(define_cpu_unit "znver4-ieu2" "znver4_ieu")
> +(define_cpu_unit "znver4-ieu3" "znver4_ieu")
> +;; Znver4 has an additional branch unit.
> +(define_cpu_unit "znver4-bru0" "znver4_ieu")
> +(define_reservation "znver4-ieu" "znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3")
> +
> +;; 3 AGU pipes in znver4
> +(define_cpu_unit "znver4-agu0" "znver4_agu")
> +(define_cpu_unit "znver4-agu1" "znver4_agu")
> +(define_cpu_unit "znver4-agu2" "znver4_agu")
> +(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2")
> +
> +;; Load is 4 cycles. We do not model reservation of load unit.
> +(define_reservation "znver4-load" "znver4-agu-reserve")
> +(define_reservation "znver4-store" "znver4-agu-reserve")
> +
> +;; vectorpath (microcoded) instructions are single issue instructions.
> +;; So, they occupy all the integer units.
> +(define_reservation "znver4-ivector" "znver4-ieu0+znver4-ieu1
> +                                      +znver4-ieu2+znver4-ieu3+znver4-bru0
> +                                      +znver4-agu0+znver4-agu1+znver4-agu2")
> +
> +;; Floating point unit 4 FP pipes.
> +(define_cpu_unit "znver4-fpu0" "znver4_fpu")
> +(define_cpu_unit "znver4-fpu1" "znver4_fpu")
> +(define_cpu_unit "znver4-fpu2" "znver4_fpu")
> +(define_cpu_unit "znver4-fpu3" "znver4_fpu")
> +
> +(define_reservation "znver4-fpu" "znver4-fpu0|znver4-fpu1|znver4-fpu2|znver4-fpu3")
> +
> +(define_reservation "znver4-fvector" "znver4-fpu0+znver4-fpu1
> +                                      +znver4-fpu2+znver4-fpu3
> +                                      +znver4-agu0+znver4-agu1+znver4-agu2")
> +
> +;; DIV units
> +(define_cpu_unit "znver4-idiv" "znver4_idiv")
> +(define_cpu_unit "znver4-fdiv" "znver4_fdiv")
> +
> +;; Separate fp store and fp-to-int store. Although there are 2 store pipes, the
> +;; throughput is limited to only one per cycle.
> +(define_cpu_unit "znver4-fp-store" "znver4_fp_store")
> +
> +
> +;; Integer Instructions
> +;; Move instructions
> +;; XCHG
> +(define_insn_reservation "znver4_imov_double" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "double")
> +            (and (eq_attr "type" "imov")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-ieu")
> +
> +(define_insn_reservation "znver4_imov_double_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "double")
> +            (and (eq_attr "type" "imov")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-ieu")
> +
> +;; imov, imovx
> +(define_insn_reservation "znver4_imov" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "imov,imovx")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-ieu")
> +
> +(define_insn_reservation "znver4_imov_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "imov,imovx")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu")
> +
> +;; Push Instruction
> +(define_insn_reservation "znver4_push" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "push")
> +            (eq_attr "memory" "store")))
> +  "znver4-direct,znver4-store")
> +
> +(define_insn_reservation "znver4_push_mem" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "push")
> +            (eq_attr "memory" "both")))
> +  "znver4-direct,znver4-load,znver4-store")
> +
> +;; Pop instruction
> +(define_insn_reservation "znver4_pop" 4
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "pop")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load")
> +
> +(define_insn_reservation "znver4_pop_mem" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "pop")
> +            (eq_attr "memory" "both")))
> +  "znver4-direct,znver4-load,znver4-store")
> +
> +;; Integer Instructions or General instructions
> +;; Multiplications
> +(define_insn_reservation "znver4_imul" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "imul")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-ieu1")
> +
> +(define_insn_reservation "znver4_imul_load" 7
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "imul")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu1")
> +
> +;; Divisions
> +(define_insn_reservation "znver4_idiv_DI" 18
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "DI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-idiv*10")
> +
> +(define_insn_reservation "znver4_idiv_SI" 12
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "SI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-idiv*6")
> +
> +(define_insn_reservation "znver4_idiv_HI" 10
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "HI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-idiv*4")
> +
> +(define_insn_reservation "znver4_idiv_QI" 9
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "QI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-idiv*4")
> +
> +(define_insn_reservation "znver4_idiv_DI_load" 22
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "DI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-idiv*10")
> +
> +(define_insn_reservation "znver4_idiv_SI_load" 16
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "SI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-idiv*6")
> +
> +(define_insn_reservation "znver4_idiv_HI_load" 14
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "HI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-idiv*4")
> +
> +(define_insn_reservation "znver4_idiv_QI_load" 13
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "idiv")
> +            (and (eq_attr "mode" "QI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-idiv*4")
> +
> +;; INTEGER/GENERAL Instructions
> +(define_insn_reservation "znver4_insn" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
> +            (eq_attr "memory" "none,unknown")))
> +  "znver4-direct,znver4-ieu")
> +
> +(define_insn_reservation "znver4_insn_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu")
> +
> +(define_insn_reservation "znver4_insn2" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "icmov,setcc")
> +            (eq_attr "memory" "none,unknown")))
> +  "znver4-direct,znver4-ieu0|znver4-ieu3")
> +
> +(define_insn_reservation "znver4_insn2_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "icmov,setcc")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu0|znver4-ieu3")
> +
> +(define_insn_reservation "znver4_rotate" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "rotate")
> +            (eq_attr "memory" "none,unknown")))
> +  "znver4-direct,znver4-ieu1|znver4-ieu2")
> +
> +(define_insn_reservation "znver4_rotate_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "rotate")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu1|znver4-ieu2")
> +
> +(define_insn_reservation "znver4_insn_store" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "alu,alu1,negnot,rotate1,ishift1,test,incdec,icmp")
> +            (eq_attr "memory" "store")))
> +  "znver4-direct,znver4-ieu,znver4-store")
> +
> +(define_insn_reservation "znver4_insn2_store" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "icmov,setcc")
> +            (eq_attr "memory" "store")))
> +  "znver4-direct,znver4-ieu0|znver4-ieu3,znver4-store")
> +
> +(define_insn_reservation "znver4_rotate_store" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "rotate")
> +            (eq_attr "memory" "store")))
> +  "znver4-direct,znver4-ieu1|znver4-ieu2,znver4-store")
> +
> +;; alu1 instructions
> +(define_insn_reservation "znver4_alu1_vector" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "vector")
> +            (and (eq_attr "type" "alu1")
> +                 (eq_attr "memory" "none,unknown"))))
> +  "znver4-vector,znver4-ivector*3")
> +
> +(define_insn_reservation "znver4_alu1_vector_load" 7
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "vector")
> +            (and (eq_attr "type" "alu1")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-vector,znver4-load,znver4-ivector*3")
> +
> +;; Call Instruction
> +(define_insn_reservation "znver4_call" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "call,callv"))
> +  "znver4-double,znver4-ieu0|znver4-bru0,znver4-store")
> +
> +;; Branches
> +(define_insn_reservation "znver4_branch" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ibr")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-ieu0|znver4-bru0")
> +
> +(define_insn_reservation "znver4_branch_load" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ibr")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-ieu0|znver4-bru0")
> +
> +(define_insn_reservation "znver4_branch_vector" 2
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ibr")
> +            (eq_attr "memory" "none,unknown")))
> +  "znver4-vector,znver4-ivector*2")
> +
> +(define_insn_reservation "znver4_branch_vector_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ibr")
> +            (eq_attr "memory" "load")))
> +  "znver4-vector,znver4-load,znver4-ivector*2")
> +
> +;; LEA instruction with simple addressing
> +(define_insn_reservation "znver4_lea" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "lea"))
> +  "znver4-direct,znver4-ieu")
> +
> +;; Leave
> +(define_insn_reservation "znver4_leave" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "leave"))
> +  "znver4-double,znver4-ieu,znver4-store")
> +
> +;; STR and ISHIFT are microcoded.
> +(define_insn_reservation "znver4_str" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "str")
> +            (eq_attr "memory" "none")))
> +  "znver4-vector,znver4-ivector*3")
> +
> +(define_insn_reservation "znver4_str_load" 7
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "str")
> +            (eq_attr "memory" "load")))
> +  "znver4-vector,znver4-load,znver4-ivector*3")
> +
> +(define_insn_reservation "znver4_ishift" 2
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ishift")
> +            (eq_attr "memory" "none")))
> +  "znver4-vector,znver4-ivector*2")
> +
> +(define_insn_reservation "znver4_ishift_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ishift")
> +            (eq_attr "memory" "load")))
> +  "znver4-vector,znver4-load,znver4-ivector*2")
> +
> +;; Other vector type
> +(define_insn_reservation "znver4_ieu_vector" 5
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "other,multi")
> +            (eq_attr "memory" "none,unknown")))
> +  "znver4-vector,znver4-ivector*5")
> +
> +(define_insn_reservation "znver4_ieu_vector_load" 9
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "other,multi")
> +            (eq_attr "memory" "load")))
> +  "znver4-vector,znver4-load,znver4-ivector*5")
> +
> +;; Floating Point
> +;; FP movs
> +(define_insn_reservation "znver4_fp_cmov" 4
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "fcmov"))
> +  "znver4-vector,znver4-fvector*3")
> +
> +(define_insn_reservation "znver4_fp_mov_direct" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "fmov"))
> +  "znver4-direct,znver4-fpu0|znver4-fpu1")
> +
> +;;FLD
> +(define_insn_reservation "znver4_fp_mov_direct_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "direct")
> +            (and (eq_attr "type" "fmov")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
> +
> +;;FST
> +(define_insn_reservation "znver4_fp_mov_direct_store" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "direct")
> +            (and (eq_attr "type" "fmov")
> +                 (eq_attr "memory" "store"))))
> +  "znver4-direct,znver4-fpu0|znver4-fpu1,znver4-fp-store")
> +
> +;;FILD
> +(define_insn_reservation "znver4_fp_mov_double_load" 13
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "double")
> +            (and (eq_attr "type" "fmov")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu1")
> +
> +;;FIST
> +(define_insn_reservation "znver4_fp_mov_double_store" 7
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "double")
> +            (and (eq_attr "type" "fmov")
> +                 (eq_attr "memory" "store"))))
> +  "znver4-double,znver4-fpu1,znver4-fp-store")
> +
> +;; FSQRT
> +(define_insn_reservation "znver4_fsqrt" 22
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fpspc")
> +            (and (eq_attr "mode" "XF")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fdiv*10")
> +
> +;; FPSPC instructions
> +(define_insn_reservation "znver4_fp_spc" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fpspc")
> +            (eq_attr "memory" "none")))
> +  "znver4-vector,znver4-fvector*6")
> +
> +(define_insn_reservation "znver4_fp_insn_vector" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "znver1_decode" "vector")
> +            (eq_attr "type" "mmxcvt,sselog1,ssemov")))
> +  "znver4-vector,znver4-fvector*6")
> +
> +;; FADD, FSUB, FMUL
> +(define_insn_reservation "znver4_fp_op_mul" 7
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fop,fmul")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fpu0")
> +
> +(define_insn_reservation "znver4_fp_op_mul_load" 12
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fop,fmul")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-fpu0")
> +
> +;; FDIV
> +(define_insn_reservation "znver4_fp_div" 15
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fdiv")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fdiv*6")
> +
> +(define_insn_reservation "znver4_fp_div_load" 20
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fdiv")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-fdiv*6")
> +
> +(define_insn_reservation "znver4_fp_idiv_load" 24
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fdiv")
> +            (and (eq_attr "fp_int_src" "true")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-double,znver4-load,znver4-fdiv*6")
> +
> +;; FABS, FCHS
> +(define_insn_reservation "znver4_fp_fsgn" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "fsgn"))
> +  "znver4-direct,znver4-fpu0|znver4-fpu1")
> +
> +;; FCMP
> +(define_insn_reservation "znver4_fp_fcmp" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fcmp")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fpu1")
> +
> +(define_insn_reservation "znver4_fp_fcmp_double" 4
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "fcmp")
> +            (and (eq_attr "znver1_decode" "double")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-fpu1,znver4-fpu2")
> +
> +;; MMX, SSE, SSEn.n instructions
> +(define_insn_reservation "znver4_fp_mmx " 1
> +  (and (eq_attr "cpu" "znver4")
> +       (eq_attr "type" "mmx"))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_mmx_add_cmp" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxadd,mmxcmp")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fpu")
> +
> +(define_insn_reservation "znver4_mmx_add_cmp_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxadd,mmxcmp")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-fpu")
> +
> +(define_insn_reservation "znver4_mmx_insn" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_mmx_insn_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxcvt,sseshuf,sseshuf1,mmxshft")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_mmx_mov" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxmov")
> +            (eq_attr "memory" "store")))
> +  "znver4-direct,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_mmx_mov_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxmov")
> +            (eq_attr "memory" "both")))
> +  "znver4-direct,znver4-load,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_mmx_mul" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxmul")
> +            (eq_attr "memory" "none")))
> +  "znver4-direct,znver4-fpu0|znver4-fpu3")
> +
> +(define_insn_reservation "znver4_mmx_mul_load" 8
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "mmxmul")
> +            (eq_attr "memory" "load")))
> +  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu3")
> +
> +;; AVX instructions
> +(define_insn_reservation "znver4_sse_log" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sselog")
> +            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu")
> +
> +(define_insn_reservation "znver4_sse_log_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sselog")
> +            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu")
> +
> +(define_insn_reservation "znver4_sse_log1" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sselog1")
> +            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "store"))))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_log1_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sselog1")
> +            (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF,QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "both"))))
> +  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_comi" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssecomi")
> +            (eq_attr "memory" "store")))
> +  "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_comi_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssecomi")
> +            (eq_attr "memory" "both")))
> +  "znver4-double,znver4-load,znver4-fpu2|znver4-fpu3,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_test" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "prefix_extra" "1")
> +            (and (eq_attr "type" "ssecomi")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_sse_test_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "prefix_extra" "1")
> +            (and (eq_attr "type" "ssecomi")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_sse_imul" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sseimul")
> +            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu0|znver4-fpu3")
> +
> +(define_insn_reservation "znver4_sse_imul_load" 8
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sseimul")
> +            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1")
> +
> +(define_insn_reservation "znver4_sse_mov" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_sse_mov_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2")
> +
> +(define_insn_reservation "znver4_sse_mov_store" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI")
> +                 (eq_attr "memory" "store"))))
> +  "znver4-direct,znver4-fpu1|znver4-fpu2,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_mov_fp" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu")
> +
> +(define_insn_reservation "znver4_sse_mov_fp_load" 6
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu")
> +
> +(define_insn_reservation "znver4_sse_mov_fp_store" 1
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "ssemov")
> +            (and (eq_attr "mode" "V16SF,V8DF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "store"))))
> +  "znver4-direct,znver4-fp-store")
> +
> +(define_insn_reservation "znver4_sse_add" 3
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sseadd")
> +            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu2|znver4-fpu3")
> +
> +(define_insn_reservation "znver4_sse_add_load" 8
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sseadd")
> +            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "load"))))
> +  "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3")
> +
> +(define_insn_reservation "znver4_sse_add1" 4
> +  (and (eq_attr "cpu" "znver4")
> +       (and (eq_attr "type" "sseadd1")
> +            (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF")
> +                 (eq_attr "memory" "none"))))
> +  "znver4-vector,znver4-fvector*2")
> +
> +(define_insn_reservation "znver4_sse_add1_load" 9
> +  (and (eq_attr "cpu"
"znver4") > + (and (eq_attr "type" "sseadd1") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "load")))) > + "znver4-vector,znver4-load,znver4-fvector*2") > + > +(define_insn_reservation "znver4_sse_iadd" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu") > + > +(define_insn_reservation "znver4_sse_iadd_load" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "QI,HI,SI,DI,TI,OI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu") > + > +(define_insn_reservation "znver4_sse_mul" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_mul_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_div_pd" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*5") > + > +(define_insn_reservation "znver4_sse_div_ps" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*3") > + > +(define_insn_reservation "znver4_sse_div_pd_load" 18 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*5") > + > +(define_insn_reservation "znver4_sse_div_ps_load" 15 > + 
(and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*3") > + > +(define_insn_reservation "znver4_sse_cmp_avx" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "prefix" "vex") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_cmp_avx_load" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "prefix" "vex") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0|znver4-fpu1") > + > +(define_insn_reservation "znver4_sse_comi_avx" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "store"))) > + "znver4-direct,znver4-fpu2+znver4-fpu3,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_comi_avx_load" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecomi") > + (eq_attr "memory" "both"))) > + "znver4-direct,znver4-load,znver4-fpu2+znver4-fpu3,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_cvt" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu2|znver4-fpu3") > + > +(define_insn_reservation "znver4_sse_cvt_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu2|znver4-fpu3") > + > +(define_insn_reservation "znver4_sse_icvt" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "SI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu2|znver4-fpu3") > + > +(define_insn_reservation "znver4_sse_icvt_store" 4 > + (and (eq_attr "cpu" "znver4") > + 
(and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "SI") > + (eq_attr "memory" "store")))) > + "znver4-double,znver4-fpu2|znver4-fpu3,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_shuf" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1|znver4-fpu2") > + > +(define_insn_reservation "znver4_sse_shuf_load" 6 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu") > + > +(define_insn_reservation "znver4_sse_ishuf" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "OI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1|znver4-fpu2") > + > +(define_insn_reservation "znver4_sse_ishuf_load" 8 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "OI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1|znver4-fpu2") > + > +;; AVX512 instructions > +(define_insn_reservation "znver4_sse_log_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_log_evex_load" 7 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_log1_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog1") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "none")))) > + 
"znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_log1_evex_load" 7 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sselog1") > + (and (eq_attr "mode" "V16SF,V8DF,XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_mul_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_mul_evex_load" 9 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemul") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_imul_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_imul_evex_load" 9 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseimul") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_mov_evex" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_mov_evex_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_mov_evex_store" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemov") > + (and (eq_attr 
"mode" "XI") > + (eq_attr "memory" "store")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fp-store") > + > +(define_insn_reservation "znver4_sse_add_evex" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_add_evex_load" 9 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseadd") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_iadd_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_iadd_evex_load" 7 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseiadd") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_div_pd_evex" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*9") > + > +(define_insn_reservation "znver4_sse_div_ps_evex" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*6") > + > +(define_insn_reservation "znver4_sse_div_pd_evex_load" 19 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*9") > + > +(define_insn_reservation "znver4_sse_div_ps_evex_load" 16 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr 
"type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*6") > + > +(define_insn_reservation "znver4_sse_cmp_avx128" 3 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "none"))))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx128_load" 9 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V4SF,V2DF,V2SF,V1DF,SF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "load"))))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx256" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V8SF,V4DF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "none"))))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx256_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V8SF,V4DF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "load"))))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx512" 5 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V16SF,V8DF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "none"))))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cmp_avx512_load" 11 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecmp") > + (and (eq_attr "mode" "V16SF,V8DF") > + (and (eq_attr "prefix" "evex") > + (eq_attr "memory" "load"))))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_cvt_evex" 6 > + (and (eq_attr "cpu" "znver4") 
> + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_cvt_evex_load" 12 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssecvt") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2,znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_shuf_evex" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_shuf_evex_load" 7 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "V16SF,V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2") > + > +(define_insn_reservation "znver4_sse_ishuf_evex" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_ishuf_evex_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (and (eq_attr "mode" "XI") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fpu1*2|znver4-fpu2*2") > + > +(define_insn_reservation "znver4_sse_muladd" 4 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssemuladd") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_muladd_load" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "sseshuf") > + (eq_attr "memory" "load"))) > + "znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2") > + > +;; AVX512 mask 
instructions > + > +(define_insn_reservation "znver4_sse_mskmov" 2 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "mskmov") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu0*2|znver4-fpu1*2") > + > +(define_insn_reservation "znver4_sse_msklog" 1 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "msklog") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fpu2*2|znver4-fpu3*2") > -- ^ permalink raw reply [flat|nested] 14+ messages in thread
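For readers less familiar with the reservation strings used throughout znver4.md: as described in the GCC internals manual's pipeline-description chapter, `,` has the lowest precedence and advances one cycle, `|` chooses one alternative, `+` reserves units in the same cycle, and `element*N` repeats the element over N consecutive cycles. The Python sketch below expands a reservation string into its alternative per-cycle unit sets under those rules. It is a hypothetical helper (`expand` is not part of GCC) and a simplification: every name, including any that znver4.md defines with `define_reservation` (such as the decode reservations), is treated as an opaque unit, and automata, exclusion sets and bypasses are ignored.

```python
import re

# A token is a unit/reservation name, a repeat count, or an operator.
TOKEN = re.compile(r"\s*([A-Za-z_][\w-]*|\d+|[,|+*()])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def merge(a, b):
    """'+': occupy both schedules in the same cycles, padding the shorter one."""
    n = max(len(a), len(b))
    a = a + (frozenset(),) * (n - len(a))
    b = b + (frozenset(),) * (n - len(b))
    return tuple(x | y for x, y in zip(a, b))


class Parser:
    """Recursive descent; each method returns a list of alternatives, where an
    alternative is a tuple holding one frozenset of unit names per cycle."""

    def __init__(self, text):
        self.toks = tokenize(text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def regexp(self):  # ',' binds loosest: next cycle
        alts = self.oneof()
        while self.peek() == ",":
            self.take(",")
            rhs = self.oneof()
            alts = [a + b for a in alts for b in rhs]
        return alts

    def oneof(self):  # '|': any one of the alternatives
        alts = self.allof()
        while self.peek() == "|":
            self.take("|")
            alts = alts + self.allof()
        return alts

    def allof(self):  # '+': all of them, in the same cycles
        alts = self.repeat()
        while self.peek() == "+":
            self.take("+")
            rhs = self.repeat()
            alts = [merge(a, b) for a in alts for b in rhs]
        return alts

    def repeat(self):  # element*N == element, element, ... (N times)
        alts = self.element()
        if self.peek() == "*":
            self.take("*")
            count = int(self.take())
            result = [()]
            for _ in range(count):
                result = [r + a for r in result for a in alts]
            alts = result
        return alts

    def element(self):
        tok = self.take()
        if tok in {",", "|", "+", "*", ")"}:
            raise ValueError(f"unexpected {tok!r}")
        if tok == "(":
            alts = self.regexp()
            self.take(")")
            return alts
        if tok == "nothing":
            return [(frozenset(),)]
        return [(frozenset([tok]),)]  # opaque unit name


def expand(reservation):
    parser = Parser(reservation)
    alts = parser.regexp()
    if parser.peek() is not None:
        raise ValueError(f"trailing token {parser.peek()!r}")
    return alts


if __name__ == "__main__":
    for alt in expand("znver4-direct,znver4-load,znver4-fpu0*2|znver4-fpu1*2"):
        print([sorted(cycle) for cycle in alt])
```

Run on the reservation of `znver4_sse_mul_evex_load`, it prints two four-cycle alternatives, one occupying znver4-fpu0 for two cycles and one occupying znver4-fpu1 for two cycles. Likewise, `znver4_fp_div_load` expands to a single eight-cycle alternative whose last six cycles hold znver4-fdiv, which is how these descriptions throttle back-to-back divides independently of the 20-cycle latency.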
* Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Alexander Monakov @ 2023-01-03 14:52 UTC
To: Jan Hubicka; +Cc: Joshi, Tejas Sanjay, gcc-patches, Kumar, Venkataramanan

On Tue, 3 Jan 2023, Jan Hubicka wrote:

> >   * gcc/common/config/i386/i386-common.cc (processor_alias_table):
> >   Use CPU_ZNVER4 for znver4.
> >   * config/i386/i386.md: Add znver4.md.
> >   * config/i386/znver4.md: New.
> OK,
> thanks!

Honza, I'm curious what are your further plans for this, you mentioned
merging znver4.md back in znver.md if I recall correctly?

Alexander

* Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Jan Hubicka @ 2023-01-03 18:28 UTC
To: Alexander Monakov; +Cc: Joshi, Tejas Sanjay, gcc-patches, Kumar, Venkataramanan

> 
> On Tue, 3 Jan 2023, Jan Hubicka wrote:
> 
> > >   * gcc/common/config/i386/i386-common.cc (processor_alias_table):
> > >   Use CPU_ZNVER4 for znver4.
> > >   * config/i386/i386.md: Add znver4.md.
> > >   * config/i386/znver4.md: New.
> > OK,
> > thanks!
> 
> Honza, I'm curious what are your further plans for this, you mentioned
> merging znver4.md back in znver.md if I recall correctly?

I was looking into that over Christmas (it was also the reason for my first
pass through the patch, where I was asking about various differences).  There
are a number of small divergences between znver.md and znver4.md that seem to
make the merged automaton bigger than having two separate automata.  So
merging both meaningfully would mean modifying either the znver1-3 model or
the znver4 model.

With Tejas, I think we mostly verified that the areas where the znver4 model
differs from znver1-3 are correct for znver4, and sometimes also for znver3
(for example, the branch unit is already present there but not modelled).

Splitting znver1-3 and znver4 is definitely not optimal.  However, given the
time constraints and the desire not to break znver1-3, I think going with
znver4.md is a good option, at least for GCC12/13.

Overall, I am not sure how beneficial the model is: since we schedule on a
per-BB basis and model the CPU as in-order with no register renaming, the
scheduler rarely has a chance to fill most of the execution units, and de
facto optimizes for a vastly different CPU than the real one.  We get a
noticeable SPEC performance boost from -fschedule-insns2, but it seems to come
mostly from scheduling for latencies.

LLVM's model seems to do more than we do, but comparing both compilers I was
not really able to tell whether either of them gets a noticeable benefit from
the actual model of reservation units (as opposed to latencies alone).

I would welcome thoughts/ideas/measurements on this.

Honza

> 
> Alexander

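To make the point about scheduling for latencies concrete, here is a toy list scheduler for a single basic block, written in Python. It is a sketch under stated assumptions, not GCC's scheduler: each instruction carries only a latency, priority is the latency-weighted critical path to the end of the block (the usual list-scheduling heuristic), and the only resource limit is a hypothetical issue width, so no reservation units are modelled at all. The instruction names are hypothetical; the divide latency of 13 is borrowed from `znver4_sse_div_pd` above.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    name: str
    latency: int  # cycles until the result can be consumed
    deps: list = field(default_factory=list)  # names of producer insns


def critical_path(insns):
    """Latency-weighted longest path from each insn to the end of the block."""
    latency = {i.name: i.latency for i in insns}
    users = {i.name: [] for i in insns}
    for insn in insns:
        for dep in insn.deps:
            users[dep].append(insn.name)
    memo = {}

    def height(name):
        if name not in memo:
            memo[name] = latency[name] + max(
                (height(u) for u in users[name]), default=0)
        return memo[name]

    return {i.name: height(i.name) for i in insns}


def list_schedule(insns, issue_width):
    """Each cycle, issue up to issue_width ready insns, longest path first.

    Only latencies and issue width are modelled: no functional units and no
    register renaming.  Returns {insn name: issue cycle}.
    """
    prio = critical_path(insns)
    issued, result_ready = {}, {}
    cycle = 0
    while len(issued) < len(insns):
        ready = [i for i in insns
                 if i.name not in issued
                 and all(d in issued and result_ready[d] <= cycle
                         for d in i.deps)]
        ready.sort(key=lambda i: -prio[i.name])
        for insn in ready[:issue_width]:
            issued[insn.name] = cycle
            result_ready[insn.name] = cycle + insn.latency
        cycle += 1
    return issued


if __name__ == "__main__":
    block = [
        Insn("load", 6),
        Insn("div", 13, ["load"]),
        Insn("add1", 1),
        Insn("add2", 1, ["add1"]),
        Insn("use", 1, ["div", "add2"]),
    ]
    print(list_schedule(block, issue_width=4))
```

Here the load starts in cycle 0 because it heads the longest chain, and the final use cannot issue before cycle 19 whatever the issue width is; the cheap adds fit into the gaps regardless of which units they would occupy. The sketch only illustrates latency-driven ordering and says nothing about how much a detailed unit model adds on top of it.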
* RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers
From: Joshi, Tejas Sanjay @ 2023-01-05 5:52 UTC
To: Jan Hubicka, gcc-patches; +Cc: Alexander Monakov, Kumar, Venkataramanan

[Public]

Hello,

> OK,
> thanks!
> Honza

Thanks! We have pushed the patch.

Regards,
Tejas

Thread overview (14 messages):
2022-11-14 16:18 [PATCH][X86_64] Separate znver4 insn reservations from older znvers Joshi, Tejas Sanjay
2022-11-14 18:51 ` Alexander Monakov
2022-11-15 12:08 ` Joshi, Tejas Sanjay
2022-11-15 12:51 ` Alexander Monakov
2022-11-21 11:40 ` Joshi, Tejas Sanjay
2022-11-21 15:30 ` Alexander Monakov
2022-12-01 11:28 ` Joshi, Tejas Sanjay
2022-12-01 19:01 ` Alexander Monakov
2022-12-12 21:41 ` Jan Hubička
2022-12-22 17:34 ` Joshi, Tejas Sanjay
2023-01-03 14:36 ` Jan Hubicka
2023-01-03 14:52 ` Alexander Monakov
2023-01-03 18:28 ` Jan Hubicka
2023-01-05  5:52 ` Joshi, Tejas Sanjay