* [PATCH] [x86_64]: Zhaoxin shijidadao enablement
@ 2024-05-27 8:32 MayShao
2024-05-28 6:15 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: MayShao @ 2024-05-27 8:32 UTC (permalink / raw)
To: hubicka, ubizjak
Cc: gcc-patches, louisqi, hawkwang, silviazhao, xiangjiehong, timhu, mayshao
From: mayshao <mayshao-oc@zhaoxin.com>
Hi all:
This patch enables -march/-mtune=shijidadao, costs and tunings are set according to the characteristics of the processor.
Bootstrapped /regtested X86_64.
Ok for trunk?
BR
Mayshao
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
* common/config/i386/i386-common.cc: Add shijidadao.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_SHIJIDADAO.
* config.gcc: Add shijidadao.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize shijidadao processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
* config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
(m_SHIJIDADAO): New definition.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add shijidadao_cost.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add m_SHIJIDADAO.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about shijidadao.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv32.C: Handle new -march
* gcc.target/i386/funcspec-56.inc: Ditto.
---
gcc/common/config/i386/cpuinfo.h | 8 +-
gcc/common/config/i386/i386-common.cc | 8 +-
gcc/common/config/i386/i386-cpuinfo.h | 1 +
gcc/config.gcc | 14 ++-
gcc/config/i386/driver-i386.cc | 11 +-
gcc/config/i386/i386-c.cc | 7 ++
gcc/config/i386/i386-options.cc | 4 +-
gcc/config/i386/i386.h | 1 +
gcc/config/i386/x86-tune-costs.h | 116 ++++++++++++++++++
gcc/config/i386/x86-tune-sched.cc | 2 +
gcc/config/i386/x86-tune.def | 8 +-
gcc/doc/extend.texi | 3 +
gcc/doc/invoke.texi | 6 +
gcc/testsuite/g++.target/i386/mv32.C | 6 +
gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 +
15 files changed, 183 insertions(+), 14 deletions(-)
diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 4610bf6d6a4..936039725ab 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
}
- else if (model >= 0x5b)
+ else if (model == 0x5b)
{
cpu = "yongfeng";
CHECK___builtin_cpu_is ("yongfeng");
cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
}
+ else if (model >= 0x6b)
+ {
+ cpu = "shijidadao";
+ CHECK___builtin_cpu_is ("shijidadao");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
+ }
break;
default:
break;
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 895e5fa662d..eb3f94c529c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2066,6 +2066,7 @@ const char *const processor_names[] =
"intel",
"lujiazui",
"yongfeng",
+ "shijidadao",
"geode",
"k6",
"athlon",
@@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
| PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
{"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
PTA_LUJIAZUI,
- M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
+ M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
{"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
PTA_YONGFENG,
- M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
+ M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
+ {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
+ PTA_YONGFENG,
+ M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
{"k8", PROCESSOR_K8, CPU_K8,
PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
| PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 9edad96d4fd..fa3b76f4931 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -104,6 +104,7 @@ enum processor_subtypes
INTEL_COREI7_PANTHERLAKE,
ZHAOXIN_FAM7H_YONGFENG,
AMDFAM1AH_ZNVER5,
+ ZHAOXIN_FAM7H_SHIJIDADAO,
CPU_SUBTYPE_MAX
};
diff --git a/gcc/config.gcc b/gcc/config.gcc
index a37113bd00a..91a3a3152dc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -711,9 +711,9 @@ atom slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
silvermont skylake-avx512 cannonlake icelake-client icelake-server \
skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 \
-nano-x2 eden-x4 nano-x4 lujiazui yongfeng x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
-sierraforest graniterapids graniterapids-d grandridge arrowlake arrowlake-s \
-clearwaterforest pantherlake native"
+nano-x2 eden-x4 nano-x4 lujiazui yongfeng shijidadao x86-64 x86-64-v2 \
+x86-64-v3 x86-64-v4 sierraforest graniterapids graniterapids-d grandridge \
+arrowlake arrowlake-s clearwaterforest pantherlake native"
# Additional x86 processors supported by --with-cpu=. Each processor
# MUST be separated by exactly one space.
@@ -3855,6 +3855,10 @@ case ${target} in
arch=yongfeng
cpu=yongfeng
;;
+ shijidadao-*)
+ arch=shijidadao
+ cpu=shijidadao
+ ;;
pentium2-*)
arch=pentium2
cpu=pentium2
@@ -3980,6 +3984,10 @@ case ${target} in
arch=yongfeng
cpu=yongfeng
;;
+ shijidadao-*)
+ arch=shijidadao
+ cpu=shijidadao
+ ;;
nocona-*)
arch=nocona
cpu=nocona
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index 0176d8b6cd2..11470eaea12 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -558,10 +558,12 @@ const char *host_detect_local_cpu (int argc, const char **argv)
switch (family)
{
case 7:
- if (model == 0x3b)
- processor = PROCESSOR_LUJIAZUI;
- else if (model >= 0x5b)
+ if (model >= 0x6b)
+ processor = PROCESSOR_SHIJIDADAO;
+ else if (model == 0x5b)
processor = PROCESSOR_YONGFENG;
+ else if (model == 0x3b)
+ processor = PROCESSOR_LUJIAZUI;
break;
default:
break;
@@ -853,6 +855,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
case PROCESSOR_YONGFENG:
cpu = "yongfeng";
break;
+ case PROCESSOR_SHIJIDADAO:
+ cpu = "shijidadao";
+ break;
default:
/* Use something reasonable. */
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 7b0ad9e9181..403475d5b6b 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -156,6 +156,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
def_or_undef (parse_in, "__yongfeng");
def_or_undef (parse_in, "__yongfeng__");
break;
+ case PROCESSOR_SHIJIDADAO:
+ def_or_undef (parse_in, "__shijidadao");
+ def_or_undef (parse_in, "__shijidadao__");
+ break;
case PROCESSOR_PENTIUM4:
def_or_undef (parse_in, "__pentium4");
def_or_undef (parse_in, "__pentium4__");
@@ -386,6 +390,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
case PROCESSOR_YONGFENG:
def_or_undef (parse_in, "__tune_yongfeng__");
break;
+ case PROCESSOR_SHIJIDADAO:
+ def_or_undef (parse_in, "__tune_shijidadao__");
+ break;
case PROCESSOR_PENTIUM4:
def_or_undef (parse_in, "__tune_pentium4__");
break;
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 78602a17f7e..2af869d0ecf 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -155,7 +155,8 @@ along with GCC; see the file COPYING3. If not see
#define m_LUJIAZUI (HOST_WIDE_INT_1U<<PROCESSOR_LUJIAZUI)
#define m_YONGFENG (HOST_WIDE_INT_1U<<PROCESSOR_YONGFENG)
-#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG)
+#define m_SHIJIDADAO (HOST_WIDE_INT_1U<<PROCESSOR_SHIJIDADAO)
+#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG | m_SHIJIDADAO)
#define m_GEODE (HOST_WIDE_INT_1U<<PROCESSOR_GEODE)
#define m_K6 (HOST_WIDE_INT_1U<<PROCESSOR_K6)
@@ -793,6 +794,7 @@ static const struct processor_costs *processor_cost_table[] =
&intel_cost,
&lujiazui_cost,
&yongfeng_cost,
+ &shijidadao_cost,
&geode_cost,
&k6_cost,
&athlon_cost,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 359a8408263..d8c225cacdd 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2301,6 +2301,7 @@ enum processor_type
PROCESSOR_INTEL,
PROCESSOR_LUJIAZUI,
PROCESSOR_YONGFENG,
+ PROCESSOR_SHIJIDADAO,
PROCESSOR_GEODE,
PROCESSOR_K6,
PROCESSOR_ATHLON,
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 65d7d1f7e42..1060b0685a8 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -3644,6 +3644,122 @@ struct processor_costs yongfeng_cost = {
2, /* Small unroll factor. */
};
+/* shijidadao_cost should produce code tuned for ZHAOXIN shijidadao CPU. */
+static stringop_algs shijidadao_memcpy[2] = {
+ {libcall, {{8, unrolled_loop, true}, {256, unrolled_loop, false},
+ {-1, libcall, false}}},
+ {libcall, {{10, loop, true}, {256, unrolled_loop, false},
+ {-1, libcall, false}}}};
+static stringop_algs shijidadao_memset[2] = {
+ {libcall, {{4, loop, true}, {128, unrolled_loop, false},
+ {-1, libcall, false}}},
+ {libcall, {{1, rep_prefix_4_byte, false}, {14, loop, true},
+ {1024, vector_loop, false},
+ {-1, libcall, false}}}};
+static const
+struct processor_costs shijidadao_cost = {
+ {
+ /* Start of register allocator costs. integer->integer move cost is 2. */
+ 8, /* cost for loading QImode using movzbl. */
+ {8, 8, 8}, /* cost of loading integer registers
+ in QImode, HImode and SImode.
+ Relative to reg-reg move (2). */
+ {8, 8, 8}, /* cost of storing integer registers. */
+ 2, /* cost of reg,reg fld/fst. */
+ {8, 8, 8}, /* cost of loading fp registers
+ in SFmode, DFmode and XFmode. */
+ {8, 8, 8}, /* cost of storing fp registers
+ in SFmode, DFmode and XFmode. */
+ 2, /* cost of moving MMX register. */
+ {8, 8}, /* cost of loading MMX registers
+ in SImode and DImode. */
+ {8, 8}, /* cost of storing MMX registers
+ in SImode and DImode. */
+ 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
+ {8, 8, 8, 10, 15}, /* cost of loading SSE registers
+ in 32,64,128,256 and 512-bit. */
+ {8, 8, 8, 10, 15}, /* cost of storing SSE registers
+ in 32,64,128,256 and 512-bit. */
+ 8, 8, /* SSE->integer and integer->SSE moves. */
+ 8, 8, /* mask->integer and integer->mask moves. */
+ {8, 8, 8}, /* cost of loading mask register
+ in QImode, HImode, SImode. */
+ {8, 8, 8}, /* cost if storing mask register
+ in QImode, HImode, SImode. */
+ 2, /* cost of moving mask register. */
+ /* End of register allocator costs. */
+ },
+
+ COSTS_N_INSNS (1), /* cost of an add instruction. */
+ COSTS_N_INSNS (1), /* cost of a lea instruction. */
+ COSTS_N_INSNS (1), /* variable shift costs. */
+ COSTS_N_INSNS (1), /* constant shift costs. */
+ {COSTS_N_INSNS (2), /* cost of starting multiply for QI. */
+ COSTS_N_INSNS (3), /* HI. */
+ COSTS_N_INSNS (2), /* SI. */
+ COSTS_N_INSNS (2), /* DI. */
+ COSTS_N_INSNS (3)}, /* other. */
+ 0, /* cost of multiply per each bit set. */
+ {COSTS_N_INSNS (9), /* cost of a divide/mod for QI. */
+ COSTS_N_INSNS (10), /* HI. */
+ COSTS_N_INSNS (9), /* SI. */
+ COSTS_N_INSNS (50), /* DI. */
+ COSTS_N_INSNS (50)}, /* other. */
+ COSTS_N_INSNS (1), /* cost of movsx. */
+ COSTS_N_INSNS (1), /* cost of movzx. */
+ 8, /* "large" insn. */
+ 17, /* MOVE_RATIO. */
+ 6, /* CLEAR_RATIO. */
+ {8, 8, 8}, /* cost of loading integer registers
+ in QImode, HImode and SImode.
+ Relative to reg-reg move (2). */
+ {8, 8, 8}, /* cost of storing integer registers. */
+ {8, 8, 8, 12, 15}, /* cost of loading SSE register
+ in 32bit, 64bit, 128bit, 256bit and 512bit. */
+ {8, 8, 8, 12, 15}, /* cost of storing SSE register
+ in 32bit, 64bit, 128bit, 256bit and 512bit. */
+ {8, 8, 8, 12, 15}, /* cost of unaligned loads. */
+ {8, 8, 8, 12, 15}, /* cost of unaligned storess. */
+ 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
+ 8, /* cost of moving SSE register to integer. */
+ 18, 6, /* Gather load static, per_elt. */
+ 18, 6, /* Gather store static, per_elt. */
+ 32, /* size of l1 cache. */
+ 256, /* size of l2 cache. */
+ 64, /* size of prefetch block. */
+ 12, /* number of parallel prefetches. */
+ 3, /* Branch cost. */
+ COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */
+ COSTS_N_INSNS (3), /* cost of FMUL instruction. */
+ COSTS_N_INSNS (13), /* cost of FDIV instruction. */
+ COSTS_N_INSNS (2), /* cost of FABS instruction. */
+ COSTS_N_INSNS (2), /* cost of FCHS instruction. */
+ COSTS_N_INSNS (44), /* cost of FSQRT instruction. */
+
+ COSTS_N_INSNS (1), /* cost of cheap SSE instruction. */
+ COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns. */
+ COSTS_N_INSNS (3), /* cost of MULSS instruction. */
+ COSTS_N_INSNS (3), /* cost of MULSD instruction. */
+ COSTS_N_INSNS (5), /* cost of FMA SS instruction. */
+ COSTS_N_INSNS (5), /* cost of FMA SD instruction. */
+ COSTS_N_INSNS (11), /* cost of DIVSS instruction. */
+ COSTS_N_INSNS (14), /* cost of DIVSD instruction. */
+ COSTS_N_INSNS (11), /* cost of SQRTSS instruction. */
+ COSTS_N_INSNS (18), /* cost of SQRTSD instruction. */
+ 4, 4, 4, 4, /* reassoc int, fp, vec_int, vec_fp. */
+ shijidadao_memcpy,
+ shijidadao_memset,
+ COSTS_N_INSNS (3), /* cond_taken_branch_cost. */
+ COSTS_N_INSNS (1), /* cond_not_taken_branch_cost. */
+ "16:11:8", /* Loop alignment. */
+ "16:11:8", /* Jump alignment. */
+ "0:0:8", /* Label alignment. */
+ "16", /* Func alignment. */
+ 4, /* Small unroll limit. */
+ 2, /* Small unroll factor. */
+};
+
+
/* Generic should produce code tuned for Core-i7 (and newer chips)
and btver1 (and newer chips). */
diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
index f70846e628e..d77298b0e34 100644
--- a/gcc/config/i386/x86-tune-sched.cc
+++ b/gcc/config/i386/x86-tune-sched.cc
@@ -79,6 +79,7 @@ ix86_issue_rate (void)
case PROCESSOR_CANNONLAKE:
case PROCESSOR_ALDERLAKE:
case PROCESSOR_YONGFENG:
+ case PROCESSOR_SHIJIDADAO:
case PROCESSOR_GENERIC:
return 4;
@@ -446,6 +447,7 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
break;
case PROCESSOR_YONGFENG:
+ case PROCESSOR_SHIJIDADAO:
/* Stack engine allows to execute push&pop instructions in parallel. */
if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
&& (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 0fa1484b48d..dbb2543c68f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -477,7 +477,7 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
elements. */
DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts",
~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
- | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
+ | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
/* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2
elements. */
@@ -488,7 +488,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts",
elements. */
DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts",
~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
- | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
+ | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
/* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4
elements. */
@@ -499,7 +499,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
elements. */
DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_CORE_HYBRID | m_CORE_ATOM
- | m_YONGFENG | m_GENERIC | m_GDS))
+ | m_YONGFENG | m_SHIJIDADAO | m_GENERIC | m_GDS))
/* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
elements. */
@@ -509,7 +509,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
/* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
smaller FMA chain. */
DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4
- | m_YONGFENG | m_GENERIC)
+ | m_YONGFENG | m_SHIJIDADAO | m_GENERIC)
/* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
smaller FMA chain. */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8786249fb6f..11bb675830e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -26153,6 +26153,9 @@ ZHAOXIN lujiazui CPU.
@item yongfeng
ZHAOXIN yongfeng CPU.
+@item shijidadao
+ZHAOXIN shijidadao CPU.
+
@item amdfam10h
AMD Family 10h CPU.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 218901c0b20..82bf07a83bb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34799,6 +34799,12 @@ SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
instruction set support.
+@item shijidadao
+ZHAOXIN shijidadao CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
+SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
+instruction set support.
+
@item geode
AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
@end table
diff --git a/gcc/testsuite/g++.target/i386/mv32.C b/gcc/testsuite/g++.target/i386/mv32.C
index 6c993218d01..b311c35baa3 100644
--- a/gcc/testsuite/g++.target/i386/mv32.C
+++ b/gcc/testsuite/g++.target/i386/mv32.C
@@ -21,6 +21,10 @@ int __attribute__ ((target("arch=yongfeng"))) foo () {
return 2;
}
+int __attribute__ ((target("arch=shijidadao"))) foo () {
+ return 3;
+}
+
int main ()
{
int val = foo ();
@@ -29,6 +33,8 @@ int main ()
assert (val == 1);
else if (__builtin_cpu_is ("yongfeng"))
assert (val == 2);
+ else if (__builtin_cpu_is ("shijidadao"))
+ assert (val == 3);
else
assert (val == 0);
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 2a50f5bf67c..c4dc89367ef 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -208,6 +208,7 @@ extern void test_arch_arrowlake_s (void) __attribute__((__target__("arch=arrowla
extern void test_arch_pantherlake (void) __attribute__((__target__("arch=pantherlake")));
extern void test_arch_lujiazui (void) __attribute__((__target__("arch=lujiazui")));
extern void test_arch_yongfeng (void) __attribute__((__target__("arch=yongfeng")));
+extern void test_arch_shijidadao (void) __attribute__((__target__("arch=shijidadao")));
extern void test_arch_k8 (void) __attribute__((__target__("arch=k8")));
extern void test_arch_k8_sse3 (void) __attribute__((__target__("arch=k8-sse3")));
extern void test_arch_opteron (void) __attribute__((__target__("arch=opteron")));
@@ -233,6 +234,7 @@ extern void test_tune_corei7_avx (void) __attribute__((__target__("tune=corei7-
extern void test_tune_core_avx2 (void) __attribute__((__target__("tune=core-avx2")));
extern void test_tune_lujiazui (void) __attribute__((__target__("tune=lujiazui")));
extern void test_tune_yongfeng (void) __attribute__((__target__("tune=yongfeng")));
+extern void test_tune_shijidadao (void) __attribute__((__target__("tune=shijidadao")));
extern void test_tune_k8 (void) __attribute__((__target__("tune=k8")));
extern void test_tune_k8_sse3 (void) __attribute__((__target__("tune=k8-sse3")));
extern void test_tune_opteron (void) __attribute__((__target__("tune=opteron")));
--
2.27.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement
2024-05-27 8:32 [PATCH] [x86_64]: Zhaoxin shijidadao enablement MayShao
@ 2024-05-28 6:15 ` Uros Bizjak
2024-06-18 7:21 ` mayshao-oc
0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2024-05-28 6:15 UTC (permalink / raw)
To: MayShao
Cc: hubicka, gcc-patches, louisqi, hawkwang, silviazhao, xiangjiehong, timhu
On Mon, May 27, 2024 at 10:33 AM MayShao <MayShao-oc@zhaoxin.com> wrote:
>
> From: mayshao <mayshao-oc@zhaoxin.com>
>
> Hi all:
> This patch enables -march/-mtune=shijidadao, costs and tunings are set according to the characteristics of the processor.
>
> Bootstrapped /regtested X86_64.
>
> Ok for trunk?
OK.
Thanks,
Uros.
> BR
> Mayshao
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
> * common/config/i386/i386-common.cc: Add shijidadao.
> * common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
> Add ZHAOXIN_FAM7H_SHIJIDADAO.
> * config.gcc: Add shijidadao.
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Let -march=native recognize shijidadao processors.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
> * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
> (m_SHIJIDADAO): New definition.
> * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
> * config/i386/x86-tune-costs.h (struct processor_costs):
> Add shijidadao_cost.
> * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
> (ix86_adjust_cost): Ditto.
> * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add m_SHIJIDADAO.
> (X86_TUNE_USE_GATHER_4PARTS): Ditto.
> (X86_TUNE_USE_GATHER_8PARTS): Ditto.
> (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
> * doc/extend.texi: Add details about shijidadao.
> * doc/invoke.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/mv32.C: Handle new -march
> * gcc.target/i386/funcspec-56.inc: Ditto.
> ---
> gcc/common/config/i386/cpuinfo.h | 8 +-
> gcc/common/config/i386/i386-common.cc | 8 +-
> gcc/common/config/i386/i386-cpuinfo.h | 1 +
> gcc/config.gcc | 14 ++-
> gcc/config/i386/driver-i386.cc | 11 +-
> gcc/config/i386/i386-c.cc | 7 ++
> gcc/config/i386/i386-options.cc | 4 +-
> gcc/config/i386/i386.h | 1 +
> gcc/config/i386/x86-tune-costs.h | 116 ++++++++++++++++++
> gcc/config/i386/x86-tune-sched.cc | 2 +
> gcc/config/i386/x86-tune.def | 8 +-
> gcc/doc/extend.texi | 3 +
> gcc/doc/invoke.texi | 6 +
> gcc/testsuite/g++.target/i386/mv32.C | 6 +
> gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 +
> 15 files changed, 183 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 4610bf6d6a4..936039725ab 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
> reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
> cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
> }
> - else if (model >= 0x5b)
> + else if (model == 0x5b)
> {
> cpu = "yongfeng";
> CHECK___builtin_cpu_is ("yongfeng");
> cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
> }
> + else if (model >= 0x6b)
> + {
> + cpu = "shijidadao";
> + CHECK___builtin_cpu_is ("shijidadao");
> + cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
> + }
> break;
> default:
> break;
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 895e5fa662d..eb3f94c529c 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2066,6 +2066,7 @@ const char *const processor_names[] =
> "intel",
> "lujiazui",
> "yongfeng",
> + "shijidadao",
> "geode",
> "k6",
> "athlon",
> @@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
> | PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
> {"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
> PTA_LUJIAZUI,
> - M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
> {"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
> PTA_YONGFENG,
> - M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
> + {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
> + PTA_YONGFENG,
> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
> {"k8", PROCESSOR_K8, CPU_K8,
> PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
> | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 9edad96d4fd..fa3b76f4931 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -104,6 +104,7 @@ enum processor_subtypes
> INTEL_COREI7_PANTHERLAKE,
> ZHAOXIN_FAM7H_YONGFENG,
> AMDFAM1AH_ZNVER5,
> + ZHAOXIN_FAM7H_SHIJIDADAO,
> CPU_SUBTYPE_MAX
> };
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index a37113bd00a..91a3a3152dc 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -711,9 +711,9 @@ atom slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
> silvermont skylake-avx512 cannonlake icelake-client icelake-server \
> skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
> sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 \
> -nano-x2 eden-x4 nano-x4 lujiazui yongfeng x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
> -sierraforest graniterapids graniterapids-d grandridge arrowlake arrowlake-s \
> -clearwaterforest pantherlake native"
> +nano-x2 eden-x4 nano-x4 lujiazui yongfeng shijidadao x86-64 x86-64-v2 \
> +x86-64-v3 x86-64-v4 sierraforest graniterapids graniterapids-d grandridge \
> +arrowlake arrowlake-s clearwaterforest pantherlake native"
>
> # Additional x86 processors supported by --with-cpu=. Each processor
> # MUST be separated by exactly one space.
> @@ -3855,6 +3855,10 @@ case ${target} in
> arch=yongfeng
> cpu=yongfeng
> ;;
> + shijidadao-*)
> + arch=shijidadao
> + cpu=shijidadao
> + ;;
> pentium2-*)
> arch=pentium2
> cpu=pentium2
> @@ -3980,6 +3984,10 @@ case ${target} in
> arch=yongfeng
> cpu=yongfeng
> ;;
> + shijidadao-*)
> + arch=shijidadao
> + cpu=shijidadao
> + ;;
> nocona-*)
> arch=nocona
> cpu=nocona
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 0176d8b6cd2..11470eaea12 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -558,10 +558,12 @@ const char *host_detect_local_cpu (int argc, const char **argv)
> switch (family)
> {
> case 7:
> - if (model == 0x3b)
> - processor = PROCESSOR_LUJIAZUI;
> - else if (model >= 0x5b)
> + if (model >= 0x6b)
> + processor = PROCESSOR_SHIJIDADAO;
> + else if (model == 0x5b)
> processor = PROCESSOR_YONGFENG;
> + else if (model == 0x3b)
> + processor = PROCESSOR_LUJIAZUI;
> break;
> default:
> break;
> @@ -853,6 +855,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
> case PROCESSOR_YONGFENG:
> cpu = "yongfeng";
> break;
> + case PROCESSOR_SHIJIDADAO:
> + cpu = "shijidadao";
> + break;
>
> default:
> /* Use something reasonable. */
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> index 7b0ad9e9181..403475d5b6b 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -156,6 +156,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> def_or_undef (parse_in, "__yongfeng");
> def_or_undef (parse_in, "__yongfeng__");
> break;
> + case PROCESSOR_SHIJIDADAO:
> + def_or_undef (parse_in, "__shijidadao");
> + def_or_undef (parse_in, "__shijidadao__");
> + break;
> case PROCESSOR_PENTIUM4:
> def_or_undef (parse_in, "__pentium4");
> def_or_undef (parse_in, "__pentium4__");
> @@ -386,6 +390,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> case PROCESSOR_YONGFENG:
> def_or_undef (parse_in, "__tune_yongfeng__");
> break;
> + case PROCESSOR_SHIJIDADAO:
> + def_or_undef (parse_in, "__tune_shijidadao__");
> + break;
> case PROCESSOR_PENTIUM4:
> def_or_undef (parse_in, "__tune_pentium4__");
> break;
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 78602a17f7e..2af869d0ecf 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -155,7 +155,8 @@ along with GCC; see the file COPYING3. If not see
>
> #define m_LUJIAZUI (HOST_WIDE_INT_1U<<PROCESSOR_LUJIAZUI)
> #define m_YONGFENG (HOST_WIDE_INT_1U<<PROCESSOR_YONGFENG)
> -#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG)
> +#define m_SHIJIDADAO (HOST_WIDE_INT_1U<<PROCESSOR_SHIJIDADAO)
> +#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG | m_SHIJIDADAO)
>
> #define m_GEODE (HOST_WIDE_INT_1U<<PROCESSOR_GEODE)
> #define m_K6 (HOST_WIDE_INT_1U<<PROCESSOR_K6)
> @@ -793,6 +794,7 @@ static const struct processor_costs *processor_cost_table[] =
> &intel_cost,
> &lujiazui_cost,
> &yongfeng_cost,
> + &shijidadao_cost,
> &geode_cost,
> &k6_cost,
> &athlon_cost,
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 359a8408263..d8c225cacdd 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2301,6 +2301,7 @@ enum processor_type
> PROCESSOR_INTEL,
> PROCESSOR_LUJIAZUI,
> PROCESSOR_YONGFENG,
> + PROCESSOR_SHIJIDADAO,
> PROCESSOR_GEODE,
> PROCESSOR_K6,
> PROCESSOR_ATHLON,
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index 65d7d1f7e42..1060b0685a8 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -3644,6 +3644,122 @@ struct processor_costs yongfeng_cost = {
> 2, /* Small unroll factor. */
> };
>
> +/* shijidadao_cost should produce code tuned for ZHAOXIN shijidadao CPU. */
> +static stringop_algs shijidadao_memcpy[2] = {
> + {libcall, {{8, unrolled_loop, true}, {256, unrolled_loop, false},
> + {-1, libcall, false}}},
> + {libcall, {{10, loop, true}, {256, unrolled_loop, false},
> + {-1, libcall, false}}}};
> +static stringop_algs shijidadao_memset[2] = {
> + {libcall, {{4, loop, true}, {128, unrolled_loop, false},
> + {-1, libcall, false}}},
> + {libcall, {{1, rep_prefix_4_byte, false}, {14, loop, true},
> + {1024, vector_loop, false},
> + {-1, libcall, false}}}};
> +static const
> +struct processor_costs shijidadao_cost = {
> + {
> + /* Start of register allocator costs. integer->integer move cost is 2. */
> + 8, /* cost for loading QImode using movzbl. */
> + {8, 8, 8}, /* cost of loading integer registers
> + in QImode, HImode and SImode.
> + Relative to reg-reg move (2). */
> + {8, 8, 8}, /* cost of storing integer registers. */
> + 2, /* cost of reg,reg fld/fst. */
> + {8, 8, 8}, /* cost of loading fp registers
> + in SFmode, DFmode and XFmode. */
> + {8, 8, 8}, /* cost of storing fp registers
> + in SFmode, DFmode and XFmode. */
> + 2, /* cost of moving MMX register. */
> + {8, 8}, /* cost of loading MMX registers
> + in SImode and DImode. */
> + {8, 8}, /* cost of storing MMX registers
> + in SImode and DImode. */
> + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
> + {8, 8, 8, 10, 15}, /* cost of loading SSE registers
> + in 32,64,128,256 and 512-bit. */
> + {8, 8, 8, 10, 15}, /* cost of storing SSE registers
> + in 32,64,128,256 and 512-bit. */
> + 8, 8, /* SSE->integer and integer->SSE moves. */
> + 8, 8, /* mask->integer and integer->mask moves. */
> + {8, 8, 8}, /* cost of loading mask register
> + in QImode, HImode, SImode. */
> + {8, 8, 8}, /* cost if storing mask register
> + in QImode, HImode, SImode. */
> + 2, /* cost of moving mask register. */
> + /* End of register allocator costs. */
> + },
> +
> + COSTS_N_INSNS (1), /* cost of an add instruction. */
> + COSTS_N_INSNS (1), /* cost of a lea instruction. */
> + COSTS_N_INSNS (1), /* variable shift costs. */
> + COSTS_N_INSNS (1), /* constant shift costs. */
> + {COSTS_N_INSNS (2), /* cost of starting multiply for QI. */
> + COSTS_N_INSNS (3), /* HI. */
> + COSTS_N_INSNS (2), /* SI. */
> + COSTS_N_INSNS (2), /* DI. */
> + COSTS_N_INSNS (3)}, /* other. */
> + 0, /* cost of multiply per each bit set. */
> + {COSTS_N_INSNS (9), /* cost of a divide/mod for QI. */
> + COSTS_N_INSNS (10), /* HI. */
> + COSTS_N_INSNS (9), /* SI. */
> + COSTS_N_INSNS (50), /* DI. */
> + COSTS_N_INSNS (50)}, /* other. */
> + COSTS_N_INSNS (1), /* cost of movsx. */
> + COSTS_N_INSNS (1), /* cost of movzx. */
> + 8, /* "large" insn. */
> + 17, /* MOVE_RATIO. */
> + 6, /* CLEAR_RATIO. */
> + {8, 8, 8}, /* cost of loading integer registers
> + in QImode, HImode and SImode.
> + Relative to reg-reg move (2). */
> + {8, 8, 8}, /* cost of storing integer registers. */
> + {8, 8, 8, 12, 15}, /* cost of loading SSE register
> + in 32bit, 64bit, 128bit, 256bit and 512bit. */
> + {8, 8, 8, 12, 15}, /* cost of storing SSE register
> + in 32bit, 64bit, 128bit, 256bit and 512bit. */
> + {8, 8, 8, 12, 15}, /* cost of unaligned loads. */
> + {8, 8, 8, 12, 15}, /* cost of unaligned storess. */
> + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
> + 8, /* cost of moving SSE register to integer. */
> + 18, 6, /* Gather load static, per_elt. */
> + 18, 6, /* Gather store static, per_elt. */
> + 32, /* size of l1 cache. */
> + 256, /* size of l2 cache. */
> + 64, /* size of prefetch block. */
> + 12, /* number of parallel prefetches. */
> + 3, /* Branch cost. */
> + COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */
> + COSTS_N_INSNS (3), /* cost of FMUL instruction. */
> + COSTS_N_INSNS (13), /* cost of FDIV instruction. */
> + COSTS_N_INSNS (2), /* cost of FABS instruction. */
> + COSTS_N_INSNS (2), /* cost of FCHS instruction. */
> + COSTS_N_INSNS (44), /* cost of FSQRT instruction. */
> +
> + COSTS_N_INSNS (1), /* cost of cheap SSE instruction. */
> + COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns. */
> + COSTS_N_INSNS (3), /* cost of MULSS instruction. */
> + COSTS_N_INSNS (3), /* cost of MULSD instruction. */
> + COSTS_N_INSNS (5), /* cost of FMA SS instruction. */
> + COSTS_N_INSNS (5), /* cost of FMA SD instruction. */
> + COSTS_N_INSNS (11), /* cost of DIVSS instruction. */
> + COSTS_N_INSNS (14), /* cost of DIVSD instruction. */
> + COSTS_N_INSNS (11), /* cost of SQRTSS instruction. */
> + COSTS_N_INSNS (18), /* cost of SQRTSD instruction. */
> + 4, 4, 4, 4, /* reassoc int, fp, vec_int, vec_fp. */
> + shijidadao_memcpy,
> + shijidadao_memset,
> + COSTS_N_INSNS (3), /* cond_taken_branch_cost. */
> + COSTS_N_INSNS (1), /* cond_not_taken_branch_cost. */
> + "16:11:8", /* Loop alignment. */
> + "16:11:8", /* Jump alignment. */
> + "0:0:8", /* Label alignment. */
> + "16", /* Func alignment. */
> + 4, /* Small unroll limit. */
> + 2, /* Small unroll factor. */
> +};
> +
> +
>
> /* Generic should produce code tuned for Core-i7 (and newer chips)
> and btver1 (and newer chips). */
> diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
> index f70846e628e..d77298b0e34 100644
> --- a/gcc/config/i386/x86-tune-sched.cc
> +++ b/gcc/config/i386/x86-tune-sched.cc
> @@ -79,6 +79,7 @@ ix86_issue_rate (void)
> case PROCESSOR_CANNONLAKE:
> case PROCESSOR_ALDERLAKE:
> case PROCESSOR_YONGFENG:
> + case PROCESSOR_SHIJIDADAO:
> case PROCESSOR_GENERIC:
> return 4;
>
> @@ -446,6 +447,7 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
> break;
>
> case PROCESSOR_YONGFENG:
> + case PROCESSOR_SHIJIDADAO:
> /* Stack engine allows to execute push&pop instructions in parallel. */
> if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
> && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 0fa1484b48d..dbb2543c68f 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -477,7 +477,7 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
> elements. */
> DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts",
> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
> - | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
> + | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
>
> /* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2
> elements. */
> @@ -488,7 +488,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts",
> elements. */
> DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts",
> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
> - | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
> + | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
>
> /* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4
> elements. */
> @@ -499,7 +499,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
> elements. */
> DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_CORE_HYBRID | m_CORE_ATOM
> - | m_YONGFENG | m_GENERIC | m_GDS))
> + | m_YONGFENG | m_SHIJIDADAO | m_GENERIC | m_GDS))
>
> /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
> elements. */
> @@ -509,7 +509,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
> /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
> smaller FMA chain. */
> DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4
> - | m_YONGFENG | m_GENERIC)
> + | m_YONGFENG | m_SHIJIDADAO | m_GENERIC)
>
> /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
> smaller FMA chain. */
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 8786249fb6f..11bb675830e 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -26153,6 +26153,9 @@ ZHAOXIN lujiazui CPU.
> @item yongfeng
> ZHAOXIN yongfeng CPU.
>
> +@item shijidadao
> +ZHAOXIN shijidadao CPU.
> +
> @item amdfam10h
> AMD Family 10h CPU.
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 218901c0b20..82bf07a83bb 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -34799,6 +34799,12 @@ SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
> ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
> instruction set support.
>
> +@item shijidadao
> +ZHAOXIN shijidadao CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
> +SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
> +ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
> +instruction set support.
> +
> @item geode
> AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
> @end table
> diff --git a/gcc/testsuite/g++.target/i386/mv32.C b/gcc/testsuite/g++.target/i386/mv32.C
> index 6c993218d01..b311c35baa3 100644
> --- a/gcc/testsuite/g++.target/i386/mv32.C
> +++ b/gcc/testsuite/g++.target/i386/mv32.C
> @@ -21,6 +21,10 @@ int __attribute__ ((target("arch=yongfeng"))) foo () {
> return 2;
> }
>
> +int __attribute__ ((target("arch=shijidadao"))) foo () {
> + return 3;
> +}
> +
> int main ()
> {
> int val = foo ();
> @@ -29,6 +33,8 @@ int main ()
> assert (val == 1);
> else if (__builtin_cpu_is ("yongfeng"))
> assert (val == 2);
> + else if (__builtin_cpu_is ("shijidadao"))
> + assert (val == 3);
> else
> assert (val == 0);
>
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 2a50f5bf67c..c4dc89367ef 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -208,6 +208,7 @@ extern void test_arch_arrowlake_s (void) __attribute__((__target__("arch=arrowla
> extern void test_arch_pantherlake (void) __attribute__((__target__("arch=pantherlake")));
> extern void test_arch_lujiazui (void) __attribute__((__target__("arch=lujiazui")));
> extern void test_arch_yongfeng (void) __attribute__((__target__("arch=yongfeng")));
> +extern void test_arch_shijidadao (void) __attribute__((__target__("arch=shijidadao")));
> extern void test_arch_k8 (void) __attribute__((__target__("arch=k8")));
> extern void test_arch_k8_sse3 (void) __attribute__((__target__("arch=k8-sse3")));
> extern void test_arch_opteron (void) __attribute__((__target__("arch=opteron")));
> @@ -233,6 +234,7 @@ extern void test_tune_corei7_avx (void) __attribute__((__target__("tune=corei7-
> extern void test_tune_core_avx2 (void) __attribute__((__target__("tune=core-avx2")));
> extern void test_tune_lujiazui (void) __attribute__((__target__("tune=lujiazui")));
> extern void test_tune_yongfeng (void) __attribute__((__target__("tune=yongfeng")));
> +extern void test_tune_shijidadao (void) __attribute__((__target__("tune=shijidadao")));
> extern void test_tune_k8 (void) __attribute__((__target__("tune=k8")));
> extern void test_tune_k8_sse3 (void) __attribute__((__target__("tune=k8-sse3")));
> extern void test_tune_opteron (void) __attribute__((__target__("tune=opteron")));
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement
2024-05-28 6:15 ` Uros Bizjak
@ 2024-06-18 7:21 ` mayshao-oc
2024-06-19 18:22 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: mayshao-oc @ 2024-06-18 7:21 UTC (permalink / raw)
To: Uros Bizjak
Cc: hubicka, gcc-patches, louisqi, hawkwang, silviazhao, xiangjiehong, timhu
On 5/28/24 14:15, Uros Bizjak wrote:
>
>
>
> On Mon, May 27, 2024 at 10:33 AM MayShao <MayShao-oc@zhaoxin.com> wrote:
>>
>> From: mayshao <mayshao-oc@zhaoxin.com>
>>
>> Hi all:
>> This patch enables -march/-mtune=shijidadao, costs and tunings are set according to the characteristics of the processor.
>>
>> Bootstrapped /regtested X86_64.
>>
>> Ok for trunk?
>
> OK.
>
> Thanks,
> Uros.
Thanks for your review, please help me commit.
BR
Mayshao
>
>> BR
>> Mayshao
>> gcc/ChangeLog:
>>
>> * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
>> * common/config/i386/i386-common.cc: Add shijidadao.
>> * common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
>> Add ZHAOXIN_FAM7H_SHIJIDADAO.
>> * config.gcc: Add shijidadao.
>> * config/i386/driver-i386.cc (host_detect_local_cpu):
>> Let -march=native recognize shijidadao processors.
>> * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
>> * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
>> (m_SHIJIDADAO): New definition.
>> * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
>> * config/i386/x86-tune-costs.h (struct processor_costs):
>> Add shijidadao_cost.
>> * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
>> (ix86_adjust_cost): Ditto.
>> * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add m_SHIJIDADAO.
>> (X86_TUNE_USE_GATHER_4PARTS): Ditto.
>> (X86_TUNE_USE_GATHER_8PARTS): Ditto.
>> (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
>> * doc/extend.texi: Add details about shijidadao.
>> * doc/invoke.texi: Ditto.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/i386/mv32.C: Handle new -march
>> * gcc.target/i386/funcspec-56.inc: Ditto.
>> ---
>> gcc/common/config/i386/cpuinfo.h | 8 +-
>> gcc/common/config/i386/i386-common.cc | 8 +-
>> gcc/common/config/i386/i386-cpuinfo.h | 1 +
>> gcc/config.gcc | 14 ++-
>> gcc/config/i386/driver-i386.cc | 11 +-
>> gcc/config/i386/i386-c.cc | 7 ++
>> gcc/config/i386/i386-options.cc | 4 +-
>> gcc/config/i386/i386.h | 1 +
>> gcc/config/i386/x86-tune-costs.h | 116 ++++++++++++++++++
>> gcc/config/i386/x86-tune-sched.cc | 2 +
>> gcc/config/i386/x86-tune.def | 8 +-
>> gcc/doc/extend.texi | 3 +
>> gcc/doc/invoke.texi | 6 +
>> gcc/testsuite/g++.target/i386/mv32.C | 6 +
>> gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 +
>> 15 files changed, 183 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
>> index 4610bf6d6a4..936039725ab 100644
>> --- a/gcc/common/config/i386/cpuinfo.h
>> +++ b/gcc/common/config/i386/cpuinfo.h
>> @@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
>> reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
>> cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
>> }
>> - else if (model >= 0x5b)
>> + else if (model == 0x5b)
>> {
>> cpu = "yongfeng";
>> CHECK___builtin_cpu_is ("yongfeng");
>> cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
>> }
>> + else if (model >= 0x6b)
>> + {
>> + cpu = "shijidadao";
>> + CHECK___builtin_cpu_is ("shijidadao");
>> + cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
>> + }
>> break;
>> default:
>> break;
>> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
>> index 895e5fa662d..eb3f94c529c 100644
>> --- a/gcc/common/config/i386/i386-common.cc
>> +++ b/gcc/common/config/i386/i386-common.cc
>> @@ -2066,6 +2066,7 @@ const char *const processor_names[] =
>> "intel",
>> "lujiazui",
>> "yongfeng",
>> + "shijidadao",
>> "geode",
>> "k6",
>> "athlon",
>> @@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
>> | PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
>> {"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
>> PTA_LUJIAZUI,
>> - M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
>> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
>> {"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
>> PTA_YONGFENG,
>> - M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
>> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
>> + {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
>> + PTA_YONGFENG,
>> + M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
>> {"k8", PROCESSOR_K8, CPU_K8,
>> PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
>> | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
>> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
>> index 9edad96d4fd..fa3b76f4931 100644
>> --- a/gcc/common/config/i386/i386-cpuinfo.h
>> +++ b/gcc/common/config/i386/i386-cpuinfo.h
>> @@ -104,6 +104,7 @@ enum processor_subtypes
>> INTEL_COREI7_PANTHERLAKE,
>> ZHAOXIN_FAM7H_YONGFENG,
>> AMDFAM1AH_ZNVER5,
>> + ZHAOXIN_FAM7H_SHIJIDADAO,
>> CPU_SUBTYPE_MAX
>> };
>>
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index a37113bd00a..91a3a3152dc 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -711,9 +711,9 @@ atom slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
>> silvermont skylake-avx512 cannonlake icelake-client icelake-server \
>> skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
>> sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 \
>> -nano-x2 eden-x4 nano-x4 lujiazui yongfeng x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
>> -sierraforest graniterapids graniterapids-d grandridge arrowlake arrowlake-s \
>> -clearwaterforest pantherlake native"
>> +nano-x2 eden-x4 nano-x4 lujiazui yongfeng shijidadao x86-64 x86-64-v2 \
>> +x86-64-v3 x86-64-v4 sierraforest graniterapids graniterapids-d grandridge \
>> +arrowlake arrowlake-s clearwaterforest pantherlake native"
>>
>> # Additional x86 processors supported by --with-cpu=. Each processor
>> # MUST be separated by exactly one space.
>> @@ -3855,6 +3855,10 @@ case ${target} in
>> arch=yongfeng
>> cpu=yongfeng
>> ;;
>> + shijidadao-*)
>> + arch=shijidadao
>> + cpu=shijidadao
>> + ;;
>> pentium2-*)
>> arch=pentium2
>> cpu=pentium2
>> @@ -3980,6 +3984,10 @@ case ${target} in
>> arch=yongfeng
>> cpu=yongfeng
>> ;;
>> + shijidadao-*)
>> + arch=shijidadao
>> + cpu=shijidadao
>> + ;;
>> nocona-*)
>> arch=nocona
>> cpu=nocona
>> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
>> index 0176d8b6cd2..11470eaea12 100644
>> --- a/gcc/config/i386/driver-i386.cc
>> +++ b/gcc/config/i386/driver-i386.cc
>> @@ -558,10 +558,12 @@ const char *host_detect_local_cpu (int argc, const char **argv)
>> switch (family)
>> {
>> case 7:
>> - if (model == 0x3b)
>> - processor = PROCESSOR_LUJIAZUI;
>> - else if (model >= 0x5b)
>> + if (model >= 0x6b)
>> + processor = PROCESSOR_SHIJIDADAO;
>> + else if (model == 0x5b)
>> processor = PROCESSOR_YONGFENG;
>> + else if (model == 0x3b)
>> + processor = PROCESSOR_LUJIAZUI;
>> break;
>> default:
>> break;
>> @@ -853,6 +855,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
>> case PROCESSOR_YONGFENG:
>> cpu = "yongfeng";
>> break;
>> + case PROCESSOR_SHIJIDADAO:
>> + cpu = "shijidadao";
>> + break;
>>
>> default:
>> /* Use something reasonable. */
>> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
>> index 7b0ad9e9181..403475d5b6b 100644
>> --- a/gcc/config/i386/i386-c.cc
>> +++ b/gcc/config/i386/i386-c.cc
>> @@ -156,6 +156,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>> def_or_undef (parse_in, "__yongfeng");
>> def_or_undef (parse_in, "__yongfeng__");
>> break;
>> + case PROCESSOR_SHIJIDADAO:
>> + def_or_undef (parse_in, "__shijidadao");
>> + def_or_undef (parse_in, "__shijidadao__");
>> + break;
>> case PROCESSOR_PENTIUM4:
>> def_or_undef (parse_in, "__pentium4");
>> def_or_undef (parse_in, "__pentium4__");
>> @@ -386,6 +390,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>> case PROCESSOR_YONGFENG:
>> def_or_undef (parse_in, "__tune_yongfeng__");
>> break;
>> + case PROCESSOR_SHIJIDADAO:
>> + def_or_undef (parse_in, "__tune_shijidadao__");
>> + break;
>> case PROCESSOR_PENTIUM4:
>> def_or_undef (parse_in, "__tune_pentium4__");
>> break;
>> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
>> index 78602a17f7e..2af869d0ecf 100644
>> --- a/gcc/config/i386/i386-options.cc
>> +++ b/gcc/config/i386/i386-options.cc
>> @@ -155,7 +155,8 @@ along with GCC; see the file COPYING3. If not see
>>
>> #define m_LUJIAZUI (HOST_WIDE_INT_1U<<PROCESSOR_LUJIAZUI)
>> #define m_YONGFENG (HOST_WIDE_INT_1U<<PROCESSOR_YONGFENG)
>> -#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG)
>> +#define m_SHIJIDADAO (HOST_WIDE_INT_1U<<PROCESSOR_SHIJIDADAO)
>> +#define m_ZHAOXIN (m_LUJIAZUI | m_YONGFENG | m_SHIJIDADAO)
>>
>> #define m_GEODE (HOST_WIDE_INT_1U<<PROCESSOR_GEODE)
>> #define m_K6 (HOST_WIDE_INT_1U<<PROCESSOR_K6)
>> @@ -793,6 +794,7 @@ static const struct processor_costs *processor_cost_table[] =
>> &intel_cost,
>> &lujiazui_cost,
>> &yongfeng_cost,
>> + &shijidadao_cost,
>> &geode_cost,
>> &k6_cost,
>> &athlon_cost,
>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> index 359a8408263..d8c225cacdd 100644
>> --- a/gcc/config/i386/i386.h
>> +++ b/gcc/config/i386/i386.h
>> @@ -2301,6 +2301,7 @@ enum processor_type
>> PROCESSOR_INTEL,
>> PROCESSOR_LUJIAZUI,
>> PROCESSOR_YONGFENG,
>> + PROCESSOR_SHIJIDADAO,
>> PROCESSOR_GEODE,
>> PROCESSOR_K6,
>> PROCESSOR_ATHLON,
>> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
>> index 65d7d1f7e42..1060b0685a8 100644
>> --- a/gcc/config/i386/x86-tune-costs.h
>> +++ b/gcc/config/i386/x86-tune-costs.h
>> @@ -3644,6 +3644,122 @@ struct processor_costs yongfeng_cost = {
>> 2, /* Small unroll factor. */
>> };
>>
>> +/* shijidadao_cost should produce code tuned for ZHAOXIN shijidadao CPU. */
>> +static stringop_algs shijidadao_memcpy[2] = {
>> + {libcall, {{8, unrolled_loop, true}, {256, unrolled_loop, false},
>> + {-1, libcall, false}}},
>> + {libcall, {{10, loop, true}, {256, unrolled_loop, false},
>> + {-1, libcall, false}}}};
>> +static stringop_algs shijidadao_memset[2] = {
>> + {libcall, {{4, loop, true}, {128, unrolled_loop, false},
>> + {-1, libcall, false}}},
>> + {libcall, {{1, rep_prefix_4_byte, false}, {14, loop, true},
>> + {1024, vector_loop, false},
>> + {-1, libcall, false}}}};
>> +static const
>> +struct processor_costs shijidadao_cost = {
>> + {
>> + /* Start of register allocator costs. integer->integer move cost is 2. */
>> + 8, /* cost for loading QImode using movzbl. */
>> + {8, 8, 8}, /* cost of loading integer registers
>> + in QImode, HImode and SImode.
>> + Relative to reg-reg move (2). */
>> + {8, 8, 8}, /* cost of storing integer registers. */
>> + 2, /* cost of reg,reg fld/fst. */
>> + {8, 8, 8}, /* cost of loading fp registers
>> + in SFmode, DFmode and XFmode. */
>> + {8, 8, 8}, /* cost of storing fp registers
>> + in SFmode, DFmode and XFmode. */
>> + 2, /* cost of moving MMX register. */
>> + {8, 8}, /* cost of loading MMX registers
>> + in SImode and DImode. */
>> + {8, 8}, /* cost of storing MMX registers
>> + in SImode and DImode. */
>> + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
>> + {8, 8, 8, 10, 15}, /* cost of loading SSE registers
>> + in 32,64,128,256 and 512-bit. */
>> + {8, 8, 8, 10, 15}, /* cost of storing SSE registers
>> + in 32,64,128,256 and 512-bit. */
>> + 8, 8, /* SSE->integer and integer->SSE moves. */
>> + 8, 8, /* mask->integer and integer->mask moves. */
>> + {8, 8, 8}, /* cost of loading mask register
>> + in QImode, HImode, SImode. */
>> + {8, 8, 8}, /* cost if storing mask register
>> + in QImode, HImode, SImode. */
>> + 2, /* cost of moving mask register. */
>> + /* End of register allocator costs. */
>> + },
>> +
>> + COSTS_N_INSNS (1), /* cost of an add instruction. */
>> + COSTS_N_INSNS (1), /* cost of a lea instruction. */
>> + COSTS_N_INSNS (1), /* variable shift costs. */
>> + COSTS_N_INSNS (1), /* constant shift costs. */
>> + {COSTS_N_INSNS (2), /* cost of starting multiply for QI. */
>> + COSTS_N_INSNS (3), /* HI. */
>> + COSTS_N_INSNS (2), /* SI. */
>> + COSTS_N_INSNS (2), /* DI. */
>> + COSTS_N_INSNS (3)}, /* other. */
>> + 0, /* cost of multiply per each bit set. */
>> + {COSTS_N_INSNS (9), /* cost of a divide/mod for QI. */
>> + COSTS_N_INSNS (10), /* HI. */
>> + COSTS_N_INSNS (9), /* SI. */
>> + COSTS_N_INSNS (50), /* DI. */
>> + COSTS_N_INSNS (50)}, /* other. */
>> + COSTS_N_INSNS (1), /* cost of movsx. */
>> + COSTS_N_INSNS (1), /* cost of movzx. */
>> + 8, /* "large" insn. */
>> + 17, /* MOVE_RATIO. */
>> + 6, /* CLEAR_RATIO. */
>> + {8, 8, 8}, /* cost of loading integer registers
>> + in QImode, HImode and SImode.
>> + Relative to reg-reg move (2). */
>> + {8, 8, 8}, /* cost of storing integer registers. */
>> + {8, 8, 8, 12, 15}, /* cost of loading SSE register
>> + in 32bit, 64bit, 128bit, 256bit and 512bit. */
>> + {8, 8, 8, 12, 15}, /* cost of storing SSE register
>> + in 32bit, 64bit, 128bit, 256bit and 512bit. */
>> + {8, 8, 8, 12, 15}, /* cost of unaligned loads. */
>> + {8, 8, 8, 12, 15}, /* cost of unaligned storess. */
>> + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register. */
>> + 8, /* cost of moving SSE register to integer. */
>> + 18, 6, /* Gather load static, per_elt. */
>> + 18, 6, /* Gather store static, per_elt. */
>> + 32, /* size of l1 cache. */
>> + 256, /* size of l2 cache. */
>> + 64, /* size of prefetch block. */
>> + 12, /* number of parallel prefetches. */
>> + 3, /* Branch cost. */
>> + COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */
>> + COSTS_N_INSNS (3), /* cost of FMUL instruction. */
>> + COSTS_N_INSNS (13), /* cost of FDIV instruction. */
>> + COSTS_N_INSNS (2), /* cost of FABS instruction. */
>> + COSTS_N_INSNS (2), /* cost of FCHS instruction. */
>> + COSTS_N_INSNS (44), /* cost of FSQRT instruction. */
>> +
>> + COSTS_N_INSNS (1), /* cost of cheap SSE instruction. */
>> + COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns. */
>> + COSTS_N_INSNS (3), /* cost of MULSS instruction. */
>> + COSTS_N_INSNS (3), /* cost of MULSD instruction. */
>> + COSTS_N_INSNS (5), /* cost of FMA SS instruction. */
>> + COSTS_N_INSNS (5), /* cost of FMA SD instruction. */
>> + COSTS_N_INSNS (11), /* cost of DIVSS instruction. */
>> + COSTS_N_INSNS (14), /* cost of DIVSD instruction. */
>> + COSTS_N_INSNS (11), /* cost of SQRTSS instruction. */
>> + COSTS_N_INSNS (18), /* cost of SQRTSD instruction. */
>> + 4, 4, 4, 4, /* reassoc int, fp, vec_int, vec_fp. */
>> + shijidadao_memcpy,
>> + shijidadao_memset,
>> + COSTS_N_INSNS (3), /* cond_taken_branch_cost. */
>> + COSTS_N_INSNS (1), /* cond_not_taken_branch_cost. */
>> + "16:11:8", /* Loop alignment. */
>> + "16:11:8", /* Jump alignment. */
>> + "0:0:8", /* Label alignment. */
>> + "16", /* Func alignment. */
>> + 4, /* Small unroll limit. */
>> + 2, /* Small unroll factor. */
>> +};
>> +
>> +
>>
>> /* Generic should produce code tuned for Core-i7 (and newer chips)
>> and btver1 (and newer chips). */
>> diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
>> index f70846e628e..d77298b0e34 100644
>> --- a/gcc/config/i386/x86-tune-sched.cc
>> +++ b/gcc/config/i386/x86-tune-sched.cc
>> @@ -79,6 +79,7 @@ ix86_issue_rate (void)
>> case PROCESSOR_CANNONLAKE:
>> case PROCESSOR_ALDERLAKE:
>> case PROCESSOR_YONGFENG:
>> + case PROCESSOR_SHIJIDADAO:
>> case PROCESSOR_GENERIC:
>> return 4;
>>
>> @@ -446,6 +447,7 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
>> break;
>>
>> case PROCESSOR_YONGFENG:
>> + case PROCESSOR_SHIJIDADAO:
>> /* Stack engine allows to execute push&pop instructions in parallel. */
>> if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
>> && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> index 0fa1484b48d..dbb2543c68f 100644
>> --- a/gcc/config/i386/x86-tune.def
>> +++ b/gcc/config/i386/x86-tune.def
>> @@ -477,7 +477,7 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
>> elements. */
>> DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts",
>> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
>> - | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
>> + | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
>>
>> /* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2
>> elements. */
>> @@ -488,7 +488,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts",
>> elements. */
>> DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts",
>> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
>> - | m_YONGFENG | m_CORE_ATOM | m_GENERIC | m_GDS))
>> + | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
>>
>> /* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4
>> elements. */
>> @@ -499,7 +499,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
>> elements. */
>> DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
>> ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_CORE_HYBRID | m_CORE_ATOM
>> - | m_YONGFENG | m_GENERIC | m_GDS))
>> + | m_YONGFENG | m_SHIJIDADAO | m_GENERIC | m_GDS))
>>
>> /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
>> elements. */
>> @@ -509,7 +509,7 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
>> /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
>> smaller FMA chain. */
>> DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4
>> - | m_YONGFENG | m_GENERIC)
>> + | m_YONGFENG | m_SHIJIDADAO | m_GENERIC)
>>
>> /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
>> smaller FMA chain. */
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 8786249fb6f..11bb675830e 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -26153,6 +26153,9 @@ ZHAOXIN lujiazui CPU.
>> @item yongfeng
>> ZHAOXIN yongfeng CPU.
>>
>> +@item shijidadao
>> +ZHAOXIN shijidadao CPU.
>> +
>> @item amdfam10h
>> AMD Family 10h CPU.
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 218901c0b20..82bf07a83bb 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -34799,6 +34799,12 @@ SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
>> ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
>> instruction set support.
>>
>> +@item shijidadao
>> +ZHAOXIN shijidadao CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
>> +SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
>> +ABM, BMI, BMI2, F16C, FXSR, RDSEED, AVX2, FMA, SHA, LZCNT
>> +instruction set support.
>> +
>> @item geode
>> AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
>> @end table
>> diff --git a/gcc/testsuite/g++.target/i386/mv32.C b/gcc/testsuite/g++.target/i386/mv32.C
>> index 6c993218d01..b311c35baa3 100644
>> --- a/gcc/testsuite/g++.target/i386/mv32.C
>> +++ b/gcc/testsuite/g++.target/i386/mv32.C
>> @@ -21,6 +21,10 @@ int __attribute__ ((target("arch=yongfeng"))) foo () {
>> return 2;
>> }
>>
>> +int __attribute__ ((target("arch=shijidadao"))) foo () {
>> + return 3;
>> +}
>> +
>> int main ()
>> {
>> int val = foo ();
>> @@ -29,6 +33,8 @@ int main ()
>> assert (val == 1);
>> else if (__builtin_cpu_is ("yongfeng"))
>> assert (val == 2);
>> + else if (__builtin_cpu_is ("shijidadao"))
>> + assert (val == 3);
>> else
>> assert (val == 0);
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
>> index 2a50f5bf67c..c4dc89367ef 100644
>> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
>> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
>> @@ -208,6 +208,7 @@ extern void test_arch_arrowlake_s (void) __attribute__((__target__("arch=arrowla
>> extern void test_arch_pantherlake (void) __attribute__((__target__("arch=pantherlake")));
>> extern void test_arch_lujiazui (void) __attribute__((__target__("arch=lujiazui")));
>> extern void test_arch_yongfeng (void) __attribute__((__target__("arch=yongfeng")));
>> +extern void test_arch_shijidadao (void) __attribute__((__target__("arch=shijidadao")));
>> extern void test_arch_k8 (void) __attribute__((__target__("arch=k8")));
>> extern void test_arch_k8_sse3 (void) __attribute__((__target__("arch=k8-sse3")));
>> extern void test_arch_opteron (void) __attribute__((__target__("arch=opteron")));
>> @@ -233,6 +234,7 @@ extern void test_tune_corei7_avx (void) __attribute__((__target__("tune=corei7-
>> extern void test_tune_core_avx2 (void) __attribute__((__target__("tune=core-avx2")));
>> extern void test_tune_lujiazui (void) __attribute__((__target__("tune=lujiazui")));
>> extern void test_tune_yongfeng (void) __attribute__((__target__("tune=yongfeng")));
>> +extern void test_tune_shijidadao (void) __attribute__((__target__("tune=shijidadao")));
>> extern void test_tune_k8 (void) __attribute__((__target__("tune=k8")));
>> extern void test_tune_k8_sse3 (void) __attribute__((__target__("tune=k8-sse3")));
>> extern void test_tune_opteron (void) __attribute__((__target__("tune=opteron")));
>> --
>> 2.27.0
>>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement
2024-06-18 7:21 ` mayshao-oc
@ 2024-06-19 18:22 ` Uros Bizjak
0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2024-06-19 18:22 UTC (permalink / raw)
To: mayshao-oc
Cc: hubicka, gcc-patches, louisqi, hawkwang, silviazhao, xiangjiehong, timhu
On Tue, Jun 18, 2024 at 9:21 AM mayshao-oc <mayshao-oc@zhaoxin.com> wrote:
>
>
>
> On 5/28/24 14:15, Uros Bizjak wrote:
> >
> >
> >
> > On Mon, May 27, 2024 at 10:33 AM MayShao <MayShao-oc@zhaoxin.com> wrote:
> >>
> >> From: mayshao <mayshao-oc@zhaoxin.com>
> >>
> >> Hi all:
> >> This patch enables -march/-mtune=shijidadao, costs and tunings are set according to the characteristics of the processor.
> >>
> >> Bootstrapped /regtested X86_64.
> >>
> >> Ok for trunk?
> >
> > OK.
> >
> > Thanks,
> > Uros.
>
> Thanks for your review, please help me commit.
Done, committed as r15-1454 [1].
[1] https://gcc.gnu.org/pipermail/gcc-cvs/2024-June/404474.html
Thanks,
Uros.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-06-19 18:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-27 8:32 [PATCH] [x86_64]: Zhaoxin shijidadao enablement MayShao
2024-05-28 6:15 ` Uros Bizjak
2024-06-18 7:21 ` mayshao-oc
2024-06-19 18:22 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).