public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/6] AArch64: Refactor cost models to different files.
@ 2023-11-15 17:06 Tamar Christina
From: Tamar Christina @ 2023-11-15 17:06 UTC
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford


Hi All,

This patch series attempts to move the generic cost model in AArch64 to a new
and modern baseline.  The current model is quite old and generates very
suboptimal code out of the box for users of GCC.

The goal is for the new cost model to be beneficial on newer/current Arm
microarchitectures while not penalizing older ones too much.

It does not change any core-specific optimizations.  The final changes reflect
both performance and code-size optimizations.

This first patch just re-organizes the cost structures into their own files.
The aarch64.cc file has gotten very big and is hard to follow.

No functional changes are expected from this change.  Note that since all the
structures are static (private to their translation unit), I've put them in
header files rather than separate .cc files.
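
To give an idea of the new layout, each header is self-contained behind an
include guard and carries the structure definitions verbatim.  A rough sketch
(the guard name and copyright banner here are illustrative only; the struct
types such as cpu_addrcost_table are the ones already declared in
aarch64-protos.h):

  /* config/aarch64/tuning_models/generic.h  */
  #ifndef GCC_AARCH64_TUNING_GENERIC
  #define GCC_AARCH64_TUNING_GENERIC

  static const struct cpu_addrcost_table generic_addrcost_table =
  {
      {
        1, /* hi  */
        0, /* si  */
        0, /* di  */
        1, /* ti  */
      },
    0, /* pre_modify  */
    0, /* post_modify  */
    0, /* post_modify_ld3_st3  */
    0, /* post_modify_ld4_st4  */
    0, /* register_offset  */
    0, /* register_sextend  */
    0, /* register_zextend  */
    0 /* imm_offset  */
  };

  /* generic_regmove_cost, generic_vector_cost, generic_branch_cost,
     generic_prefetch_tune, generic_tunings etc. follow in the same way.  */

  #endif /* GCC_AARCH64_TUNING_GENERIC.  */

aarch64.cc then replaces the removed definitions with one include per tuning
model, e.g.:

  #include "tuning_models/generic.h"
  #include "tuning_models/neoversen1.h"
  #include "tuning_models/neoversev1.h"
  /* ... and so on for the other models listed in the ChangeLog.  */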

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64.cc (generic_addrcost_table,
	exynosm1_addrcost_table,
	xgene1_addrcost_table,
	thunderx2t99_addrcost_table,
	thunderx3t110_addrcost_table,
	tsv110_addrcost_table,
	qdf24xx_addrcost_table,
	a64fx_addrcost_table,
	neoversev1_addrcost_table,
	neoversen2_addrcost_table,
	neoversev2_addrcost_table,
	generic_regmove_cost,
	cortexa57_regmove_cost,
	cortexa53_regmove_cost,
	exynosm1_regmove_cost,
	thunderx_regmove_cost,
	xgene1_regmove_cost,
	qdf24xx_regmove_cost,
	thunderx2t99_regmove_cost,
	thunderx3t110_regmove_cost,
	tsv110_regmove_cost,
	a64fx_regmove_cost,
	neoversen2_regmove_cost,
	neoversev1_regmove_cost,
	neoversev2_regmove_cost,
	generic_vector_cost,
	a64fx_vector_cost,
	qdf24xx_vector_cost,
	thunderx_vector_cost,
	tsv110_vector_cost,
	cortexa57_vector_cost,
	exynosm1_vector_cost,
	xgene1_vector_cost,
	thunderx2t99_vector_cost,
	thunderx3t110_vector_cost,
	ampere1_vector_cost,
	generic_branch_cost,
	generic_tunings,
	cortexa35_tunings,
	cortexa53_tunings,
	cortexa57_tunings,
	cortexa72_tunings,
	cortexa73_tunings,
	exynosm1_tunings,
	thunderxt88_tunings,
	thunderx_tunings,
	tsv110_tunings,
	xgene1_tunings,
	emag_tunings,
	qdf24xx_tunings,
	saphira_tunings,
	thunderx2t99_tunings,
	thunderx3t110_tunings,
	neoversen1_tunings,
	ampere1_tunings,
	ampere1a_tunings,
	neoversev1_vector_cost,
	neoversev1_tunings,
	neoverse512tvb_vector_cost,
	neoverse512tvb_tunings,
	neoversen2_vector_cost,
	neoversen2_tunings,
	neoversev2_vector_cost,
	neoversev2_tunings,
	a64fx_tunings): Split into own files.
	* config/aarch64/tuning_models/a64fx.h: New file.
	* config/aarch64/tuning_models/ampere1.h: New file.
	* config/aarch64/tuning_models/ampere1a.h: New file.
	* config/aarch64/tuning_models/cortexa35.h: New file.
	* config/aarch64/tuning_models/cortexa53.h: New file.
	* config/aarch64/tuning_models/cortexa57.h: New file.
	* config/aarch64/tuning_models/cortexa72.h: New file.
	* config/aarch64/tuning_models/cortexa73.h: New file.
	* config/aarch64/tuning_models/emag.h: New file.
	* config/aarch64/tuning_models/exynosm1.h: New file.
	* config/aarch64/tuning_models/generic.h: New file.
	* config/aarch64/tuning_models/neoverse512tvb.h: New file.
	* config/aarch64/tuning_models/neoversen1.h: New file.
	* config/aarch64/tuning_models/neoversen2.h: New file.
	* config/aarch64/tuning_models/neoversev1.h: New file.
	* config/aarch64/tuning_models/neoversev2.h: New file.
	* config/aarch64/tuning_models/qdf24xx.h: New file.
	* config/aarch64/tuning_models/saphira.h: New file.
	* config/aarch64/tuning_models/thunderx.h: New file.
	* config/aarch64/tuning_models/thunderx2t99.h: New file.
	* config/aarch64/tuning_models/thunderx3t110.h: New file.
	* config/aarch64/tuning_models/thunderxt88.h: New file.
	* config/aarch64/tuning_models/tsv110.h: New file.
	* config/aarch64/tuning_models/xgene1.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a891f5d11940c6fd3c49a14bfbdec886..07b1cde39209f5c7740e336b499e9aed31e4c515 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -354,2405 +354,30 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 };
 
 /* Tuning parameters.  */
-
-static const struct cpu_addrcost_table generic_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table exynosm1_addrcost_table =
-{
-    {
-      0, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  1, /* register_offset  */
-  1, /* register_sextend  */
-  2, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table xgene1_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  1, /* pre_modify  */
-  1, /* post_modify  */
-  1, /* post_modify_ld3_st3  */
-  1, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  1, /* register_sextend  */
-  1, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table tsv110_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  1, /* register_sextend  */
-  1, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table qdf24xx_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  1, /* pre_modify  */
-  1, /* post_modify  */
-  1, /* post_modify_ld3_st3  */
-  1, /* post_modify_ld4_st4  */
-  3, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  2, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table a64fx_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversev1_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  3, /* post_modify_ld3_st3  */
-  3, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversen2_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversev2_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_regmove_cost generic_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost cortexa57_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost cortexa53_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost exynosm1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost (actual, 4 and 9).  */
-  9, /* GP2FP  */
-  9, /* FP2GP  */
-  1 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx_regmove_cost =
-{
-  2, /* GP2GP  */
-  2, /* GP2FP  */
-  6, /* FP2GP  */
-  4 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost xgene1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  8, /* GP2FP  */
-  8, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost qdf24xx_regmove_cost =
-{
-  2, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  6, /* GP2FP  */
-  6, /* FP2GP  */
-  4 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  5, /* GP2FP  */
-  6, /* FP2GP  */
-  3, /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  4, /* GP2FP  */
-  5, /* FP2GP  */
-  4  /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost tsv110_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  2, /* GP2FP  */
-  3, /* FP2GP  */
-  2  /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost a64fx_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  7, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversen2_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversev1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversev2_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-/* Generic costs for Advanced SIMD vector operations.   */
-static const advsimd_vec_cost generic_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  1, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  2, /* reduc_i8_cost  */
-  2, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  2, /* reduc_f16_cost  */
-  2, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  2, /* vec_to_scalar_cost  */
-  1, /* scalar_to_vec_cost  */
-  1, /* align_load_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Generic costs for SVE vector operations.  */
-static const sve_vec_cost generic_sve_vector_cost =
-{
-  {
-    1, /* int_stmt_cost  */
-    1, /* fp_stmt_cost  */
-    0, /* ld2_st2_permute_cost  */
-    0, /* ld3_st3_permute_cost  */
-    0, /* ld4_st4_permute_cost  */
-    2, /* permute_cost  */
-    2, /* reduc_i8_cost  */
-    2, /* reduc_i16_cost  */
-    2, /* reduc_i32_cost  */
-    2, /* reduc_i64_cost  */
-    2, /* reduc_f16_cost  */
-    2, /* reduc_f32_cost  */
-    2, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    2, /* vec_to_scalar_cost  */
-    1, /* scalar_to_vec_cost  */
-    1, /* align_load_cost  */
-    1, /* unalign_load_cost  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  2, /* clast_cost  */
-  2, /* fadda_f16_cost  */
-  2, /* fadda_f32_cost  */
-  2, /* fadda_f64_cost  */
-  4, /* gather_load_x32_cost  */
-  2, /* gather_load_x64_cost  */
-  1 /* scatter_store_elt_cost  */
-};
-
-/* Generic costs for vector insn classes.  */
-static const struct cpu_vector_cost generic_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  1, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &generic_advsimd_vector_cost, /* advsimd  */
-  &generic_sve_vector_cost, /* sve */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost a64fx_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  13, /* reduc_i8_cost  */
-  13, /* reduc_i16_cost  */
-  13, /* reduc_i32_cost  */
-  13, /* reduc_i64_cost  */
-  13, /* reduc_f16_cost  */
-  13, /* reduc_f32_cost  */
-  13, /* reduc_f64_cost  */
-  13, /* store_elt_extra_cost  */
-  13, /* vec_to_scalar_cost  */
-  4, /* scalar_to_vec_cost  */
-  6, /* align_load_cost  */
-  6, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost a64fx_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    5, /* fp_stmt_cost  */
-    0, /* ld2_st2_permute_cost  */
-    0, /* ld3_st3_permute_cost  */
-    0, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    13, /* reduc_i8_cost  */
-    13, /* reduc_i16_cost  */
-    13, /* reduc_i32_cost  */
-    13, /* reduc_i64_cost  */
-    13, /* reduc_f16_cost  */
-    13, /* reduc_f32_cost  */
-    13, /* reduc_f64_cost  */
-    13, /* store_elt_extra_cost  */
-    13, /* vec_to_scalar_cost  */
-    4, /* scalar_to_vec_cost  */
-    6, /* align_load_cost  */
-    6, /* unalign_load_cost  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  13, /* clast_cost  */
-  13, /* fadda_f16_cost  */
-  13, /* fadda_f32_cost  */
-  13, /* fadda_f64_cost  */
-  64, /* gather_load_x32_cost  */
-  32, /* gather_load_x64_cost  */
-  1 /* scatter_store_elt_cost  */
-};
-
-static const struct cpu_vector_cost a64fx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  5, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &a64fx_advsimd_vector_cost, /* advsimd  */
-  &a64fx_sve_vector_cost, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  1, /* reduc_i8_cost  */
-  1, /* reduc_i16_cost  */
-  1, /* reduc_i32_cost  */
-  1, /* reduc_i64_cost  */
-  1, /* reduc_f16_cost  */
-  1, /* reduc_f32_cost  */
-  1, /* reduc_f64_cost  */
-  1, /* store_elt_extra_cost  */
-  1, /* vec_to_scalar_cost  */
-  1, /* scalar_to_vec_cost  */
-  1, /* align_load_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* QDF24XX costs for vector insn classes.  */
-static const struct cpu_vector_cost qdf24xx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  1, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &qdf24xx_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-
-static const advsimd_vec_cost thunderx_advsimd_vector_cost =
-{
-  4, /* int_stmt_cost  */
-  1, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  4, /* permute_cost  */
-  2, /* reduc_i8_cost  */
-  2, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  2, /* reduc_f16_cost  */
-  2, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  2, /* vec_to_scalar_cost  */
-  2, /* scalar_to_vec_cost  */
-  3, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  5, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* ThunderX costs for vector insn classes.  */
-static const struct cpu_vector_cost thunderx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  3, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  3, /* cond_not_taken_branch_cost  */
-  &thunderx_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost tsv110_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  3, /* reduc_i8_cost  */
-  3, /* reduc_i16_cost  */
-  3, /* reduc_i32_cost  */
-  3, /* reduc_i64_cost  */
-  3, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  3, /* reduc_f64_cost  */
-  3, /* store_elt_extra_cost  */
-  3, /* vec_to_scalar_cost  */
-  2, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const struct cpu_vector_cost tsv110_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &tsv110_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  8, /* reduc_i8_cost  */
-  8, /* reduc_i16_cost  */
-  8, /* reduc_i32_cost  */
-  8, /* reduc_i64_cost  */
-  8, /* reduc_f16_cost  */
-  8, /* reduc_f32_cost  */
-  8, /* reduc_f64_cost  */
-  8, /* store_elt_extra_cost  */
-  8, /* vec_to_scalar_cost  */
-  8, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Cortex-A57 costs for vector insn classes.  */
-static const struct cpu_vector_cost cortexa57_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &cortexa57_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
-{
-  3, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  3, /* reduc_i8_cost  */
-  3, /* reduc_i16_cost  */
-  3, /* reduc_i32_cost  */
-  3, /* reduc_i64_cost  */
-  3, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  3, /* reduc_f64_cost  */
-  3, /* store_elt_extra_cost  */
-  3, /* vec_to_scalar_cost  */
-  3, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const struct cpu_vector_cost exynosm1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &exynosm1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost xgene1_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  4, /* reduc_i32_cost  */
-  4, /* reduc_i64_cost  */
-  4, /* reduc_f16_cost  */
-  4, /* reduc_f32_cost  */
-  4, /* reduc_f64_cost  */
-  4, /* store_elt_extra_cost  */
-  4, /* vec_to_scalar_cost  */
-  4, /* scalar_to_vec_cost  */
-  10, /* align_load_cost  */
-  10, /* unalign_load_cost  */
-  2, /* unalign_store_cost  */
-  2  /* store_cost  */
-};
-
-/* Generic costs for vector insn classes.  */
-static const struct cpu_vector_cost xgene1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &xgene1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
-{
-  4, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  10, /* permute_cost  */
-  6, /* reduc_i8_cost  */
-  6, /* reduc_i16_cost  */
-  6, /* reduc_i32_cost  */
-  6, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  6, /* reduc_f32_cost  */
-  6, /* reduc_f64_cost  */
-  6, /* store_elt_extra_cost  */
-  6, /* vec_to_scalar_cost  */
-  5, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Costs for vector insn classes for Vulcan.  */
-static const struct cpu_vector_cost thunderx2t99_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  6, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1,  /* cond_not_taken_branch_cost  */
-  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
-{
-  5, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  10, /* permute_cost  */
-  5, /* reduc_i8_cost  */
-  5, /* reduc_i16_cost  */
-  5, /* reduc_i32_cost  */
-  5, /* reduc_i64_cost  */
-  5, /* reduc_f16_cost  */
-  5, /* reduc_f32_cost  */
-  5, /* reduc_f64_cost  */
-  5, /* store_elt_extra_cost  */
-  5, /* vec_to_scalar_cost  */
-  5, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  4, /* unalign_store_cost  */
-  4  /* store_cost  */
-};
-
-static const struct cpu_vector_cost thunderx3t110_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  5, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1,  /* cond_not_taken_branch_cost  */
-  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost ampere1_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  12, /* reduc_i8_cost  */
-  9, /* reduc_i16_cost  */
-  6, /* reduc_i32_cost  */
-  5, /* reduc_i64_cost  */
-  9, /* reduc_f16_cost  */
-  6, /* reduc_f32_cost  */
-  5, /* reduc_f64_cost  */
-  8, /* store_elt_extra_cost  */
-  6, /* vec_to_scalar_cost  */
-  7, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Ampere-1 costs for vector insn classes.  */
-static const struct cpu_vector_cost ampere1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  3, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &ampere1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr  /* issue_info  */
-};
-
-/* Generic costs for branch instructions.  */
-static const struct cpu_branch_cost generic_branch_cost =
-{
-  1,  /* Predictable.  */
-  3   /* Unpredictable.  */
-};
-
-/* Generic approximation modes.  */
-static const cpu_approx_modes generic_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_NONE,	/* sqrt  */
-  AARCH64_APPROX_NONE	/* recip_sqrt  */
-};
-
-/* Approximation modes for Exynos M1.  */
-static const cpu_approx_modes exynosm1_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_ALL,	/* sqrt  */
-  AARCH64_APPROX_ALL	/* recip_sqrt  */
-};
-
-/* Approximation modes for X-Gene 1.  */
-static const cpu_approx_modes xgene1_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_NONE,	/* sqrt  */
-  AARCH64_APPROX_ALL	/* recip_sqrt  */
-};
-
-/* Generic prefetch settings (which disable prefetch).  */
-static const cpu_prefetch_tune generic_prefetch_tune =
-{
-  0,			/* num_slots  */
-  -1,			/* l1_cache_size  */
-  -1,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune exynosm1_prefetch_tune =
-{
-  0,			/* num_slots  */
-  -1,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune qdf24xx_prefetch_tune =
-{
-  4,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  512,			/* l2_cache_size  */
-  false,		/* prefetch_dynamic_strides */
-  2048,			/* minimum_stride */
-  3			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderxt88_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  128,			/* l1_cache_line_size  */
-  16*1024,		/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  3			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  128,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune tsv110_prefetch_tune =
-{
-  0,                    /* num_slots  */
-  64,                   /* l1_cache_size  */
-  64,                   /* l1_cache_line_size  */
-  512,                  /* l2_cache_size  */
-  true,                 /* prefetch_dynamic_strides */
-  -1,                   /* minimum_stride */
-  -1                    /* default_opt_level  */
-};
-
-static const cpu_prefetch_tune xgene1_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,                 /* prefetch_dynamic_strides */
-  -1,                   /* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune a64fx_prefetch_tune =
-{
-  8,			/* num_slots  */
-  64,			/* l1_cache_size  */
-  256,			/* l1_cache_line_size  */
-  32768,		/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune ampere1_prefetch_tune =
-{
-  0,			/* num_slots  */
-  64,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  2048,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const struct tune_params generic_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "16:12",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
-     Neoverse V1.  It does not have a noticeable effect on A64FX and should
-     have at most a very minor effect on SVE2 cores.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa35_tunings =
-{
-  &cortexa53_extra_costs,
-  &generic_addrcost_table,
-  &cortexa53_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  1, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa53_tunings =
-{
-  &cortexa53_extra_costs,
-  &generic_addrcost_table,
-  &cortexa53_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa57_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa72_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa73_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate.  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params exynosm1_tunings =
-{
-  &exynosm1_extra_costs,
-  &exynosm1_addrcost_table,
-  &exynosm1_regmove_cost,
-  &exynosm1_vector_cost,
-  &generic_branch_cost,
-  &exynosm1_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3,	/* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  "4",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "4",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  48,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
-  &exynosm1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderxt88_tunings =
-{
-  &thunderx_extra_costs,
-  &generic_addrcost_table,
-  &thunderx_regmove_cost,
-  &thunderx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
-  "8",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderxt88_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx_tunings =
-{
-  &thunderx_extra_costs,
-  &generic_addrcost_table,
-  &thunderx_regmove_cost,
-  &thunderx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
-  "8",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &thunderx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params tsv110_tunings =
-{
-  &tsv110_extra_costs,
-  &tsv110_addrcost_table,
-  &tsv110_regmove_cost,
-  &tsv110_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4,    /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16", /* function_align.  */
-  "4",  /* jump_align.  */
-  "8",  /* loop_align.  */
-  2,    /* int_reassoc_width.  */
-  4,    /* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,    /* vec_reassoc_width.  */
-  2,    /* min_div_recip_mul_sf.  */
-  2,    /* min_div_recip_mul_df.  */
-  0,    /* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
-  &tsv110_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params xgene1_tunings =
-{
-  &xgene1_extra_costs,
-  &xgene1_addrcost_table,
-  &xgene1_regmove_cost,
-  &xgene1_vector_cost,
-  &generic_branch_cost,
-  &xgene1_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  "16",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  17,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params emag_tunings =
-{
-  &xgene1_extra_costs,
-  &xgene1_addrcost_table,
-  &xgene1_regmove_cost,
-  &xgene1_vector_cost,
-  &generic_branch_cost,
-  &xgene1_approx_modes,
-  SVE_NOT_IMPLEMENTED,
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  "16",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  17,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params qdf24xx_tunings =
-{
-  &qdf24xx_extra_costs,
-  &qdf24xx_addrcost_table,
-  &qdf24xx_regmove_cost,
-  &qdf24xx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
-  &qdf24xx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
-   for now.  */
-static const struct tune_params saphira_tunings =
-{
-  &generic_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx2t99_tunings =
-{
-  &thunderx2t99_extra_costs,
-  &thunderx2t99_addrcost_table,
-  &thunderx2t99_regmove_cost,
-  &thunderx2t99_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate.  */
-  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx2t99_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx3t110_tunings =
-{
-  &thunderx3t110_extra_costs,
-  &thunderx3t110_addrcost_table,
-  &thunderx3t110_regmove_cost,
-  &thunderx3t110_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  6, /* issue_rate.  */
-  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx3t110_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params neoversen1_tunings =
-{
-  &cortexa76_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    5, /* load_fp.  */
-    2, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params ampere1_tunings =
-{
-  &ampere1_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &ampere1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
-   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
-   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
-   AARCH64_FUSE_CMP_BRANCH),
-  /* fusible_ops  */
-  "32",		/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &ampere1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params ampere1a_tunings =
-{
-  &ampere1a_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &ampere1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
-   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
-   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
-   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
-   AARCH64_FUSE_ADDSUB_2REG_CONST1),
-  /* fusible_ops  */
-  "32",		/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &ampere1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  4, /* ld2_st2_permute_cost */
-  4, /* ld3_st3_permute_cost  */
-  5, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversev1_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    4, /* ld2_st2_permute_cost  */
-    7, /* ld3_st3_permute_cost  */
-    8, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 31 scalar ADDs could
-       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
-       completes in 14 cycles, so give it a cost of 31 + 5.  */
-    36, /* reduc_i8_cost  */
-    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
-    22, /* reduc_i16_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
-    14, /* reduc_i32_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
-    11, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 15 scalar FADDs could
-       complete in ~9 cycles and would have a cost of 30.  FADDV
-       completes in 13 cycles, so give it a cost of 30 + 4.  */
-    34, /* reduc_f16_cost  */
-    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
-    19, /* reduc_f32_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
-    11, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  19, /* fadda_f16_cost  */
-  11, /* fadda_f32_cost  */
-  8, /* fadda_f64_cost  */
-  32, /* gather_load_x32_cost  */
-  16, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  4, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    4, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
-{
-  {
-    {
-      2, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      2, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    2, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  1, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversev1_vec_issue_info =
-{
-  &neoversev1_scalar_issue_info,
-  &neoversev1_advsimd_issue_info,
-  &neoversev1_sve_issue_info
-};
-
-/* Neoverse V1 costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversev1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev1_advsimd_vector_cost, /* advsimd  */
-  &neoversev1_sve_vector_cost, /* sve  */
-  &neoversev1_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversev1_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev1_addrcost_table,
-  &neoversev1_regmove_cost,
-  &neoversev1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_256, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
-   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const sve_vec_cost neoverse512tvb_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    4, /* ld2_st2_permute_cost  */
-    5, /* ld3_st3_permute_cost  */
-    5, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~5 cycles and would have a cost of 15.  Assume that
-       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
-    13, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
-    9, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
-    8, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~6 cycles and would have a cost of 14.  Assume that
-       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
-    16, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
-    8, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
-    4, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* This depends very much on what the scalar value is and
-       where it comes from.  E.g. some constants take two dependent
-       instructions or a load, while others might be moved from a GPR.
-       4 seems to be a reasonable compromise in practice.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores generally have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      4, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    2, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
-{
-  &neoversev1_scalar_issue_info,
-  &neoversev1_advsimd_issue_info,
-  &neoverse512tvb_sve_issue_info
-};
-
-static const struct cpu_vector_cost neoverse512tvb_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev1_advsimd_vector_cost, /* advsimd  */
-  &neoverse512tvb_sve_vector_cost, /* sve  */
-  &neoverse512tvb_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoverse512tvb_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev1_addrcost_table,
-  &neoversev1_regmove_cost,
-  &neoverse512tvb_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128 | SVE_256, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  2, /* ld2_st2_permute_cost */
-  2, /* ld3_st3_permute_cost  */
-  3, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  4, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversen2_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    3, /* ld2_st2_permute_cost  */
-    4, /* ld3_st3_permute_cost  */
-    4, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
-       completes in 11 cycles, so give it a cost of 15 + 6.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
-    13, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
-    9, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
-    2, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~8 cycles and would have a cost of 14.  FADDV
-       completes in 6 cycles, so give it a cost of 14 - 2.  */
-    12, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
-    6, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
-    2, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  4, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    2, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      2, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    3, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversen2_vec_issue_info =
-{
-  &neoversen2_scalar_issue_info,
-  &neoversen2_advsimd_issue_info,
-  &neoversen2_sve_issue_info
-};
-
-/* Neoverse N2 costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversen2_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversen2_advsimd_vector_cost, /* advsimd  */
-  &neoversen2_sve_vector_cost, /* sve  */
-  &neoversen2_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversen2_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversen2_addrcost_table,
-  &neoversen2_regmove_cost,
-  &neoversen2_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128, /* sve_width  */
-  { 4, /* load_int.  */
-    1, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
-   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  2, /* ld2_st2_permute_cost */
-  2, /* ld3_st3_permute_cost  */
-  3, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversev2_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    3, /* ld2_st2_permute_cost  */
-    3, /* ld3_st3_permute_cost  */
-    4, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
-       completes in 11 cycles, so give it a cost of 15 + 8.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
-    14, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
-    7, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
-    2, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~6 cycles and would have a cost of 14.  FADDV
-       completes in 8 cycles, so give it a cost of 14 + 2.  */
-    16, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
-    8, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
-    4, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  6, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    4, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      4, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    3, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversev2_vec_issue_info =
-{
-  &neoversev2_scalar_issue_info,
-  &neoversev2_advsimd_issue_info,
-  &neoversev2_sve_issue_info
-};
-
-/* Demeter costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversev2_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev2_advsimd_vector_cost, /* advsimd  */
-  &neoversev2_sve_vector_cost, /* sve  */
-  &neoversev2_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversev2_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev2_addrcost_table,
-  &neoversev2_regmove_cost,
-  &neoversev2_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    1, /* store_fp.  */
-    6, /* load_pred.  */
-    2 /* store_pred.  */
-  }, /* memmov_cost.  */
-  5, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  6,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  3,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
-   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const struct tune_params a64fx_tunings =
-{
-  &a64fx_extra_costs,
-  &a64fx_addrcost_table,
-  &a64fx_regmove_cost,
-  &a64fx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_512, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  7, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "32",	/* loop_align.  */
-  4,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &a64fx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
+#include "tuning_models/generic.h"
+#include "tuning_models/cortexa35.h"
+#include "tuning_models/cortexa53.h"
+#include "tuning_models/cortexa57.h"
+#include "tuning_models/cortexa72.h"
+#include "tuning_models/cortexa73.h"
+#include "tuning_models/exynosm1.h"
+#include "tuning_models/thunderxt88.h"
+#include "tuning_models/thunderx.h"
+#include "tuning_models/tsv110.h"
+#include "tuning_models/xgene1.h"
+#include "tuning_models/emag.h"
+#include "tuning_models/qdf24xx.h"
+#include "tuning_models/saphira.h"
+#include "tuning_models/thunderx2t99.h"
+#include "tuning_models/thunderx3t110.h"
+#include "tuning_models/neoversen1.h"
+#include "tuning_models/ampere1.h"
+#include "tuning_models/ampere1a.h"
+#include "tuning_models/neoversev1.h"
+#include "tuning_models/neoverse512tvb.h"
+#include "tuning_models/neoversen2.h"
+#include "tuning_models/neoversev2.h"
+#include "tuning_models/a64fx.h"
 
 /* Support for fine-grained override of the tuning structures.  */
 struct aarch64_tuning_override_function
diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h
new file mode 100644
index 0000000000000000000000000000000000000000..7b06c27eba1e4de01738bdfdc077460f9135fb41
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/a64fx.h
@@ -0,0 +1,169 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_A64FX
+#define GCC_AARCH64_H_A64FX
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table a64fx_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost a64fx_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  7, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost a64fx_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  13, /* reduc_i8_cost  */
+  13, /* reduc_i16_cost  */
+  13, /* reduc_i32_cost  */
+  13, /* reduc_i64_cost  */
+  13, /* reduc_f16_cost  */
+  13, /* reduc_f32_cost  */
+  13, /* reduc_f64_cost  */
+  13, /* store_elt_extra_cost  */
+  13, /* vec_to_scalar_cost  */
+  4, /* scalar_to_vec_cost  */
+  6, /* align_load_cost  */
+  6, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost a64fx_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    5, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    13, /* reduc_i8_cost  */
+    13, /* reduc_i16_cost  */
+    13, /* reduc_i32_cost  */
+    13, /* reduc_i64_cost  */
+    13, /* reduc_f16_cost  */
+    13, /* reduc_f32_cost  */
+    13, /* reduc_f64_cost  */
+    13, /* store_elt_extra_cost  */
+    13, /* vec_to_scalar_cost  */
+    4, /* scalar_to_vec_cost  */
+    6, /* align_load_cost  */
+    6, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  13, /* clast_cost  */
+  13, /* fadda_f16_cost  */
+  13, /* fadda_f32_cost  */
+  13, /* fadda_f64_cost  */
+  64, /* gather_load_x32_cost  */
+  32, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+static const struct cpu_vector_cost a64fx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  5, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &a64fx_advsimd_vector_cost, /* advsimd  */
+  &a64fx_sve_vector_cost, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune a64fx_prefetch_tune =
+{
+  8,			/* num_slots  */
+  64,			/* l1_cache_size  */
+  256,			/* l1_cache_line_size  */
+  32768,		/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params a64fx_tunings =
+{
+  &a64fx_extra_costs,
+  &a64fx_addrcost_table,
+  &a64fx_regmove_cost,
+  &a64fx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_512, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  7, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "32",	/* loop_align.  */
+  4,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &a64fx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_A64FX.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
new file mode 100644
index 0000000000000000000000000000000000000000..8d2a1c696103259f23cf73df26cef9d4fa05ac73
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -0,0 +1,113 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_AMPERE1
+#define GCC_AARCH64_H_AMPERE1
+
+#include "generic.h"
+
+static const advsimd_vec_cost ampere1_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  12, /* reduc_i8_cost  */
+  9, /* reduc_i16_cost  */
+  6, /* reduc_i32_cost  */
+  5, /* reduc_i64_cost  */
+  9, /* reduc_f16_cost  */
+  6, /* reduc_f32_cost  */
+  5, /* reduc_f64_cost  */
+  8, /* store_elt_extra_cost  */
+  6, /* vec_to_scalar_cost  */
+  7, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Ampere-1 costs for vector insn classes.  */
+static const struct cpu_vector_cost ampere1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  3, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &ampere1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr  /* issue_info  */
+};
+
+static const cpu_prefetch_tune ampere1_prefetch_tune =
+{
+  0,			/* num_slots  */
+  64,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  2048,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params ampere1_tunings =
+{
+  &ampere1_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &ampere1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
+   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
+   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
+   AARCH64_FUSE_CMP_BRANCH),
+  /* fusible_ops  */
+  "32",		/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_AMPERE1.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
new file mode 100644
index 0000000000000000000000000000000000000000..c419ffb3c1a936a01690ad157c6c71dc645273c8
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -0,0 +1,65 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_AMPERE1A
+#define GCC_AARCH64_H_AMPERE1A
+
+#include "generic.h"
+
+static const struct tune_params ampere1a_tunings =
+{
+  &ampere1a_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &ampere1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
+   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
+   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
+   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
+   AARCH64_FUSE_ADDSUB_2REG_CONST1),
+  /* fusible_ops  */
+  "32",		/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_AMPERE1A.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa35.h b/gcc/config/aarch64/tuning_models/cortexa35.h
new file mode 100644
index 0000000000000000000000000000000000000000..5534335348db96cc57fc9eccd7ff79a624cb528a
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa35.h
@@ -0,0 +1,62 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA35
+#define GCC_AARCH64_H_CORTEXA35
+
+#include "generic.h"
+#include "cortexa53.h"
+
+static const struct tune_params cortexa35_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &cortexa53_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  1, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA35.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa53.h b/gcc/config/aarch64/tuning_models/cortexa53.h
new file mode 100644
index 0000000000000000000000000000000000000000..9dfdccc5968e7f062af5c78f153bfe3838263b0a
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa53.h
@@ -0,0 +1,71 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA53
+#define GCC_AARCH64_H_CORTEXA53
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost cortexa53_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const struct tune_params cortexa53_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &cortexa53_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA53.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa57.h b/gcc/config/aarch64/tuning_models/cortexa57.h
new file mode 100644
index 0000000000000000000000000000000000000000..9c4789d57833a5879dda8e2fe454ac5f56cb0601
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa57.h
@@ -0,0 +1,109 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA57
+#define GCC_AARCH64_H_CORTEXA57
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost cortexa57_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  8, /* reduc_i8_cost  */
+  8, /* reduc_i16_cost  */
+  8, /* reduc_i32_cost  */
+  8, /* reduc_i64_cost  */
+  8, /* reduc_f16_cost  */
+  8, /* reduc_f32_cost  */
+  8, /* reduc_f64_cost  */
+  8, /* store_elt_extra_cost  */
+  8, /* vec_to_scalar_cost  */
+  8, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Cortex-A57 costs for vector insn classes.  */
+static const struct cpu_vector_cost cortexa57_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &cortexa57_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const struct tune_params cortexa57_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA57.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa72.h b/gcc/config/aarch64/tuning_models/cortexa72.h
new file mode 100644
index 0000000000000000000000000000000000000000..968171c9b2e898d7479dbcb462e33fe3905e183d
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa72.h
@@ -0,0 +1,61 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA72
+#define GCC_AARCH64_H_CORTEXA72
+
+#include "generic.h"
+
+static const struct tune_params cortexa72_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA72.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa73.h b/gcc/config/aarch64/tuning_models/cortexa73.h
new file mode 100644
index 0000000000000000000000000000000000000000..8d1a504ddac39604dd193ce0f434fd2f5145c129
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa73.h
@@ -0,0 +1,62 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA73
+#define GCC_AARCH64_H_CORTEXA73
+
+#include "generic.h"
+
+static const struct tune_params cortexa73_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate.  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+
+#endif /* GCC_AARCH64_H_CORTEXA73.  */
diff --git a/gcc/config/aarch64/tuning_models/emag.h b/gcc/config/aarch64/tuning_models/emag.h
new file mode 100644
index 0000000000000000000000000000000000000000..3f3402c3fc2a94704eeaf9223ecb0ca1c057cace
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/emag.h
@@ -0,0 +1,60 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_EMAG
+#define GCC_AARCH64_H_EMAG
+
+#include "generic.h"
+
+static const struct tune_params emag_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  SVE_NOT_IMPLEMENTED,
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  17,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
+  &xgene1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_EMAG.  */
diff --git a/gcc/config/aarch64/tuning_models/exynosm1.h b/gcc/config/aarch64/tuning_models/exynosm1.h
new file mode 100644
index 0000000000000000000000000000000000000000..a42ea4df97f3f048c41481c304fd3684a69d743b
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/exynosm1.h
@@ -0,0 +1,144 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_EXYNOSM1
+#define GCC_AARCH64_H_EXYNOSM1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table exynosm1_addrcost_table =
+{
+    {
+      0, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  1, /* register_offset  */
+  1, /* register_sextend  */
+  2, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost exynosm1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost (actual, 4 and 9).  */
+  9, /* GP2FP  */
+  9, /* FP2GP  */
+  1 /* FP2FP  */
+};
+
+static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
+{
+  3, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  3, /* reduc_i8_cost  */
+  3, /* reduc_i16_cost  */
+  3, /* reduc_i32_cost  */
+  3, /* reduc_i64_cost  */
+  3, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  3, /* reduc_f64_cost  */
+  3, /* store_elt_extra_cost  */
+  3, /* vec_to_scalar_cost  */
+  3, /* scalar_to_vec_cost  */
+  5, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const struct cpu_vector_cost exynosm1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &exynosm1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+/* Approximation modes for Exynos M1.  */
+static const cpu_approx_modes exynosm1_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_ALL,	/* sqrt  */
+  AARCH64_APPROX_ALL	/* recip_sqrt  */
+};
+
+static const cpu_prefetch_tune exynosm1_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params exynosm1_tunings =
+{
+  &exynosm1_extra_costs,
+  &exynosm1_addrcost_table,
+  &exynosm1_regmove_cost,
+  &exynosm1_vector_cost,
+  &generic_branch_cost,
+  &exynosm1_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3,	/* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  48,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &exynosm1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_EXYNOSM1.  */
diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h
new file mode 100644
index 0000000000000000000000000000000000000000..deb2c1cffe255bddcb5be571b12086442782da60
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic.h
@@ -0,0 +1,190 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC
+#define GCC_AARCH64_H_GENERIC
+
+static const struct cpu_addrcost_table generic_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+/* Generic costs for Advanced SIMD vector operations.   */
+static const advsimd_vec_cost generic_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Generic costs for SVE vector operations.  */
+static const sve_vec_cost generic_sve_vector_cost =
+{
+  {
+    1, /* int_stmt_cost  */
+    1, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    2, /* permute_cost  */
+    2, /* reduc_i8_cost  */
+    2, /* reduc_i16_cost  */
+    2, /* reduc_i32_cost  */
+    2, /* reduc_i64_cost  */
+    2, /* reduc_f16_cost  */
+    2, /* reduc_f32_cost  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    2, /* vec_to_scalar_cost  */
+    1, /* scalar_to_vec_cost  */
+    1, /* align_load_cost  */
+    1, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  2, /* clast_cost  */
+  2, /* fadda_f16_cost  */
+  2, /* fadda_f32_cost  */
+  2, /* fadda_f64_cost  */
+  4, /* gather_load_x32_cost  */
+  2, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost generic_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_advsimd_vector_cost, /* advsimd  */
+  &generic_sve_vector_cost, /* sve */
+  nullptr /* issue_info  */
+};
+
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_branch_cost =
+{
+  1,  /* Predictable.  */
+  3   /* Unpredictable.  */
+};
+
+/* Generic approximation modes.  */
+static const cpu_approx_modes generic_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_NONE	/* recip_sqrt  */
+};
+
+/* Generic prefetch settings (which disable prefetch).  */
+static const cpu_prefetch_tune generic_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  -1,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params generic_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "16:12",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
+     Neoverse V1.  It does not have a noticeable effect on A64FX and should
+     have at most a very minor effect on SVE2 cores.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC.  */
diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
new file mode 100644
index 0000000000000000000000000000000000000000..50d7b23712cc6a8be8f35246657ec5d86d6d4191
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
@@ -0,0 +1,164 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSE512TVB
+#define GCC_AARCH64_H_NEOVERSE512TVB
+
+#include "generic.h"
+
+static const sve_vec_cost neoverse512tvb_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    4, /* ld2_st2_permute_cost  */
+    5, /* ld3_st3_permute_cost  */
+    5, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  Assume that
+       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
+    8, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~6 cycles and would have a cost of 14.  Assume that
+       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
+    16, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
+    8, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
+    4, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* This depends very much on what the scalar value is and
+       where it comes from.  E.g. some constants take two dependent
+       instructions or a load, while others might be moved from a GPR.
+       4 seems to be a reasonable compromise in practice.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores generally have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      4, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    2, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
+{
+  &neoversev1_scalar_issue_info,
+  &neoversev1_advsimd_issue_info,
+  &neoverse512tvb_sve_issue_info
+};
+
+static const struct cpu_vector_cost neoverse512tvb_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev1_advsimd_vector_cost, /* advsimd  */
+  &neoverse512tvb_sve_vector_cost, /* sve  */
+  &neoverse512tvb_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoverse512tvb_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev1_addrcost_table,
+  &neoversev1_regmove_cost,
+  &neoverse512tvb_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128 | SVE_256, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSE512TVB.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h
new file mode 100644
index 0000000000000000000000000000000000000000..132166d3d06430b725e4448937332cc159c11cda
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversen1.h
@@ -0,0 +1,60 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEN1
+#define GCC_AARCH64_H_NEOVERSEN1
+
+#include "generic.h"
+
+static const struct tune_params neoversen1_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    5, /* load_fp.  */
+    2, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEN1.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h
new file mode 100644
index 0000000000000000000000000000000000000000..395a6d82b8403e586bf179cade055543cf9b9eb0
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEN2
+#define GCC_AARCH64_H_NEOVERSEN2
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversen2_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversen2_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversen2_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    4, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~8 cycles and would have a cost of 14.  FADDV
+       completes in 6 cycles, so give it a cost of 14 - 2.  */
+    12, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
+    6, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    2, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversen2_vec_issue_info =
+{
+  &neoversen2_scalar_issue_info,
+  &neoversen2_advsimd_issue_info,
+  &neoversen2_sve_issue_info
+};
+
+/* Neoverse N2 costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversen2_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversen2_advsimd_vector_cost, /* advsimd  */
+  &neoversen2_sve_vector_cost, /* sve  */
+  &neoversen2_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversen2_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversen2_addrcost_table,
+  &neoversen2_regmove_cost,
+  &neoversen2_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128, /* sve_width  */
+  { 4, /* load_int.  */
+    1, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEN2.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h
new file mode 100644
index 0000000000000000000000000000000000000000..584a5000e06f598dcdd3bcc533dc6dbc642223ca
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversev1.h
@@ -0,0 +1,237 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEV1
+#define GCC_AARCH64_H_NEOVERSEV1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversev1_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  3, /* post_modify_ld3_st3  */
+  3, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversev1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  4, /* ld2_st2_permute_cost */
+  4, /* ld3_st3_permute_cost  */
+  5, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversev1_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    4, /* ld2_st2_permute_cost  */
+    7, /* ld3_st3_permute_cost  */
+    8, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 31 scalar ADDs could
+       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
+       completes in 14 cycles, so give it a cost of 31 + 5.  */
+    36, /* reduc_i8_cost  */
+    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
+    22, /* reduc_i16_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
+    14, /* reduc_i32_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
+    11, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 15 scalar FADDs could
+       complete in ~9 cycles and would have a cost of 30.  FADDV
+       completes in 13 cycles, so give it a cost of 30 + 4.  */
+    34, /* reduc_f16_cost  */
+    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
+    19, /* reduc_f32_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
+    11, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  19, /* fadda_f16_cost  */
+  11, /* fadda_f32_cost  */
+  8, /* fadda_f64_cost  */
+  32, /* gather_load_x32_cost  */
+  16, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    4, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
+{
+  {
+    {
+      2, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    2, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  1, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversev1_vec_issue_info =
+{
+  &neoversev1_scalar_issue_info,
+  &neoversev1_advsimd_issue_info,
+  &neoversev1_sve_issue_info
+};
+
+/* Neoverse V1 costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversev1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev1_advsimd_vector_cost, /* advsimd  */
+  &neoversev1_sve_vector_cost, /* sve  */
+  &neoversev1_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversev1_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev1_addrcost_table,
+  &neoversev1_regmove_cost,
+  &neoversev1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_256, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+
+#endif /* GCC_AARCH64_H_NEOVERSEV1.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
new file mode 100644
index 0000000000000000000000000000000000000000..28d4244ef4c99ecdffb7408e39dc21bc191223de
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEV2
+#define GCC_AARCH64_H_NEOVERSEV2
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversev2_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversev2_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversev2_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    3, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 8.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
+    14, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
+    7, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~6 cycles and would have a cost of 14.  FADDV
+       completes in 8 cycles, so give it a cost of 14 + 2.  */
+    16, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
+    8, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
+    4, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  6, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    4, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      4, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversev2_vec_issue_info =
+{
+  &neoversev2_scalar_issue_info,
+  &neoversev2_advsimd_issue_info,
+  &neoversev2_sve_issue_info
+};
+
+/* Neoverse V2 (Demeter) costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversev2_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev2_advsimd_vector_cost, /* advsimd  */
+  &neoversev2_sve_vector_cost, /* sve  */
+  &neoversev2_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversev2_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev2_addrcost_table,
+  &neoversev2_regmove_cost,
+  &neoversev2_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    1, /* store_fp.  */
+    6, /* load_pred.  */
+    2 /* store_pred.  */
+  }, /* memmov_cost.  */
+  5, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  6,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  3,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEV2.  */
diff --git a/gcc/config/aarch64/tuning_models/qdf24xx.h b/gcc/config/aarch64/tuning_models/qdf24xx.h
new file mode 100644
index 0000000000000000000000000000000000000000..29c9b9f5843acc15450a2492b141c02ee48a3f13
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/qdf24xx.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_QDF24XX
+#define GCC_AARCH64_H_QDF24XX
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table qdf24xx_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  1, /* pre_modify  */
+  1, /* post_modify  */
+  1, /* post_modify_ld3_st3  */
+  1, /* post_modify_ld4_st4  */
+  3, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  2, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost qdf24xx_regmove_cost =
+{
+  2, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  6, /* GP2FP  */
+  6, /* FP2GP  */
+  4 /* FP2FP  */
+};
+
+static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  1, /* reduc_i8_cost  */
+  1, /* reduc_i16_cost  */
+  1, /* reduc_i32_cost  */
+  1, /* reduc_i64_cost  */
+  1, /* reduc_f16_cost  */
+  1, /* reduc_f32_cost  */
+  1, /* reduc_f64_cost  */
+  1, /* store_elt_extra_cost  */
+  1, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* QDF24XX costs for vector insn classes.  */
+static const struct cpu_vector_cost qdf24xx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &qdf24xx_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune qdf24xx_prefetch_tune =
+{
+  4,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  512,			/* l2_cache_size  */
+  false,		/* prefetch_dynamic_strides */
+  2048,			/* minimum_stride */
+  3			/* default_opt_level  */
+};
+
+static const struct tune_params qdf24xx_tunings =
+{
+  &qdf24xx_extra_costs,
+  &qdf24xx_addrcost_table,
+  &qdf24xx_regmove_cost,
+  &qdf24xx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
+  &qdf24xx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_QDF24XX.  */
diff --git a/gcc/config/aarch64/tuning_models/saphira.h b/gcc/config/aarch64/tuning_models/saphira.h
new file mode 100644
index 0000000000000000000000000000000000000000..e584d316bb7c3c2d232cf7623a92100ad261f07d
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/saphira.h
@@ -0,0 +1,63 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_SAPHIRA
+#define GCC_AARCH64_H_SAPHIRA
+
+#include "generic.h"
+
+/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
+   for now.  */
+static const struct tune_params saphira_tunings =
+{
+  &generic_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_SAPHIRA.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx.h b/gcc/config/aarch64/tuning_models/thunderx.h
new file mode 100644
index 0000000000000000000000000000000000000000..dd4b9d539fc5cf2bd20d84e91d6b72fa7237f99f
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx.h
@@ -0,0 +1,117 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX
+#define GCC_AARCH64_H_THUNDERX
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost thunderx_regmove_cost =
+{
+  2, /* GP2GP  */
+  2, /* GP2FP  */
+  6, /* FP2GP  */
+  4 /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx_advsimd_vector_cost =
+{
+  4, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  4, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  2, /* scalar_to_vec_cost  */
+  3, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  5, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* ThunderX costs for vector insn classes.  */
+static const struct cpu_vector_cost thunderx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  3, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  3, /* cond_not_taken_branch_cost  */
+  &thunderx_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  128,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx_tunings =
+{
+  &thunderx_extra_costs,
+  &generic_addrcost_table,
+  &thunderx_regmove_cost,
+  &thunderx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &thunderx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx2t99.h b/gcc/config/aarch64/tuning_models/thunderx2t99.h
new file mode 100644
index 0000000000000000000000000000000000000000..0a376e0bab37b0b5bc1ea23de0e96a9245846fd7
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx2t99.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX2T99
+#define GCC_AARCH64_H_THUNDERX2T99
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  5, /* GP2FP  */
+  6, /* FP2GP  */
+  3, /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
+{
+  4, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  10, /* permute_cost  */
+  6, /* reduc_i8_cost  */
+  6, /* reduc_i16_cost  */
+  6, /* reduc_i32_cost  */
+  6, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  6, /* reduc_f32_cost  */
+  6, /* reduc_f64_cost  */
+  6, /* store_elt_extra_cost  */
+  6, /* vec_to_scalar_cost  */
+  5, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Costs for vector insn classes for ThunderX2T99 (Vulcan).  */
+static const struct cpu_vector_cost thunderx2t99_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  6, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1,  /* cond_not_taken_branch_cost  */
+  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx2t99_tunings =
+{
+  &thunderx2t99_extra_costs,
+  &thunderx2t99_addrcost_table,
+  &thunderx2t99_regmove_cost,
+  &thunderx2t99_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate.  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderx2t99_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX2T99.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx3t110.h b/gcc/config/aarch64/tuning_models/thunderx3t110.h
new file mode 100644
index 0000000000000000000000000000000000000000..65203b4af132e12e4994013fbab228bd3873b756
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx3t110.h
@@ -0,0 +1,136 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX3T110
+#define GCC_AARCH64_H_THUNDERX3T110
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  4, /* GP2FP  */
+  5, /* FP2GP  */
+  4  /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
+{
+  5, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  10, /* permute_cost  */
+  5, /* reduc_i8_cost  */
+  5, /* reduc_i16_cost  */
+  5, /* reduc_i32_cost  */
+  5, /* reduc_i64_cost  */
+  5, /* reduc_f16_cost  */
+  5, /* reduc_f32_cost  */
+  5, /* reduc_f64_cost  */
+  5, /* store_elt_extra_cost  */
+  5, /* vec_to_scalar_cost  */
+  5, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  4, /* unalign_store_cost  */
+  4  /* store_cost  */
+};
+
+static const struct cpu_vector_cost thunderx3t110_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  5, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1,  /* cond_not_taken_branch_cost  */
+  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx3t110_tunings =
+{
+  &thunderx3t110_extra_costs,
+  &thunderx3t110_addrcost_table,
+  &thunderx3t110_regmove_cost,
+  &thunderx3t110_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  6, /* issue_rate.  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderx3t110_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX3T110.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderxt88.h b/gcc/config/aarch64/tuning_models/thunderxt88.h
new file mode 100644
index 0000000000000000000000000000000000000000..dcc74d31484ee6b99d37920dbfe7b1d59377d074
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderxt88.h
@@ -0,0 +1,72 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERXT88
+#define GCC_AARCH64_H_THUNDERXT88
+
+#include "generic.h"
+#include "thunderx.h"
+
+static const cpu_prefetch_tune thunderxt88_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  128,			/* l1_cache_line_size  */
+  16*1024,		/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  3			/* default_opt_level  */
+};
+
+static const struct tune_params thunderxt88_tunings =
+{
+  &thunderx_extra_costs,
+  &generic_addrcost_table,
+  &thunderx_regmove_cost,
+  &thunderx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderxt88_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERXT88.  */
diff --git a/gcc/config/aarch64/tuning_models/tsv110.h b/gcc/config/aarch64/tuning_models/tsv110.h
new file mode 100644
index 0000000000000000000000000000000000000000..42aeafce652fff34e3277194993dd4aa1f0383a1
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/tsv110.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_TSV110
+#define GCC_AARCH64_H_TSV110
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table tsv110_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  1, /* register_sextend  */
+  1, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost tsv110_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  2, /* GP2FP  */
+  3, /* FP2GP  */
+  2  /* FP2FP  */
+};
+
+static const advsimd_vec_cost tsv110_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  3, /* reduc_i8_cost  */
+  3, /* reduc_i16_cost  */
+  3, /* reduc_i32_cost  */
+  3, /* reduc_i64_cost  */
+  3, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  3, /* reduc_f64_cost  */
+  3, /* store_elt_extra_cost  */
+  3, /* vec_to_scalar_cost  */
+  2, /* scalar_to_vec_cost  */
+  5, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const struct cpu_vector_cost tsv110_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &tsv110_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune tsv110_prefetch_tune =
+{
+  0,                    /* num_slots  */
+  64,                   /* l1_cache_size  */
+  64,                   /* l1_cache_line_size  */
+  512,                  /* l2_cache_size  */
+  true,                 /* prefetch_dynamic_strides */
+  -1,                   /* minimum_stride */
+  -1                    /* default_opt_level  */
+};
+
+static const struct tune_params tsv110_tunings =
+{
+  &tsv110_extra_costs,
+  &tsv110_addrcost_table,
+  &tsv110_regmove_cost,
+  &tsv110_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4,    /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16", /* function_align.  */
+  "4",  /* jump_align.  */
+  "8",  /* loop_align.  */
+  2,    /* int_reassoc_width.  */
+  4,    /* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,    /* vec_reassoc_width.  */
+  2,    /* min_div_recip_mul_sf.  */
+  2,    /* min_div_recip_mul_df.  */
+  0,    /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
+  &tsv110_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_TSV110.  */
diff --git a/gcc/config/aarch64/tuning_models/xgene1.h b/gcc/config/aarch64/tuning_models/xgene1.h
new file mode 100644
index 0000000000000000000000000000000000000000..53a3eb0ddeb80a9735cc988e242a70e87dc90655
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/xgene1.h
@@ -0,0 +1,145 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_XGENE1
+#define GCC_AARCH64_H_XGENE1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table xgene1_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  1, /* pre_modify  */
+  1, /* post_modify  */
+  1, /* post_modify_ld3_st3  */
+  1, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  1, /* register_sextend  */
+  1, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost xgene1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  8, /* GP2FP  */
+  8, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost xgene1_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  4, /* reduc_i32_cost  */
+  4, /* reduc_i64_cost  */
+  4, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  4, /* reduc_f64_cost  */
+  4, /* store_elt_extra_cost  */
+  4, /* vec_to_scalar_cost  */
+  4, /* scalar_to_vec_cost  */
+  10, /* align_load_cost  */
+  10, /* unalign_load_cost  */
+  2, /* unalign_store_cost  */
+  2  /* store_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost xgene1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &xgene1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+/* Approximation modes for X-Gene 1.  */
+static const cpu_approx_modes xgene1_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_ALL	/* recip_sqrt  */
+};
+
+static const cpu_prefetch_tune xgene1_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,                 /* prefetch_dynamic_strides */
+  -1,                   /* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params xgene1_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  17,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
+  &xgene1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_XGENE1.  */

-- 

[-- Attachment #2: rb17815.patch --]
[-- Type: text/plain, Size: 178781 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a891f5d11940c6fd3c49a14bfbdec886..07b1cde39209f5c7740e336b499e9aed31e4c515 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -354,2405 +354,30 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 };
 
 /* Tuning parameters.  */
-
-static const struct cpu_addrcost_table generic_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table exynosm1_addrcost_table =
-{
-    {
-      0, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  1, /* register_offset  */
-  1, /* register_sextend  */
-  2, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table xgene1_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  1, /* pre_modify  */
-  1, /* post_modify  */
-  1, /* post_modify_ld3_st3  */
-  1, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  1, /* register_sextend  */
-  1, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table tsv110_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  1, /* register_sextend  */
-  1, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table qdf24xx_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  1, /* pre_modify  */
-  1, /* post_modify  */
-  1, /* post_modify_ld3_st3  */
-  1, /* post_modify_ld4_st4  */
-  3, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  2, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table a64fx_addrcost_table =
-{
-    {
-      1, /* hi  */
-      1, /* si  */
-      1, /* di  */
-      2, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  0, /* post_modify_ld3_st3  */
-  0, /* post_modify_ld4_st4  */
-  2, /* register_offset  */
-  3, /* register_sextend  */
-  3, /* register_zextend  */
-  0, /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversev1_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  3, /* post_modify_ld3_st3  */
-  3, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversen2_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_addrcost_table neoversev2_addrcost_table =
-{
-    {
-      1, /* hi  */
-      0, /* si  */
-      0, /* di  */
-      1, /* ti  */
-    },
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
-static const struct cpu_regmove_cost generic_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost cortexa57_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost cortexa53_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  5, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost exynosm1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost (actual, 4 and 9).  */
-  9, /* GP2FP  */
-  9, /* FP2GP  */
-  1 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx_regmove_cost =
-{
-  2, /* GP2GP  */
-  2, /* GP2FP  */
-  6, /* FP2GP  */
-  4 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost xgene1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  8, /* GP2FP  */
-  8, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost qdf24xx_regmove_cost =
-{
-  2, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  6, /* GP2FP  */
-  6, /* FP2GP  */
-  4 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  5, /* GP2FP  */
-  6, /* FP2GP  */
-  3, /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of int<->fp moves for spilling.  */
-  4, /* GP2FP  */
-  5, /* FP2GP  */
-  4  /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost tsv110_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  2, /* GP2FP  */
-  3, /* FP2GP  */
-  2  /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost a64fx_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Avoid the use of slow int<->fp moves for spilling by setting
-     their cost higher than memmov_cost.  */
-  5, /* GP2FP  */
-  7, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversen2_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversev1_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-static const struct cpu_regmove_cost neoversev2_regmove_cost =
-{
-  1, /* GP2GP  */
-  /* Spilling to int<->fp instead of memory is recommended so set
-     realistic costs compared to memmov_cost.  */
-  3, /* GP2FP  */
-  2, /* FP2GP  */
-  2 /* FP2FP  */
-};
-
-/* Generic costs for Advanced SIMD vector operations.   */
-static const advsimd_vec_cost generic_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  1, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  2, /* reduc_i8_cost  */
-  2, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  2, /* reduc_f16_cost  */
-  2, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  2, /* vec_to_scalar_cost  */
-  1, /* scalar_to_vec_cost  */
-  1, /* align_load_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Generic costs for SVE vector operations.  */
-static const sve_vec_cost generic_sve_vector_cost =
-{
-  {
-    1, /* int_stmt_cost  */
-    1, /* fp_stmt_cost  */
-    0, /* ld2_st2_permute_cost  */
-    0, /* ld3_st3_permute_cost  */
-    0, /* ld4_st4_permute_cost  */
-    2, /* permute_cost  */
-    2, /* reduc_i8_cost  */
-    2, /* reduc_i16_cost  */
-    2, /* reduc_i32_cost  */
-    2, /* reduc_i64_cost  */
-    2, /* reduc_f16_cost  */
-    2, /* reduc_f32_cost  */
-    2, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    2, /* vec_to_scalar_cost  */
-    1, /* scalar_to_vec_cost  */
-    1, /* align_load_cost  */
-    1, /* unalign_load_cost  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  2, /* clast_cost  */
-  2, /* fadda_f16_cost  */
-  2, /* fadda_f32_cost  */
-  2, /* fadda_f64_cost  */
-  4, /* gather_load_x32_cost  */
-  2, /* gather_load_x64_cost  */
-  1 /* scatter_store_elt_cost  */
-};
-
-/* Generic costs for vector insn classes.  */
-static const struct cpu_vector_cost generic_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  1, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &generic_advsimd_vector_cost, /* advsimd  */
-  &generic_sve_vector_cost, /* sve */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost a64fx_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  13, /* reduc_i8_cost  */
-  13, /* reduc_i16_cost  */
-  13, /* reduc_i32_cost  */
-  13, /* reduc_i64_cost  */
-  13, /* reduc_f16_cost  */
-  13, /* reduc_f32_cost  */
-  13, /* reduc_f64_cost  */
-  13, /* store_elt_extra_cost  */
-  13, /* vec_to_scalar_cost  */
-  4, /* scalar_to_vec_cost  */
-  6, /* align_load_cost  */
-  6, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost a64fx_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    5, /* fp_stmt_cost  */
-    0, /* ld2_st2_permute_cost  */
-    0, /* ld3_st3_permute_cost  */
-    0, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    13, /* reduc_i8_cost  */
-    13, /* reduc_i16_cost  */
-    13, /* reduc_i32_cost  */
-    13, /* reduc_i64_cost  */
-    13, /* reduc_f16_cost  */
-    13, /* reduc_f32_cost  */
-    13, /* reduc_f64_cost  */
-    13, /* store_elt_extra_cost  */
-    13, /* vec_to_scalar_cost  */
-    4, /* scalar_to_vec_cost  */
-    6, /* align_load_cost  */
-    6, /* unalign_load_cost  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  13, /* clast_cost  */
-  13, /* fadda_f16_cost  */
-  13, /* fadda_f32_cost  */
-  13, /* fadda_f64_cost  */
-  64, /* gather_load_x32_cost  */
-  32, /* gather_load_x64_cost  */
-  1 /* scatter_store_elt_cost  */
-};
-
-static const struct cpu_vector_cost a64fx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  5, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &a64fx_advsimd_vector_cost, /* advsimd  */
-  &a64fx_sve_vector_cost, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  1, /* reduc_i8_cost  */
-  1, /* reduc_i16_cost  */
-  1, /* reduc_i32_cost  */
-  1, /* reduc_i64_cost  */
-  1, /* reduc_f16_cost  */
-  1, /* reduc_f32_cost  */
-  1, /* reduc_f64_cost  */
-  1, /* store_elt_extra_cost  */
-  1, /* vec_to_scalar_cost  */
-  1, /* scalar_to_vec_cost  */
-  1, /* align_load_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* QDF24XX costs for vector insn classes.  */
-static const struct cpu_vector_cost qdf24xx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  1, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &qdf24xx_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-
-static const advsimd_vec_cost thunderx_advsimd_vector_cost =
-{
-  4, /* int_stmt_cost  */
-  1, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  4, /* permute_cost  */
-  2, /* reduc_i8_cost  */
-  2, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  2, /* reduc_f16_cost  */
-  2, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  2, /* vec_to_scalar_cost  */
-  2, /* scalar_to_vec_cost  */
-  3, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  5, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* ThunderX costs for vector insn classes.  */
-static const struct cpu_vector_cost thunderx_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  3, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  3, /* cond_taken_branch_cost  */
-  3, /* cond_not_taken_branch_cost  */
-  &thunderx_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost tsv110_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  3, /* reduc_i8_cost  */
-  3, /* reduc_i16_cost  */
-  3, /* reduc_i32_cost  */
-  3, /* reduc_i64_cost  */
-  3, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  3, /* reduc_f64_cost  */
-  3, /* store_elt_extra_cost  */
-  3, /* vec_to_scalar_cost  */
-  2, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const struct cpu_vector_cost tsv110_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &tsv110_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  8, /* reduc_i8_cost  */
-  8, /* reduc_i16_cost  */
-  8, /* reduc_i32_cost  */
-  8, /* reduc_i64_cost  */
-  8, /* reduc_f16_cost  */
-  8, /* reduc_f32_cost  */
-  8, /* reduc_f64_cost  */
-  8, /* store_elt_extra_cost  */
-  8, /* vec_to_scalar_cost  */
-  8, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Cortex-A57 costs for vector insn classes.  */
-static const struct cpu_vector_cost cortexa57_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &cortexa57_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
-{
-  3, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  3, /* reduc_i8_cost  */
-  3, /* reduc_i16_cost  */
-  3, /* reduc_i32_cost  */
-  3, /* reduc_i64_cost  */
-  3, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  3, /* reduc_f64_cost  */
-  3, /* store_elt_extra_cost  */
-  3, /* vec_to_scalar_cost  */
-  3, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const struct cpu_vector_cost exynosm1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &exynosm1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost xgene1_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  4, /* reduc_i32_cost  */
-  4, /* reduc_i64_cost  */
-  4, /* reduc_f16_cost  */
-  4, /* reduc_f32_cost  */
-  4, /* reduc_f64_cost  */
-  4, /* store_elt_extra_cost  */
-  4, /* vec_to_scalar_cost  */
-  4, /* scalar_to_vec_cost  */
-  10, /* align_load_cost  */
-  10, /* unalign_load_cost  */
-  2, /* unalign_store_cost  */
-  2  /* store_cost  */
-};
-
-/* Generic costs for vector insn classes.  */
-static const struct cpu_vector_cost xgene1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
-  5, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &xgene1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
-{
-  4, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  10, /* permute_cost  */
-  6, /* reduc_i8_cost  */
-  6, /* reduc_i16_cost  */
-  6, /* reduc_i32_cost  */
-  6, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  6, /* reduc_f32_cost  */
-  6, /* reduc_f64_cost  */
-  6, /* store_elt_extra_cost  */
-  6, /* vec_to_scalar_cost  */
-  5, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Costs for vector insn classes for Vulcan.  */
-static const struct cpu_vector_cost thunderx2t99_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  6, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1,  /* cond_not_taken_branch_cost  */
-  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
-{
-  5, /* int_stmt_cost  */
-  5, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  10, /* permute_cost  */
-  5, /* reduc_i8_cost  */
-  5, /* reduc_i16_cost  */
-  5, /* reduc_i32_cost  */
-  5, /* reduc_i64_cost  */
-  5, /* reduc_f16_cost  */
-  5, /* reduc_f32_cost  */
-  5, /* reduc_f64_cost  */
-  5, /* store_elt_extra_cost  */
-  5, /* vec_to_scalar_cost  */
-  5, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  4, /* unalign_store_cost  */
-  4  /* store_cost  */
-};
-
-static const struct cpu_vector_cost thunderx3t110_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  5, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  2, /* cond_taken_branch_cost  */
-  1,  /* cond_not_taken_branch_cost  */
-  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr /* issue_info  */
-};
-
-static const advsimd_vec_cost ampere1_advsimd_vector_cost =
-{
-  1, /* int_stmt_cost  */
-  3, /* fp_stmt_cost  */
-  0, /* ld2_st2_permute_cost  */
-  0, /* ld3_st3_permute_cost  */
-  0, /* ld4_st4_permute_cost  */
-  2, /* permute_cost  */
-  12, /* reduc_i8_cost  */
-  9, /* reduc_i16_cost  */
-  6, /* reduc_i32_cost  */
-  5, /* reduc_i64_cost  */
-  9, /* reduc_f16_cost  */
-  6, /* reduc_f32_cost  */
-  5, /* reduc_f64_cost  */
-  8, /* store_elt_extra_cost  */
-  6, /* vec_to_scalar_cost  */
-  7, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-/* Ampere-1 costs for vector insn classes.  */
-static const struct cpu_vector_cost ampere1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  3, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &ampere1_advsimd_vector_cost, /* advsimd  */
-  nullptr, /* sve  */
-  nullptr  /* issue_info  */
-};
-
-/* Generic costs for branch instructions.  */
-static const struct cpu_branch_cost generic_branch_cost =
-{
-  1,  /* Predictable.  */
-  3   /* Unpredictable.  */
-};
-
-/* Generic approximation modes.  */
-static const cpu_approx_modes generic_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_NONE,	/* sqrt  */
-  AARCH64_APPROX_NONE	/* recip_sqrt  */
-};
-
-/* Approximation modes for Exynos M1.  */
-static const cpu_approx_modes exynosm1_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_ALL,	/* sqrt  */
-  AARCH64_APPROX_ALL	/* recip_sqrt  */
-};
-
-/* Approximation modes for X-Gene 1.  */
-static const cpu_approx_modes xgene1_approx_modes =
-{
-  AARCH64_APPROX_NONE,	/* division  */
-  AARCH64_APPROX_NONE,	/* sqrt  */
-  AARCH64_APPROX_ALL	/* recip_sqrt  */
-};
-
-/* Generic prefetch settings (which disable prefetch).  */
-static const cpu_prefetch_tune generic_prefetch_tune =
-{
-  0,			/* num_slots  */
-  -1,			/* l1_cache_size  */
-  -1,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune exynosm1_prefetch_tune =
-{
-  0,			/* num_slots  */
-  -1,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune qdf24xx_prefetch_tune =
-{
-  4,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  512,			/* l2_cache_size  */
-  false,		/* prefetch_dynamic_strides */
-  2048,			/* minimum_stride */
-  3			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderxt88_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  128,			/* l1_cache_line_size  */
-  16*1024,		/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  3			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  128,			/* l1_cache_line_size  */
-  -1,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune tsv110_prefetch_tune =
-{
-  0,                    /* num_slots  */
-  64,                   /* l1_cache_size  */
-  64,                   /* l1_cache_line_size  */
-  512,                  /* l2_cache_size  */
-  true,                 /* prefetch_dynamic_strides */
-  -1,                   /* minimum_stride */
-  -1                    /* default_opt_level  */
-};
-
-static const cpu_prefetch_tune xgene1_prefetch_tune =
-{
-  8,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  256,			/* l2_cache_size  */
-  true,                 /* prefetch_dynamic_strides */
-  -1,                   /* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune a64fx_prefetch_tune =
-{
-  8,			/* num_slots  */
-  64,			/* l1_cache_size  */
-  256,			/* l1_cache_line_size  */
-  32768,		/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const cpu_prefetch_tune ampere1_prefetch_tune =
-{
-  0,			/* num_slots  */
-  64,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  2048,			/* l2_cache_size  */
-  true,			/* prefetch_dynamic_strides */
-  -1,			/* minimum_stride */
-  -1			/* default_opt_level  */
-};
-
-static const struct tune_params generic_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "16:12",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
-     Neoverse V1.  It does not have a noticeable effect on A64FX and should
-     have at most a very minor effect on SVE2 cores.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa35_tunings =
-{
-  &cortexa53_extra_costs,
-  &generic_addrcost_table,
-  &cortexa53_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  1, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa53_tunings =
-{
-  &cortexa53_extra_costs,
-  &generic_addrcost_table,
-  &cortexa53_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa57_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa72_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params cortexa73_tunings =
-{
-  &cortexa57_extra_costs,
-  &generic_addrcost_table,
-  &cortexa57_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate.  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params exynosm1_tunings =
-{
-  &exynosm1_extra_costs,
-  &exynosm1_addrcost_table,
-  &exynosm1_regmove_cost,
-  &exynosm1_vector_cost,
-  &generic_branch_cost,
-  &exynosm1_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3,	/* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  "4",	/* function_align.  */
-  "4",	/* jump_align.  */
-  "4",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  48,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
-  &exynosm1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderxt88_tunings =
-{
-  &thunderx_extra_costs,
-  &generic_addrcost_table,
-  &thunderx_regmove_cost,
-  &thunderx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
-  "8",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderxt88_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx_tunings =
-{
-  &thunderx_extra_costs,
-  &generic_addrcost_table,
-  &thunderx_regmove_cost,
-  &thunderx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  2, /* issue_rate  */
-  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
-  "8",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "8",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &thunderx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params tsv110_tunings =
-{
-  &tsv110_extra_costs,
-  &tsv110_addrcost_table,
-  &tsv110_regmove_cost,
-  &tsv110_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4,    /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16", /* function_align.  */
-  "4",  /* jump_align.  */
-  "8",  /* loop_align.  */
-  2,    /* int_reassoc_width.  */
-  4,    /* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,    /* vec_reassoc_width.  */
-  2,    /* min_div_recip_mul_sf.  */
-  2,    /* min_div_recip_mul_df.  */
-  0,    /* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
-  &tsv110_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params xgene1_tunings =
-{
-  &xgene1_extra_costs,
-  &xgene1_addrcost_table,
-  &xgene1_regmove_cost,
-  &xgene1_vector_cost,
-  &generic_branch_cost,
-  &xgene1_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  "16",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  17,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params emag_tunings =
-{
-  &xgene1_extra_costs,
-  &xgene1_addrcost_table,
-  &xgene1_regmove_cost,
-  &xgene1_vector_cost,
-  &generic_branch_cost,
-  &xgene1_approx_modes,
-  SVE_NOT_IMPLEMENTED,
-  { 6, /* load_int.  */
-    6, /* store_int.  */
-    6, /* load_fp.  */
-    6, /* store_fp.  */
-    6, /* load_pred.  */
-    6 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  "16",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  17,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params qdf24xx_tunings =
-{
-  &qdf24xx_extra_costs,
-  &qdf24xx_addrcost_table,
-  &qdf24xx_regmove_cost,
-  &qdf24xx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
-  &qdf24xx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
-   for now.  */
-static const struct tune_params saphira_tunings =
-{
-  &generic_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &generic_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  1,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx2t99_tunings =
-{
-  &thunderx2t99_extra_costs,
-  &thunderx2t99_addrcost_table,
-  &thunderx2t99_regmove_cost,
-  &thunderx2t99_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate.  */
-  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx2t99_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params thunderx3t110_tunings =
-{
-  &thunderx3t110_extra_costs,
-  &thunderx3t110_addrcost_table,
-  &thunderx3t110_regmove_cost,
-  &thunderx3t110_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  6, /* issue_rate.  */
-  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
-  "16",	/* function_align.  */
-  "8",	/* jump_align.  */
-  "16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx3t110_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params neoversen1_tunings =
-{
-  &cortexa76_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &cortexa57_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    5, /* load_fp.  */
-    2, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const struct tune_params ampere1_tunings =
-{
-  &ampere1_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &ampere1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
-   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
-   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
-   AARCH64_FUSE_CMP_BRANCH),
-  /* fusible_ops  */
-  "32",		/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &ampere1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const struct tune_params ampere1a_tunings =
-{
-  &ampere1a_extra_costs,
-  &generic_addrcost_table,
-  &generic_regmove_cost,
-  &ampere1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_NOT_IMPLEMENTED, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  4, /* issue_rate  */
-  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
-   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
-   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
-   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
-   AARCH64_FUSE_ADDSUB_2REG_CONST1),
-  /* fusible_ops  */
-  "32",		/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &ampere1_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  4, /* ld2_st2_permute_cost */
-  4, /* ld3_st3_permute_cost  */
-  5, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversev1_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    4, /* ld2_st2_permute_cost  */
-    7, /* ld3_st3_permute_cost  */
-    8, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 31 scalar ADDs could
-       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
-       completes in 14 cycles, so give it a cost of 31 + 5.  */
-    36, /* reduc_i8_cost  */
-    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
-    22, /* reduc_i16_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
-    14, /* reduc_i32_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
-    11, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 15 scalar FADDs could
-       complete in ~9 cycles and would have a cost of 30.  FADDV
-       completes in 13 cycles, so give it a cost of 30 + 4.  */
-    34, /* reduc_f16_cost  */
-    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
-    19, /* reduc_f32_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
-    11, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  19, /* fadda_f16_cost  */
-  11, /* fadda_f32_cost  */
-  8, /* fadda_f64_cost  */
-  32, /* gather_load_x32_cost  */
-  16, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  4, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    4, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
-{
-  {
-    {
-      2, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      2, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    2, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  1, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversev1_vec_issue_info =
-{
-  &neoversev1_scalar_issue_info,
-  &neoversev1_advsimd_issue_info,
-  &neoversev1_sve_issue_info
-};
-
-/* Neoverse V1 costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversev1_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev1_advsimd_vector_cost, /* advsimd  */
-  &neoversev1_sve_vector_cost, /* sve  */
-  &neoversev1_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversev1_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev1_addrcost_table,
-  &neoversev1_regmove_cost,
-  &neoversev1_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_256, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
-   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
-
-static const sve_vec_cost neoverse512tvb_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    4, /* ld2_st2_permute_cost  */
-    5, /* ld3_st3_permute_cost  */
-    5, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~5 cycles and would have a cost of 15.  Assume that
-       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
-    13, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
-    9, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
-    8, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~6 cycles and would have a cost of 14.  Assume that
-       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
-    16, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
-    8, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
-    4, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* This depends very much on what the scalar value is and
-       where it comes from.  E.g. some constants take two dependent
-       instructions or a load, while others might be moved from a GPR.
-       4 seems to be a reasonable compromise in practice.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores generally have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      4, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    2, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
-{
-  &neoversev1_scalar_issue_info,
-  &neoversev1_advsimd_issue_info,
-  &neoverse512tvb_sve_issue_info
-};
-
-static const struct cpu_vector_cost neoverse512tvb_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev1_advsimd_vector_cost, /* advsimd  */
-  &neoverse512tvb_sve_vector_cost, /* sve  */
-  &neoverse512tvb_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoverse512tvb_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev1_addrcost_table,
-  &neoversev1_regmove_cost,
-  &neoverse512tvb_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128 | SVE_256, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  2, /* ld2_st2_permute_cost */
-  2, /* ld3_st3_permute_cost  */
-  3, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  4, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversen2_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    3, /* ld2_st2_permute_cost  */
-    4, /* ld3_st3_permute_cost  */
-    4, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
-       completes in 11 cycles, so give it a cost of 15 + 6.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
-    13, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
-    9, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
-    2, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~8 cycles and would have a cost of 14.  FADDV
-       completes in 6 cycles, so give it a cost of 14 - 2.  */
-    12, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
-    6, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
-    2, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  4, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    2, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      2, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    3, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversen2_vec_issue_info =
-{
-  &neoversen2_scalar_issue_info,
-  &neoversen2_advsimd_issue_info,
-  &neoversen2_sve_issue_info
-};
-
-/* Neoverse N2 costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversen2_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversen2_advsimd_vector_cost, /* advsimd  */
-  &neoversen2_sve_vector_cost, /* sve  */
-  &neoversen2_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversen2_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversen2_addrcost_table,
-  &neoversen2_regmove_cost,
-  &neoversen2_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128, /* sve_width  */
-  { 4, /* load_int.  */
-    1, /* store_int.  */
-    6, /* load_fp.  */
-    2, /* store_fp.  */
-    6, /* load_pred.  */
-    1 /* store_pred.  */
-  }, /* memmov_cost.  */
-  3, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  2,	/* int_reassoc_width.  */
-  4,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
-   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
-{
-  2, /* int_stmt_cost  */
-  2, /* fp_stmt_cost  */
-  2, /* ld2_st2_permute_cost */
-  2, /* ld3_st3_permute_cost  */
-  3, /* ld4_st4_permute_cost  */
-  3, /* permute_cost  */
-  4, /* reduc_i8_cost  */
-  4, /* reduc_i16_cost  */
-  2, /* reduc_i32_cost  */
-  2, /* reduc_i64_cost  */
-  6, /* reduc_f16_cost  */
-  3, /* reduc_f32_cost  */
-  2, /* reduc_f64_cost  */
-  2, /* store_elt_extra_cost  */
-  /* This value is just inherited from the Cortex-A57 table.  */
-  8, /* vec_to_scalar_cost  */
-  /* This depends very much on what the scalar value is and
-     where it comes from.  E.g. some constants take two dependent
-     instructions or a load, while others might be moved from a GPR.
-     4 seems to be a reasonable compromise in practice.  */
-  4, /* scalar_to_vec_cost  */
-  4, /* align_load_cost  */
-  4, /* unalign_load_cost  */
-  /* Although stores have a latency of 2 and compete for the
-     vector pipes, in practice it's better not to model that.  */
-  1, /* unalign_store_cost  */
-  1  /* store_cost  */
-};
-
-static const sve_vec_cost neoversev2_sve_vector_cost =
-{
-  {
-    2, /* int_stmt_cost  */
-    2, /* fp_stmt_cost  */
-    3, /* ld2_st2_permute_cost  */
-    3, /* ld3_st3_permute_cost  */
-    4, /* ld4_st4_permute_cost  */
-    3, /* permute_cost  */
-    /* Theoretically, a reduction involving 15 scalar ADDs could
-       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
-       completes in 11 cycles, so give it a cost of 15 + 8.  */
-    21, /* reduc_i8_cost  */
-    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
-    14, /* reduc_i16_cost  */
-    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
-    7, /* reduc_i32_cost  */
-    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
-    2, /* reduc_i64_cost  */
-    /* Theoretically, a reduction involving 7 scalar FADDs could
-       complete in ~6 cycles and would have a cost of 14.  FADDV
-       completes in 8 cycles, so give it a cost of 14 + 2.  */
-    16, /* reduc_f16_cost  */
-    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
-    8, /* reduc_f32_cost  */
-    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
-    4, /* reduc_f64_cost  */
-    2, /* store_elt_extra_cost  */
-    /* This value is just inherited from the Cortex-A57 table.  */
-    8, /* vec_to_scalar_cost  */
-    /* See the comment above the Advanced SIMD versions.  */
-    4, /* scalar_to_vec_cost  */
-    4, /* align_load_cost  */
-    4, /* unalign_load_cost  */
-    /* Although stores have a latency of 2 and compete for the
-       vector pipes, in practice it's better not to model that.  */
-    1, /* unalign_store_cost  */
-    1  /* store_cost  */
-  },
-  3, /* clast_cost  */
-  10, /* fadda_f16_cost  */
-  6, /* fadda_f32_cost  */
-  4, /* fadda_f64_cost  */
-  /* A strided Advanced SIMD x64 load would take two parallel FP loads
-     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
-     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
-     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
-     (cost 2) to that, to avoid the difference being lost in rounding.
-
-     There is no easy comparison between a strided Advanced SIMD x32 load
-     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
-     operation more than a 64-bit gather.  */
-  14, /* gather_load_x32_cost  */
-  12, /* gather_load_x64_cost  */
-  3 /* scatter_store_elt_cost  */
-};
-
-static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
-{
-  3, /* loads_stores_per_cycle  */
-  2, /* stores_per_cycle  */
-  6, /* general_ops_per_cycle  */
-  0, /* fp_simd_load_general_ops  */
-  1 /* fp_simd_store_general_ops  */
-};
-
-static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
-{
-  {
-    3, /* loads_stores_per_cycle  */
-    2, /* stores_per_cycle  */
-    4, /* general_ops_per_cycle  */
-    0, /* fp_simd_load_general_ops  */
-    1 /* fp_simd_store_general_ops  */
-  },
-  2, /* ld2_st2_general_ops  */
-  2, /* ld3_st3_general_ops  */
-  3 /* ld4_st4_general_ops  */
-};
-
-static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
-{
-  {
-    {
-      3, /* loads_per_cycle  */
-      2, /* stores_per_cycle  */
-      4, /* general_ops_per_cycle  */
-      0, /* fp_simd_load_general_ops  */
-      1 /* fp_simd_store_general_ops  */
-    },
-    2, /* ld2_st2_general_ops  */
-    3, /* ld3_st3_general_ops  */
-    3 /* ld4_st4_general_ops  */
-  },
-  2, /* pred_ops_per_cycle  */
-  2, /* while_pred_ops  */
-  2, /* int_cmp_pred_ops  */
-  1, /* fp_cmp_pred_ops  */
-  1, /* gather_scatter_pair_general_ops  */
-  1 /* gather_scatter_pair_pred_ops  */
-};
-
-static const aarch64_vec_issue_info neoversev2_vec_issue_info =
-{
-  &neoversev2_scalar_issue_info,
-  &neoversev2_advsimd_issue_info,
-  &neoversev2_sve_issue_info
-};
-
-/* Demeter costs for vector insn classes.  */
-static const struct cpu_vector_cost neoversev2_vector_cost =
-{
-  1, /* scalar_int_stmt_cost  */
-  2, /* scalar_fp_stmt_cost  */
-  4, /* scalar_load_cost  */
-  1, /* scalar_store_cost  */
-  1, /* cond_taken_branch_cost  */
-  1, /* cond_not_taken_branch_cost  */
-  &neoversev2_advsimd_vector_cost, /* advsimd  */
-  &neoversev2_sve_vector_cost, /* sve  */
-  &neoversev2_vec_issue_info /* issue_info  */
-};
-
-static const struct tune_params neoversev2_tunings =
-{
-  &cortexa76_extra_costs,
-  &neoversev2_addrcost_table,
-  &neoversev2_regmove_cost,
-  &neoversev2_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_128, /* sve_width  */
-  { 4, /* load_int.  */
-    2, /* store_int.  */
-    6, /* load_fp.  */
-    1, /* store_fp.  */
-    6, /* load_pred.  */
-    2 /* store_pred.  */
-  }, /* memmov_cost.  */
-  5, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32:16",	/* function_align.  */
-  "4",		/* jump_align.  */
-  "32:16",	/* loop_align.  */
-  3,	/* int_reassoc_width.  */
-  6,	/* fp_reassoc_width.  */
-  4,	/* fma_reassoc_width.  */
-  3,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
-   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
-};
-
-static const struct tune_params a64fx_tunings =
-{
-  &a64fx_extra_costs,
-  &a64fx_addrcost_table,
-  &a64fx_regmove_cost,
-  &a64fx_vector_cost,
-  &generic_branch_cost,
-  &generic_approx_modes,
-  SVE_512, /* sve_width  */
-  { 4, /* load_int.  */
-    4, /* store_int.  */
-    4, /* load_fp.  */
-    4, /* store_fp.  */
-    4, /* load_pred.  */
-    4 /* store_pred.  */
-  }, /* memmov_cost.  */
-  7, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
-  "32",	/* function_align.  */
-  "16",	/* jump_align.  */
-  "32",	/* loop_align.  */
-  4,	/* int_reassoc_width.  */
-  2,	/* fp_reassoc_width.  */
-  1,	/* fma_reassoc_width.  */
-  2,	/* vec_reassoc_width.  */
-  2,	/* min_div_recip_mul_sf.  */
-  2,	/* min_div_recip_mul_df.  */
-  0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &a64fx_prefetch_tune,
-  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
-  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
-};
+#include "tuning_models/generic.h"
+#include "tuning_models/cortexa35.h"
+#include "tuning_models/cortexa53.h"
+#include "tuning_models/cortexa57.h"
+#include "tuning_models/cortexa72.h"
+#include "tuning_models/cortexa73.h"
+#include "tuning_models/exynosm1.h"
+#include "tuning_models/thunderxt88.h"
+#include "tuning_models/thunderx.h"
+#include "tuning_models/tsv110.h"
+#include "tuning_models/xgene1.h"
+#include "tuning_models/emag.h"
+#include "tuning_models/qdf24xx.h"
+#include "tuning_models/saphira.h"
+#include "tuning_models/thunderx2t99.h"
+#include "tuning_models/thunderx3t110.h"
+#include "tuning_models/neoversen1.h"
+#include "tuning_models/ampere1.h"
+#include "tuning_models/ampere1a.h"
+#include "tuning_models/neoversev1.h"
+#include "tuning_models/neoverse512tvb.h"
+#include "tuning_models/neoversen2.h"
+#include "tuning_models/neoversev2.h"
+#include "tuning_models/a64fx.h"
 
 /* Support for fine-grained override of the tuning structures.  */
 struct aarch64_tuning_override_function
diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h
new file mode 100644
index 0000000000000000000000000000000000000000..7b06c27eba1e4de01738bdfdc077460f9135fb41
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/a64fx.h
@@ -0,0 +1,169 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_A64FX
+#define GCC_AARCH64_H_A64FX
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table a64fx_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost a64fx_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  7, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost a64fx_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  13, /* reduc_i8_cost  */
+  13, /* reduc_i16_cost  */
+  13, /* reduc_i32_cost  */
+  13, /* reduc_i64_cost  */
+  13, /* reduc_f16_cost  */
+  13, /* reduc_f32_cost  */
+  13, /* reduc_f64_cost  */
+  13, /* store_elt_extra_cost  */
+  13, /* vec_to_scalar_cost  */
+  4, /* scalar_to_vec_cost  */
+  6, /* align_load_cost  */
+  6, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost a64fx_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    5, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    13, /* reduc_i8_cost  */
+    13, /* reduc_i16_cost  */
+    13, /* reduc_i32_cost  */
+    13, /* reduc_i64_cost  */
+    13, /* reduc_f16_cost  */
+    13, /* reduc_f32_cost  */
+    13, /* reduc_f64_cost  */
+    13, /* store_elt_extra_cost  */
+    13, /* vec_to_scalar_cost  */
+    4, /* scalar_to_vec_cost  */
+    6, /* align_load_cost  */
+    6, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  13, /* clast_cost  */
+  13, /* fadda_f16_cost  */
+  13, /* fadda_f32_cost  */
+  13, /* fadda_f64_cost  */
+  64, /* gather_load_x32_cost  */
+  32, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+static const struct cpu_vector_cost a64fx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  5, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &a64fx_advsimd_vector_cost, /* advsimd  */
+  &a64fx_sve_vector_cost, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune a64fx_prefetch_tune =
+{
+  8,			/* num_slots  */
+  64,			/* l1_cache_size  */
+  256,			/* l1_cache_line_size  */
+  32768,		/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params a64fx_tunings =
+{
+  &a64fx_extra_costs,
+  &a64fx_addrcost_table,
+  &a64fx_regmove_cost,
+  &a64fx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_512, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  7, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "32",	/* loop_align.  */
+  4,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &a64fx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_A64FX.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
new file mode 100644
index 0000000000000000000000000000000000000000..8d2a1c696103259f23cf73df26cef9d4fa05ac73
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -0,0 +1,113 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_AMPERE1
+#define GCC_AARCH64_H_AMPERE1
+
+#include "generic.h"
+
+static const advsimd_vec_cost ampere1_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  12, /* reduc_i8_cost  */
+  9, /* reduc_i16_cost  */
+  6, /* reduc_i32_cost  */
+  5, /* reduc_i64_cost  */
+  9, /* reduc_f16_cost  */
+  6, /* reduc_f32_cost  */
+  5, /* reduc_f64_cost  */
+  8, /* store_elt_extra_cost  */
+  6, /* vec_to_scalar_cost  */
+  7, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Ampere-1 costs for vector insn classes.  */
+static const struct cpu_vector_cost ampere1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  3, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &ampere1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr  /* issue_info  */
+};
+
+static const cpu_prefetch_tune ampere1_prefetch_tune =
+{
+  0,			/* num_slots  */
+  64,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  2048,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params ampere1_tunings =
+{
+  &ampere1_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &ampere1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
+   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
+   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
+   AARCH64_FUSE_CMP_BRANCH),
+  /* fusible_ops  */
+  "32",		/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_AMPERE1.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
new file mode 100644
index 0000000000000000000000000000000000000000..c419ffb3c1a936a01690ad157c6c71dc645273c8
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -0,0 +1,65 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_AMPERE1A
+#define GCC_AARCH64_H_AMPERE1A
+
+#include "generic.h"
+
+static const struct tune_params ampere1a_tunings =
+{
+  &ampere1a_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &ampere1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
+   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
+   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
+   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
+   AARCH64_FUSE_ADDSUB_2REG_CONST1),
+  /* fusible_ops  */
+  "32",		/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_AMPERE1A.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa35.h b/gcc/config/aarch64/tuning_models/cortexa35.h
new file mode 100644
index 0000000000000000000000000000000000000000..5534335348db96cc57fc9eccd7ff79a624cb528a
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa35.h
@@ -0,0 +1,62 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA35
+#define GCC_AARCH64_H_CORTEXA35
+
+#include "generic.h"
+#include "cortexa53.h"
+
+static const struct tune_params cortexa35_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &cortexa53_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  1, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA35.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa53.h b/gcc/config/aarch64/tuning_models/cortexa53.h
new file mode 100644
index 0000000000000000000000000000000000000000..9dfdccc5968e7f062af5c78f153bfe3838263b0a
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa53.h
@@ -0,0 +1,71 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA53
+#define GCC_AARCH64_H_CORTEXA53
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost cortexa53_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const struct tune_params cortexa53_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &cortexa53_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA53.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa57.h b/gcc/config/aarch64/tuning_models/cortexa57.h
new file mode 100644
index 0000000000000000000000000000000000000000..9c4789d57833a5879dda8e2fe454ac5f56cb0601
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa57.h
@@ -0,0 +1,109 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA57
+#define GCC_AARCH64_H_CORTEXA57
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost cortexa57_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  8, /* reduc_i8_cost  */
+  8, /* reduc_i16_cost  */
+  8, /* reduc_i32_cost  */
+  8, /* reduc_i64_cost  */
+  8, /* reduc_f16_cost  */
+  8, /* reduc_f32_cost  */
+  8, /* reduc_f64_cost  */
+  8, /* store_elt_extra_cost  */
+  8, /* vec_to_scalar_cost  */
+  8, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Cortex-A57 costs for vector insn classes.  */
+static const struct cpu_vector_cost cortexa57_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &cortexa57_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const struct tune_params cortexa57_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA57.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa72.h b/gcc/config/aarch64/tuning_models/cortexa72.h
new file mode 100644
index 0000000000000000000000000000000000000000..968171c9b2e898d7479dbcb462e33fe3905e183d
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa72.h
@@ -0,0 +1,61 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA72
+#define GCC_AARCH64_H_CORTEXA72
+
+#include "generic.h"
+
+static const struct tune_params cortexa72_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_CORTEXA72.  */
diff --git a/gcc/config/aarch64/tuning_models/cortexa73.h b/gcc/config/aarch64/tuning_models/cortexa73.h
new file mode 100644
index 0000000000000000000000000000000000000000..8d1a504ddac39604dd193ce0f434fd2f5145c129
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/cortexa73.h
@@ -0,0 +1,62 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_CORTEXA73
+#define GCC_AARCH64_H_CORTEXA73
+
+#include "generic.h"
+
+static const struct tune_params cortexa73_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate.  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+
+#endif /* GCC_AARCH64_H_CORTEXA73.  */
diff --git a/gcc/config/aarch64/tuning_models/emag.h b/gcc/config/aarch64/tuning_models/emag.h
new file mode 100644
index 0000000000000000000000000000000000000000..3f3402c3fc2a94704eeaf9223ecb0ca1c057cace
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/emag.h
@@ -0,0 +1,60 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_EMAG
+#define GCC_AARCH64_H_EMAG
+
+#include "generic.h"
+
+static const struct tune_params emag_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  SVE_NOT_IMPLEMENTED,
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  17,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
+  &xgene1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_EMAG.  */
diff --git a/gcc/config/aarch64/tuning_models/exynosm1.h b/gcc/config/aarch64/tuning_models/exynosm1.h
new file mode 100644
index 0000000000000000000000000000000000000000..a42ea4df97f3f048c41481c304fd3684a69d743b
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/exynosm1.h
@@ -0,0 +1,144 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_EXYNOSM1
+#define GCC_AARCH64_H_EXYNOSM1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table exynosm1_addrcost_table =
+{
+    {
+      0, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  1, /* register_offset  */
+  1, /* register_sextend  */
+  2, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost exynosm1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost (actual, 4 and 9).  */
+  9, /* GP2FP  */
+  9, /* FP2GP  */
+  1 /* FP2FP  */
+};
+
+static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
+{
+  3, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  3, /* reduc_i8_cost  */
+  3, /* reduc_i16_cost  */
+  3, /* reduc_i32_cost  */
+  3, /* reduc_i64_cost  */
+  3, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  3, /* reduc_f64_cost  */
+  3, /* store_elt_extra_cost  */
+  3, /* vec_to_scalar_cost  */
+  3, /* scalar_to_vec_cost  */
+  5, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const struct cpu_vector_cost exynosm1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &exynosm1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+/* Approximation modes for Exynos M1.  */
+static const cpu_approx_modes exynosm1_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_ALL,	/* sqrt  */
+  AARCH64_APPROX_ALL	/* recip_sqrt  */
+};
+
+static const cpu_prefetch_tune exynosm1_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params exynosm1_tunings =
+{
+  &exynosm1_extra_costs,
+  &exynosm1_addrcost_table,
+  &exynosm1_regmove_cost,
+  &exynosm1_vector_cost,
+  &generic_branch_cost,
+  &exynosm1_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3,	/* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  48,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &exynosm1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_EXYNOSM1.  */
diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h
new file mode 100644
index 0000000000000000000000000000000000000000..deb2c1cffe255bddcb5be571b12086442782da60
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic.h
@@ -0,0 +1,190 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC
+#define GCC_AARCH64_H_GENERIC
+
+static const struct cpu_addrcost_table generic_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+/* Generic costs for Advanced SIMD vector operations.   */
+static const advsimd_vec_cost generic_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Generic costs for SVE vector operations.  */
+static const sve_vec_cost generic_sve_vector_cost =
+{
+  {
+    1, /* int_stmt_cost  */
+    1, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    2, /* permute_cost  */
+    2, /* reduc_i8_cost  */
+    2, /* reduc_i16_cost  */
+    2, /* reduc_i32_cost  */
+    2, /* reduc_i64_cost  */
+    2, /* reduc_f16_cost  */
+    2, /* reduc_f32_cost  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    2, /* vec_to_scalar_cost  */
+    1, /* scalar_to_vec_cost  */
+    1, /* align_load_cost  */
+    1, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  2, /* clast_cost  */
+  2, /* fadda_f16_cost  */
+  2, /* fadda_f32_cost  */
+  2, /* fadda_f64_cost  */
+  4, /* gather_load_x32_cost  */
+  2, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost generic_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_advsimd_vector_cost, /* advsimd  */
+  &generic_sve_vector_cost, /* sve */
+  nullptr /* issue_info  */
+};
+
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_branch_cost =
+{
+  1,  /* Predictable.  */
+  3   /* Unpredictable.  */
+};
+
+/* Generic approximation modes.  */
+static const cpu_approx_modes generic_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_NONE	/* recip_sqrt  */
+};
+
+/* Generic prefetch settings (which disable prefetch).  */
+static const cpu_prefetch_tune generic_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  -1,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params generic_tunings =
+{
+  &cortexa57_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "16:12",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
+     Neoverse V1.  It does not have a noticeable effect on A64FX and should
+     have at most a very minor effect on SVE2 cores.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC.  */
diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
new file mode 100644
index 0000000000000000000000000000000000000000..50d7b23712cc6a8be8f35246657ec5d86d6d4191
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
@@ -0,0 +1,164 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSE512TVB
+#define GCC_AARCH64_H_NEOVERSE512TVB
+
+#include "generic.h"
+
+static const sve_vec_cost neoverse512tvb_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    4, /* ld2_st2_permute_cost  */
+    5, /* ld3_st3_permute_cost  */
+    5, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  Assume that
+       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
+    8, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~6 cycles and would have a cost of 14.  Assume that
+       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
+    16, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
+    8, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
+    4, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* This depends very much on what the scalar value is and
+       where it comes from.  E.g. some constants take two dependent
+       instructions or a load, while others might be moved from a GPR.
+       4 seems to be a reasonable compromise in practice.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores generally have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      4, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    2, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
+{
+  &neoversev1_scalar_issue_info,
+  &neoversev1_advsimd_issue_info,
+  &neoverse512tvb_sve_issue_info
+};
+
+static const struct cpu_vector_cost neoverse512tvb_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev1_advsimd_vector_cost, /* advsimd  */
+  &neoverse512tvb_sve_vector_cost, /* sve  */
+  &neoverse512tvb_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoverse512tvb_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev1_addrcost_table,
+  &neoversev1_regmove_cost,
+  &neoverse512tvb_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128 | SVE_256, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSE512TVB.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h
new file mode 100644
index 0000000000000000000000000000000000000000..132166d3d06430b725e4448937332cc159c11cda
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversen1.h
@@ -0,0 +1,60 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEN1
+#define GCC_AARCH64_H_NEOVERSEN1
+
+#include "generic.h"
+
+static const struct tune_params neoversen1_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    5, /* load_fp.  */
+    2, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEN1.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h
new file mode 100644
index 0000000000000000000000000000000000000000..395a6d82b8403e586bf179cade055543cf9b9eb0
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEN2
+#define GCC_AARCH64_H_NEOVERSEN2
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversen2_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversen2_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversen2_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    4, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~8 cycles and would have a cost of 14.  FADDV
+       completes in 6 cycles, so give it a cost of 14 - 2.  */
+    12, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
+    6, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    2, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversen2_vec_issue_info =
+{
+  &neoversen2_scalar_issue_info,
+  &neoversen2_advsimd_issue_info,
+  &neoversen2_sve_issue_info
+};
+
+/* Neoverse N2 costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversen2_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversen2_advsimd_vector_cost, /* advsimd  */
+  &neoversen2_sve_vector_cost, /* sve  */
+  &neoversen2_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversen2_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversen2_addrcost_table,
+  &neoversen2_regmove_cost,
+  &neoversen2_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128, /* sve_width  */
+  { 4, /* load_int.  */
+    1, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEN2.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h
new file mode 100644
index 0000000000000000000000000000000000000000..584a5000e06f598dcdd3bcc533dc6dbc642223ca
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversev1.h
@@ -0,0 +1,237 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEV1
+#define GCC_AARCH64_H_NEOVERSEV1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversev1_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  3, /* post_modify_ld3_st3  */
+  3, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversev1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  4, /* ld2_st2_permute_cost */
+  4, /* ld3_st3_permute_cost  */
+  5, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversev1_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    4, /* ld2_st2_permute_cost  */
+    7, /* ld3_st3_permute_cost  */
+    8, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 31 scalar ADDs could
+       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
+       completes in 14 cycles, so give it a cost of 31 + 5.  */
+    36, /* reduc_i8_cost  */
+    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
+    22, /* reduc_i16_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
+    14, /* reduc_i32_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
+    11, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 15 scalar FADDs could
+       complete in ~9 cycles and would have a cost of 30.  FADDV
+       completes in 13 cycles, so give it a cost of 30 + 4.  */
+    34, /* reduc_f16_cost  */
+    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
+    19, /* reduc_f32_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
+    11, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  19, /* fadda_f16_cost  */
+  11, /* fadda_f32_cost  */
+  8, /* fadda_f64_cost  */
+  32, /* gather_load_x32_cost  */
+  16, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    4, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
+{
+  {
+    {
+      2, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    2, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  1, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversev1_vec_issue_info =
+{
+  &neoversev1_scalar_issue_info,
+  &neoversev1_advsimd_issue_info,
+  &neoversev1_sve_issue_info
+};
+
+/* Neoverse V1 costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversev1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev1_advsimd_vector_cost, /* advsimd  */
+  &neoversev1_sve_vector_cost, /* sve  */
+  &neoversev1_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversev1_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev1_addrcost_table,
+  &neoversev1_regmove_cost,
+  &neoversev1_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_256, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+
+#endif /* GCC_AARCH64_H_NEOVERSEV1.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
new file mode 100644
index 0000000000000000000000000000000000000000..28d4244ef4c99ecdffb7408e39dc21bc191223de
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_NEOVERSEV2
+#define GCC_AARCH64_H_NEOVERSEV2
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table neoversev2_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost neoversev2_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversev2_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    3, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 8.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
+    14, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
+    7, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~6 cycles and would have a cost of 14.  FADDV
+       completes in 8 cycles, so give it a cost of 14 + 2.  */
+    16, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
+    8, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
+    4, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  6, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    4, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      4, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info neoversev2_vec_issue_info =
+{
+  &neoversev2_scalar_issue_info,
+  &neoversev2_advsimd_issue_info,
+  &neoversev2_sve_issue_info
+};
+
+/* Neoverse V2 (Demeter) costs for vector insn classes.  */
+static const struct cpu_vector_cost neoversev2_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &neoversev2_advsimd_vector_cost, /* advsimd  */
+  &neoversev2_sve_vector_cost, /* sve  */
+  &neoversev2_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params neoversev2_tunings =
+{
+  &cortexa76_extra_costs,
+  &neoversev2_addrcost_table,
+  &neoversev2_regmove_cost,
+  &neoversev2_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_128, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    6, /* load_fp.  */
+    1, /* store_fp.  */
+    6, /* load_pred.  */
+    2 /* store_pred.  */
+  }, /* memmov_cost.  */
+  5, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  6,	/* fp_reassoc_width.  */
+  4,	/* fma_reassoc_width.  */
+  3,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_NEOVERSEV2.  */
diff --git a/gcc/config/aarch64/tuning_models/qdf24xx.h b/gcc/config/aarch64/tuning_models/qdf24xx.h
new file mode 100644
index 0000000000000000000000000000000000000000..29c9b9f5843acc15450a2492b141c02ee48a3f13
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/qdf24xx.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_QDF24XX
+#define GCC_AARCH64_H_QDF24XX
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table qdf24xx_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  1, /* pre_modify  */
+  1, /* post_modify  */
+  1, /* post_modify_ld3_st3  */
+  1, /* post_modify_ld4_st4  */
+  3, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  2, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost qdf24xx_regmove_cost =
+{
+  2, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  6, /* GP2FP  */
+  6, /* FP2GP  */
+  4 /* FP2FP  */
+};
+
+static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  3, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  1, /* reduc_i8_cost  */
+  1, /* reduc_i16_cost  */
+  1, /* reduc_i32_cost  */
+  1, /* reduc_i64_cost  */
+  1, /* reduc_f16_cost  */
+  1, /* reduc_f32_cost  */
+  1, /* reduc_f64_cost  */
+  1, /* store_elt_extra_cost  */
+  1, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* QDF24XX costs for vector insn classes.  */
+static const struct cpu_vector_cost qdf24xx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &qdf24xx_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune qdf24xx_prefetch_tune =
+{
+  4,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  512,			/* l2_cache_size  */
+  false,		/* prefetch_dynamic_strides */
+  2048,			/* minimum_stride */
+  3			/* default_opt_level  */
+};
+
+static const struct tune_params qdf24xx_tunings =
+{
+  &qdf24xx_extra_costs,
+  &qdf24xx_addrcost_table,
+  &qdf24xx_regmove_cost,
+  &qdf24xx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
+  &qdf24xx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_QDF24XX.  */
diff --git a/gcc/config/aarch64/tuning_models/saphira.h b/gcc/config/aarch64/tuning_models/saphira.h
new file mode 100644
index 0000000000000000000000000000000000000000..e584d316bb7c3c2d232cf7623a92100ad261f07d
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/saphira.h
@@ -0,0 +1,63 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_SAPHIRA
+#define GCC_AARCH64_H_SAPHIRA
+
+#include "generic.h"
+
+/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
+   for now.  */
+static const struct tune_params saphira_tunings =
+{
+  &generic_extra_costs,
+  &generic_addrcost_table,
+  &generic_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_SAPHIRA.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx.h b/gcc/config/aarch64/tuning_models/thunderx.h
new file mode 100644
index 0000000000000000000000000000000000000000..dd4b9d539fc5cf2bd20d84e91d6b72fa7237f99f
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx.h
@@ -0,0 +1,117 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX
+#define GCC_AARCH64_H_THUNDERX
+
+#include "generic.h"
+
+static const struct cpu_regmove_cost thunderx_regmove_cost =
+{
+  2, /* GP2GP  */
+  2, /* GP2FP  */
+  6, /* FP2GP  */
+  4 /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx_advsimd_vector_cost =
+{
+  4, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  4, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  2, /* scalar_to_vec_cost  */
+  3, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  5, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* ThunderX costs for vector insn classes.  */
+static const struct cpu_vector_cost thunderx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  3, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  3, /* cond_not_taken_branch_cost  */
+  &thunderx_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  128,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx_tunings =
+{
+  &thunderx_extra_costs,
+  &generic_addrcost_table,
+  &thunderx_regmove_cost,
+  &thunderx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &thunderx_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx2t99.h b/gcc/config/aarch64/tuning_models/thunderx2t99.h
new file mode 100644
index 0000000000000000000000000000000000000000..0a376e0bab37b0b5bc1ea23de0e96a9245846fd7
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx2t99.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX2T99
+#define GCC_AARCH64_H_THUNDERX2T99
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  5, /* GP2FP  */
+  6, /* FP2GP  */
+  3, /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
+{
+  4, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  10, /* permute_cost  */
+  6, /* reduc_i8_cost  */
+  6, /* reduc_i16_cost  */
+  6, /* reduc_i32_cost  */
+  6, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  6, /* reduc_f32_cost  */
+  6, /* reduc_f64_cost  */
+  6, /* store_elt_extra_cost  */
+  6, /* vec_to_scalar_cost  */
+  5, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Costs for vector insn classes for Vulcan.  */
+static const struct cpu_vector_cost thunderx2t99_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  6, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1,  /* cond_not_taken_branch_cost  */
+  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx2t99_tunings =
+{
+  &thunderx2t99_extra_costs,
+  &thunderx2t99_addrcost_table,
+  &thunderx2t99_regmove_cost,
+  &thunderx2t99_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate.  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderx2t99_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX2T99.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderx3t110.h b/gcc/config/aarch64/tuning_models/thunderx3t110.h
new file mode 100644
index 0000000000000000000000000000000000000000..65203b4af132e12e4994013fbab228bd3873b756
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderx3t110.h
@@ -0,0 +1,136 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERX3T110
+#define GCC_AARCH64_H_THUNDERX3T110
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
+{
+    {
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
+      2, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  2, /* register_offset  */
+  3, /* register_sextend  */
+  3, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  4, /* GP2FP  */
+  5, /* FP2GP  */
+  4  /* FP2FP  */
+};
+
+static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
+{
+  5, /* int_stmt_cost  */
+  5, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  10, /* permute_cost  */
+  5, /* reduc_i8_cost  */
+  5, /* reduc_i16_cost  */
+  5, /* reduc_i32_cost  */
+  5, /* reduc_i64_cost  */
+  5, /* reduc_f16_cost  */
+  5, /* reduc_f32_cost  */
+  5, /* reduc_f64_cost  */
+  5, /* store_elt_extra_cost  */
+  5, /* vec_to_scalar_cost  */
+  5, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  4, /* unalign_store_cost  */
+  4  /* store_cost  */
+};
+
+static const struct cpu_vector_cost thunderx3t110_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  5, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1,  /* cond_not_taken_branch_cost  */
+  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params thunderx3t110_tunings =
+{
+  &thunderx3t110_extra_costs,
+  &thunderx3t110_addrcost_table,
+  &thunderx3t110_regmove_cost,
+  &thunderx3t110_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  6, /* issue_rate.  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  3,	/* int_reassoc_width.  */
+  2,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderx3t110_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERX3T110.  */
diff --git a/gcc/config/aarch64/tuning_models/thunderxt88.h b/gcc/config/aarch64/tuning_models/thunderxt88.h
new file mode 100644
index 0000000000000000000000000000000000000000..dcc74d31484ee6b99d37920dbfe7b1d59377d074
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/thunderxt88.h
@@ -0,0 +1,72 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_THUNDERXT88
+#define GCC_AARCH64_H_THUNDERXT88
+
+#include "generic.h"
+#include "thunderx.h"
+
+static const cpu_prefetch_tune thunderxt88_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  128,			/* l1_cache_line_size  */
+  16*1024,		/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  3			/* default_opt_level  */
+};
+
+static const struct tune_params thunderxt88_tunings =
+{
+  &thunderx_extra_costs,
+  &generic_addrcost_table,
+  &thunderx_regmove_cost,
+  &thunderx_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  2, /* issue_rate  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderxt88_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_THUNDERXT88.  */
diff --git a/gcc/config/aarch64/tuning_models/tsv110.h b/gcc/config/aarch64/tuning_models/tsv110.h
new file mode 100644
index 0000000000000000000000000000000000000000..42aeafce652fff34e3277194993dd4aa1f0383a1
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/tsv110.h
@@ -0,0 +1,137 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_TSV110
+#define GCC_AARCH64_H_TSV110
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table tsv110_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  1, /* register_sextend  */
+  1, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost tsv110_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  2, /* GP2FP  */
+  3, /* FP2GP  */
+  2  /* FP2FP  */
+};
+
+static const advsimd_vec_cost tsv110_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  3, /* reduc_i8_cost  */
+  3, /* reduc_i16_cost  */
+  3, /* reduc_i32_cost  */
+  3, /* reduc_i64_cost  */
+  3, /* reduc_f16_cost  */
+  3, /* reduc_f32_cost  */
+  3, /* reduc_f64_cost  */
+  3, /* store_elt_extra_cost  */
+  3, /* vec_to_scalar_cost  */
+  2, /* scalar_to_vec_cost  */
+  5, /* align_load_cost  */
+  5, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const struct cpu_vector_cost tsv110_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &tsv110_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+static const cpu_prefetch_tune tsv110_prefetch_tune =
+{
+  0,                    /* num_slots  */
+  64,                   /* l1_cache_size  */
+  64,                   /* l1_cache_line_size  */
+  512,                  /* l2_cache_size  */
+  true,                 /* prefetch_dynamic_strides */
+  -1,                   /* minimum_stride */
+  -1                    /* default_opt_level  */
+};
+
+static const struct tune_params tsv110_tunings =
+{
+  &tsv110_extra_costs,
+  &tsv110_addrcost_table,
+  &tsv110_regmove_cost,
+  &tsv110_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    4, /* store_int.  */
+    4, /* load_fp.  */
+    4, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4,    /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
+  "16", /* function_align.  */
+  "4",  /* jump_align.  */
+  "8",  /* loop_align.  */
+  2,    /* int_reassoc_width.  */
+  4,    /* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,    /* vec_reassoc_width.  */
+  2,    /* min_div_recip_mul_sf.  */
+  2,    /* min_div_recip_mul_df.  */
+  0,    /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
+  &tsv110_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_TSV110.  */
diff --git a/gcc/config/aarch64/tuning_models/xgene1.h b/gcc/config/aarch64/tuning_models/xgene1.h
new file mode 100644
index 0000000000000000000000000000000000000000..53a3eb0ddeb80a9735cc988e242a70e87dc90655
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/xgene1.h
@@ -0,0 +1,145 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_XGENE1
+#define GCC_AARCH64_H_XGENE1
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table xgene1_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  1, /* pre_modify  */
+  1, /* post_modify  */
+  1, /* post_modify_ld3_st3  */
+  1, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  1, /* register_sextend  */
+  1, /* register_zextend  */
+  0, /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost xgene1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  8, /* GP2FP  */
+  8, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost xgene1_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  4, /* reduc_i32_cost  */
+  4, /* reduc_i64_cost  */
+  4, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  4, /* reduc_f64_cost  */
+  4, /* store_elt_extra_cost  */
+  4, /* vec_to_scalar_cost  */
+  4, /* scalar_to_vec_cost  */
+  10, /* align_load_cost  */
+  10, /* unalign_load_cost  */
+  2, /* unalign_store_cost  */
+  2  /* store_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost xgene1_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  5, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  2, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &xgene1_advsimd_vector_cost, /* advsimd  */
+  nullptr, /* sve  */
+  nullptr /* issue_info  */
+};
+
+/* Approximation modes for X-Gene 1.  */
+static const cpu_approx_modes xgene1_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_ALL	/* recip_sqrt  */
+};
+
+static const cpu_prefetch_tune xgene1_prefetch_tune =
+{
+  8,			/* num_slots  */
+  32,			/* l1_cache_size  */
+  64,			/* l1_cache_line_size  */
+  256,			/* l2_cache_size  */
+  true,                 /* prefetch_dynamic_strides */
+  -1,                   /* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params xgene1_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 6, /* load_int.  */
+    6, /* store_int.  */
+    6, /* load_fp.  */
+    6, /* store_fp.  */
+    6, /* load_pred.  */
+    6 /* store_pred.  */
+  }, /* memmov_cost.  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",	/* function_align.  */
+  "16",	/* jump_align.  */
+  "16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  17,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
+  &xgene1_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_XGENE1.  */




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/6]AArch64: Remove special handling of generic cpu.
  2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
@ 2023-11-15 17:07 ` Tamar Christina
  2023-11-16  9:14   ` Richard Earnshaw
  2023-11-15 17:07 ` [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default Tamar Christina
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2023-11-15 17:07 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 5236 bytes --]

Hi All,

In anticipation of adding new generic tuning values, this patch removes the
hardcoding of the "generic" CPU and instead specifies it as a normal CPU.

No change in behavior is expected.
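
For reference, the new entry in aarch64-cores.def reads as follows; the
field-name comment is mine, taken from the AARCH64_CORE parameter list in the
.def files, and is only there to make the fields easier to read:

  /* AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT)  */
  AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)

With this, "generic" goes through the same table-driven handling as every
other core.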

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64-cores.def: Add generic.
	* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.cc (all_cores): Remove generic.
	* config/aarch64/aarch64.h (enum target_cpus): Remove
	TARGET_CPU_generic.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPER
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
+/* Generic Architecture Processors.  */
+AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -32,8 +32,6 @@ enum aarch64_processor
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
   INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  /* Used to indicate that no processor has been specified.  */
-  generic,
   /* Used to mark the end of the processor table.  */
   aarch64_none
 };
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -720,7 +720,6 @@ enum target_cpus
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
   TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  TARGET_CPU_generic
 };
 
 /* If there is no CPU defined at configure, use generic as default.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -427,8 +427,6 @@ static const struct processor all_cores[] =
   {NAME, IDENT, SCHED, AARCH64_ARCH_##ARCH, \
    feature_deps::cpu_##IDENT, &COSTS##_tunings},
 #include "aarch64-cores.def"
-  {"generic", generic, cortexa53, AARCH64_ARCH_V8A,
-   feature_deps::V8A ().enable, &generic_tunings},
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
 




-- 

[-- Attachment #2: rb17816.patch --]
[-- Type: text/plain, Size: 4577 bytes --]

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPER
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
+/* Generic Architecture Processors.  */
+AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -32,8 +32,6 @@ enum aarch64_processor
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
   INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  /* Used to indicate that no processor has been specified.  */
-  generic,
   /* Used to mark the end of the processor table.  */
   aarch64_none
 };
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -720,7 +720,6 @@ enum target_cpus
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
   TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  TARGET_CPU_generic
 };
 
 /* If there is no CPU defined at configure, use generic as default.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -427,8 +427,6 @@ static const struct processor all_cores[] =
   {NAME, IDENT, SCHED, AARCH64_ARCH_##ARCH, \
    feature_deps::cpu_##IDENT, &COSTS##_tunings},
 #include "aarch64-cores.def"
-  {"generic", generic, cortexa53, AARCH64_ARCH_V8A,
-   feature_deps::V8A ().enable, &generic_tunings},
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
 




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default.
  2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
  2023-11-15 17:07 ` [PATCH 2/6]AArch64: Remove special handling of generic cpu Tamar Christina
@ 2023-11-15 17:07 ` Tamar Christina
  2023-11-16  9:23   ` Richard Earnshaw
  2023-11-15 17:08 ` [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9 Tamar Christina
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2023-11-15 17:07 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 17427 bytes --]

Hi All,

This patch adds a new generic tuning model "generic-armv8-a" and makes it the
default for all Armv8 architectures.

-mcpu=generic and -mtune=generic are kept around for those who really want the
deprecated cost model.
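
As an illustration (these command lines are mine, not part of the patch, and
assume the usual mapping from aarch64-cores.def entries to -mcpu/-mtune
values):

  gcc -O2 -march=armv8.2-a foo.c        # now tuned using generic-armv8-a
  gcc -O2 -mcpu=generic-armv8-a foo.c   # request the new model explicitly
  gcc -O2 -mtune=generic foo.c          # keep using the old generic tuning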

On SPECCPU 2017 this shows the following:

generic:  SPECINT 1.0% improvement in geomean, SPECFP -0.6%.  The SPECFP
          regression is due to fotonik3d_r, where we vectorize an FP
          calculation that only ever needs one lane of the result.  I believe
          this is a generic costing bug, but at the moment we can't change the
          costs of FP and INT independently, so I will defer updating that
          cost to stage3, after Richard's other costing updates land.

generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64-arches.def (armv8-r, armv8-a, armv8.1-a,
	armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
	armv8.8-a): Update to generic_armv8_a.
	* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.cc: Include tuning_models/generic_armv8_a.h.
	* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
	TARGET_CPU_generic_armv8_a.
	* config/aarch64/tuning_models/generic_armv8_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -30,19 +30,19 @@
    Due to the assumptions about the positions of these fields in config.gcc,
    NAME should be kept as the first argument.  */
 
-AARCH64_ARCH("armv8-a",       generic,       V8A,       8,  (SIMD))
-AARCH64_ARCH("armv8.1-a",     generic,       V8_1A,     8,  (V8A, LSE, CRC, RDMA))
-AARCH64_ARCH("armv8.2-a",     generic,       V8_2A,     8,  (V8_1A))
-AARCH64_ARCH("armv8.3-a",     generic,       V8_3A,     8,  (V8_2A, PAUTH, RCPC))
-AARCH64_ARCH("armv8.4-a",     generic,       V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
-AARCH64_ARCH("armv8.5-a",     generic,       V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
-AARCH64_ARCH("armv8.6-a",     generic,       V8_6A,     8,  (V8_5A, I8MM, BF16))
-AARCH64_ARCH("armv8.7-a",     generic,       V8_7A,     8,  (V8_6A, LS64))
-AARCH64_ARCH("armv8.8-a",     generic,       V8_8A,     8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8-r",       generic,       V8R  ,     8,  (V8_4A))
-AARCH64_ARCH("armv9-a",       generic,       V9A  ,     9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a",     generic,       V9_1A,     9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a",     generic,       V9_2A,     9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a",     generic,       V9_3A,     9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv8-a",       generic_armv8_a,   V8A,       8,  (SIMD))
+AARCH64_ARCH("armv8.1-a",     generic_armv8_a,   V8_1A,     8,  (V8A, LSE, CRC, RDMA))
+AARCH64_ARCH("armv8.2-a",     generic_armv8_a,   V8_2A,     8,  (V8_1A))
+AARCH64_ARCH("armv8.3-a",     generic_armv8_a,   V8_3A,     8,  (V8_2A, PAUTH, RCPC))
+AARCH64_ARCH("armv8.4-a",     generic_armv8_a,   V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
+AARCH64_ARCH("armv8.5-a",     generic_armv8_a,   V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
+AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF16))
+AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
+AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
+AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,5 +191,6 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index cd5d79ea9c221874578a4d5804e4f618e671ebcd..0a32056f255de455f47a0b7395dfef0af84c6b5e 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 145bf536c28fdef84246e16d8351f4b4e357d27c..1ac298926ce1606a87bcdcaf691f182ca416d600 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -724,7 +724,7 @@ enum target_cpus
 
 /* If there is no CPU defined at configure, use generic as default.  */
 #ifndef TARGET_CPU_DEFAULT
-# define TARGET_CPU_DEFAULT TARGET_CPU_generic
+# define TARGET_CPU_DEFAULT TARGET_CPU_generic_armv8_a
 #endif
 
 /* If inserting NOP before a mult-accumulate insn remember to adjust the
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9d59431d933021d71c5c202f0a61f807a2d2b0f1..1f5645e4886acd30ee5a437f60ffb53ee7b09436 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -355,6 +355,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 
 /* Tuning parameters.  */
 #include "tuning_models/generic.h"
+#include "tuning_models/generic_armv8_a.h"
 #include "tuning_models/cortexa35.h"
 #include "tuning_models/cortexa53.h"
 #include "tuning_models/cortexa57.h"
diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
new file mode 100644
index 0000000000000000000000000000000000000000..82abe172834756696a3905dbf92464f73a1ea3da
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -0,0 +1,191 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC_ARMV8_A
+#define GCC_AARCH64_H_GENERIC_ARMV8_A
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table generic_armv8_a_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_armv8_a_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+/* Generic costs for Advanced SIMD vector operations.   */
+static const advsimd_vec_cost generic_armv8_a_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Generic costs for SVE vector operations.  */
+static const sve_vec_cost generic_armv8_a_sve_vector_cost =
+{
+  {
+    1, /* int_stmt_cost  */
+    1, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    2, /* permute_cost  */
+    2, /* reduc_i8_cost  */
+    2, /* reduc_i16_cost  */
+    2, /* reduc_i32_cost  */
+    2, /* reduc_i64_cost  */
+    2, /* reduc_f16_cost  */
+    2, /* reduc_f32_cost  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    2, /* vec_to_scalar_cost  */
+    1, /* scalar_to_vec_cost  */
+    1, /* align_load_cost  */
+    1, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  2, /* clast_cost  */
+  2, /* fadda_f16_cost  */
+  2, /* fadda_f32_cost  */
+  2, /* fadda_f64_cost  */
+  4, /* gather_load_x32_cost  */
+  2, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost generic_armv8_a_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_armv8_a_advsimd_vector_cost, /* advsimd  */
+  &generic_armv8_a_sve_vector_cost, /* sve */
+  nullptr /* issue_info  */
+};
+
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_armv8_a_branch_cost =
+{
+  1,  /* Predictable.  */
+  3   /* Unpredictable.  */
+};
+
+/* Generic approximation modes.  */
+static const cpu_approx_modes generic_armv8_a_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_NONE	/* recip_sqrt  */
+};
+
+/* Generic prefetch settings (which disable prefetch).  */
+static const cpu_prefetch_tune generic_armv8_a_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  -1,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params generic_armv8_a_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_armv8_a_addrcost_table,
+  &generic_armv8_a_regmove_cost,
+  &generic_armv8_a_vector_cost,
+  &generic_armv8_a_branch_cost,
+  &generic_armv8_a_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    5, /* load_fp.  */
+    2, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC_ARMV8_A.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
index aac06bd8093bed9e50928ee23f9a075888f14543..96e9935360100e25a4c01cceabc7aa840f520a3e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
index f6278916e1afeb3f0cb8fdbff4e98782ad0a726e..6f969a829425960b414508a7e354a1f39426a0e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
index 03a6636f2d20b12f7e950a5bd6e43216139370fa..e6ec5157cd6dcc6b6dc24c5384432289b6dcdfba 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
index 9a2bd8f152ff32e8da1c4e2a73a31a249e5991c7..7ed35921b6f914441dc463c4030fcc4663a6813c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
index d5bee3a7b900bf9348c9cbfd67f487c381b13bf6..4bdb167944cda1861dd0462d905149646be69693 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-O2 -march=armv8-a+crc+crypto -mcpu=generic" } */
+/* { dg-options "-O2 -mcpu=generic+crypto" } */
 
 #include "arm_acle.h"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
index 069a0010865334324a100bab358bb53369f122fb..e6f31ba72ee77d1129f3cfbe2d90216d6c355c57 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-march=armv8-a+crypto -mcpu=generic -save-temps" } */
+/* { dg-options "-mcpu=generic+crypto -save-temps" } */
 
 /* Check that "+nothing" clears the ISA flags.  */
 




-- 

[-- Attachment #2: rb17817.patch --]
[-- Type: text/plain, Size: 16027 bytes --]

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -30,19 +30,19 @@
    Due to the assumptions about the positions of these fields in config.gcc,
    NAME should be kept as the first argument.  */
 
-AARCH64_ARCH("armv8-a",       generic,       V8A,       8,  (SIMD))
-AARCH64_ARCH("armv8.1-a",     generic,       V8_1A,     8,  (V8A, LSE, CRC, RDMA))
-AARCH64_ARCH("armv8.2-a",     generic,       V8_2A,     8,  (V8_1A))
-AARCH64_ARCH("armv8.3-a",     generic,       V8_3A,     8,  (V8_2A, PAUTH, RCPC))
-AARCH64_ARCH("armv8.4-a",     generic,       V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
-AARCH64_ARCH("armv8.5-a",     generic,       V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
-AARCH64_ARCH("armv8.6-a",     generic,       V8_6A,     8,  (V8_5A, I8MM, BF16))
-AARCH64_ARCH("armv8.7-a",     generic,       V8_7A,     8,  (V8_6A, LS64))
-AARCH64_ARCH("armv8.8-a",     generic,       V8_8A,     8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8-r",       generic,       V8R  ,     8,  (V8_4A))
-AARCH64_ARCH("armv9-a",       generic,       V9A  ,     9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a",     generic,       V9_1A,     9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a",     generic,       V9_2A,     9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a",     generic,       V9_3A,     9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv8-a",       generic_armv8_a,   V8A,       8,  (SIMD))
+AARCH64_ARCH("armv8.1-a",     generic_armv8_a,   V8_1A,     8,  (V8A, LSE, CRC, RDMA))
+AARCH64_ARCH("armv8.2-a",     generic_armv8_a,   V8_2A,     8,  (V8_1A))
+AARCH64_ARCH("armv8.3-a",     generic_armv8_a,   V8_3A,     8,  (V8_2A, PAUTH, RCPC))
+AARCH64_ARCH("armv8.4-a",     generic_armv8_a,   V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
+AARCH64_ARCH("armv8.5-a",     generic_armv8_a,   V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
+AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF16))
+AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
+AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
+AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,5 +191,6 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index cd5d79ea9c221874578a4d5804e4f618e671ebcd..0a32056f255de455f47a0b7395dfef0af84c6b5e 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 145bf536c28fdef84246e16d8351f4b4e357d27c..1ac298926ce1606a87bcdcaf691f182ca416d600 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -724,7 +724,7 @@ enum target_cpus
 
 /* If there is no CPU defined at configure, use generic as default.  */
 #ifndef TARGET_CPU_DEFAULT
-# define TARGET_CPU_DEFAULT TARGET_CPU_generic
+# define TARGET_CPU_DEFAULT TARGET_CPU_generic_armv8_a
 #endif
 
 /* If inserting NOP before a mult-accumulate insn remember to adjust the
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9d59431d933021d71c5c202f0a61f807a2d2b0f1..1f5645e4886acd30ee5a437f60ffb53ee7b09436 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -355,6 +355,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 
 /* Tuning parameters.  */
 #include "tuning_models/generic.h"
+#include "tuning_models/generic_armv8_a.h"
 #include "tuning_models/cortexa35.h"
 #include "tuning_models/cortexa53.h"
 #include "tuning_models/cortexa57.h"
diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
new file mode 100644
index 0000000000000000000000000000000000000000..82abe172834756696a3905dbf92464f73a1ea3da
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -0,0 +1,191 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC_ARMV8_A
+#define GCC_AARCH64_H_GENERIC_ARMV8_A
+
+#include "generic.h"
+
+static const struct cpu_addrcost_table generic_armv8_a_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* post_modify_ld3_st3  */
+  0, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_armv8_a_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+     their cost higher than memmov_cost.  */
+  5, /* GP2FP  */
+  5, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+/* Generic costs for Advanced SIMD vector operations.   */
+static const advsimd_vec_cost generic_armv8_a_advsimd_vector_cost =
+{
+  1, /* int_stmt_cost  */
+  1, /* fp_stmt_cost  */
+  0, /* ld2_st2_permute_cost  */
+  0, /* ld3_st3_permute_cost  */
+  0, /* ld4_st4_permute_cost  */
+  2, /* permute_cost  */
+  2, /* reduc_i8_cost  */
+  2, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  2, /* reduc_f16_cost  */
+  2, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  2, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* align_load_cost  */
+  1, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+/* Generic costs for SVE vector operations.  */
+static const sve_vec_cost generic_armv8_a_sve_vector_cost =
+{
+  {
+    1, /* int_stmt_cost  */
+    1, /* fp_stmt_cost  */
+    0, /* ld2_st2_permute_cost  */
+    0, /* ld3_st3_permute_cost  */
+    0, /* ld4_st4_permute_cost  */
+    2, /* permute_cost  */
+    2, /* reduc_i8_cost  */
+    2, /* reduc_i16_cost  */
+    2, /* reduc_i32_cost  */
+    2, /* reduc_i64_cost  */
+    2, /* reduc_f16_cost  */
+    2, /* reduc_f32_cost  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    2, /* vec_to_scalar_cost  */
+    1, /* scalar_to_vec_cost  */
+    1, /* align_load_cost  */
+    1, /* unalign_load_cost  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  2, /* clast_cost  */
+  2, /* fadda_f16_cost  */
+  2, /* fadda_f32_cost  */
+  2, /* fadda_f64_cost  */
+  4, /* gather_load_x32_cost  */
+  2, /* gather_load_x64_cost  */
+  1 /* scatter_store_elt_cost  */
+};
+
+/* Generic costs for vector insn classes.  */
+static const struct cpu_vector_cost generic_armv8_a_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_armv8_a_advsimd_vector_cost, /* advsimd  */
+  &generic_armv8_a_sve_vector_cost, /* sve */
+  nullptr /* issue_info  */
+};
+
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_armv8_a_branch_cost =
+{
+  1,  /* Predictable.  */
+  3   /* Unpredictable.  */
+};
+
+/* Generic approximation modes.  */
+static const cpu_approx_modes generic_armv8_a_approx_modes =
+{
+  AARCH64_APPROX_NONE,	/* division  */
+  AARCH64_APPROX_NONE,	/* sqrt  */
+  AARCH64_APPROX_NONE	/* recip_sqrt  */
+};
+
+/* Generic prefetch settings (which disable prefetch).  */
+static const cpu_prefetch_tune generic_armv8_a_prefetch_tune =
+{
+  0,			/* num_slots  */
+  -1,			/* l1_cache_size  */
+  -1,			/* l1_cache_line_size  */
+  -1,			/* l2_cache_size  */
+  true,			/* prefetch_dynamic_strides */
+  -1,			/* minimum_stride */
+  -1			/* default_opt_level  */
+};
+
+static const struct tune_params generic_armv8_a_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_armv8_a_addrcost_table,
+  &generic_armv8_a_regmove_cost,
+  &generic_armv8_a_vector_cost,
+  &generic_armv8_a_branch_cost,
+  &generic_armv8_a_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
+  { 4, /* load_int.  */
+    2, /* store_int.  */
+    5, /* load_fp.  */
+    2, /* store_fp.  */
+    4, /* load_pred.  */
+    4 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC_ARMV8_A.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
index aac06bd8093bed9e50928ee23f9a075888f14543..96e9935360100e25a4c01cceabc7aa840f520a3e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
index f6278916e1afeb3f0cb8fdbff4e98782ad0a726e..6f969a829425960b414508a7e354a1f39426a0e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
index 03a6636f2d20b12f7e950a5bd6e43216139370fa..e6ec5157cd6dcc6b6dc24c5384432289b6dcdfba 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
index 9a2bd8f152ff32e8da1c4e2a73a31a249e5991c7..7ed35921b6f914441dc463c4030fcc4663a6813c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
index d5bee3a7b900bf9348c9cbfd67f487c381b13bf6..4bdb167944cda1861dd0462d905149646be69693 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-O2 -march=armv8-a+crc+crypto -mcpu=generic" } */
+/* { dg-options "-O2 -mcpu=generic+crypto" } */
 
 #include "arm_acle.h"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
index 069a0010865334324a100bab358bb53369f122fb..e6f31ba72ee77d1129f3cfbe2d90216d6c355c57 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-march=armv8-a+crypto -mcpu=generic -save-temps" } */
+/* { dg-options "-mcpu=generic+crypto -save-temps" } */
 
 /* Check that "+nothing" clears the ISA flags.  */
 




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9
  2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
  2023-11-15 17:07 ` [PATCH 2/6]AArch64: Remove special handling of generic cpu Tamar Christina
  2023-11-15 17:07 ` [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default Tamar Christina
@ 2023-11-15 17:08 ` Tamar Christina
  2023-11-16  9:23   ` Richard Earnshaw
  2023-11-15 17:08 ` [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled Tamar Christina
  2023-11-16  9:13 ` [PATCH 1/6]AArch64: Refactor costs models to different files Richard Earnshaw
  4 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2023-11-15 17:08 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 14297 bytes --]

Hi All,

This patch adds a new generic tuning model "generic-armv9-a" and makes it
the default for all Armv9 architectures.

-mcpu=generic and -mtune=generic are kept around for those who really want the
deprecated cost model.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/111370
	* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
	armv9.3-a): Update to generic-armv9-a.
	* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
	* config/aarch64/tuning_models/generic_armv9_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF
 AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
-AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9-a",       generic_armv9_a,   V9A  ,     9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a",     generic_armv9_a,   V9_1A,     9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a",     generic_armv9_a,   V9_2A,     9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a",     generic_armv9_a,   V9_3A,     9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
-AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv9-a",  generic_armv9_a, cortexa53, V9A, (), generic_armv9_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 08635e0df9cfa02286f3950383a32f6f93d1b4e0..5bed5f84cef242ec01f8510c76a450f81a985521 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -356,6 +356,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 /* Tuning parameters.  */
 #include "tuning_models/generic.h"
 #include "tuning_models/generic_armv8_a.h"
+#include "tuning_models/generic_armv9_a.h"
 #include "tuning_models/cortexa35.h"
 #include "tuning_models/cortexa53.h"
 #include "tuning_models/cortexa57.h"
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
new file mode 100644
index 0000000000000000000000000000000000000000..c017468592a9dba74ddd432247aaf51a70bb34b5
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC_ARMV9_A
+#define GCC_AARCH64_H_GENERIC_ARMV9_A
+
+#include "generic.h"
+#include "generic_armv8_a.h"
+
+static const struct cpu_addrcost_table generic_armv9_a_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_armv9_a_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost generic_armv9_a_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost generic_armv9_a_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    4, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~8 cycles and would have a cost of 14.  FADDV
+       completes in 6 cycles, so give it a cost of 14 - 2.  */
+    12, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
+    6, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info generic_armv9_a_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info generic_armv9_a_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    2, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info generic_armv9_a_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info generic_armv9_a_vec_issue_info =
+{
+  &generic_armv9_a_scalar_issue_info,
+  &generic_armv9_a_advsimd_issue_info,
+  &generic_armv9_a_sve_issue_info
+};
+
+/* Costs for vector insn classes (based on the Neoverse N2 table).  */
+static const struct cpu_vector_cost generic_armv9_a_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_armv9_a_advsimd_vector_cost, /* advsimd  */
+  &generic_armv9_a_sve_vector_cost, /* sve  */
+  &generic_armv9_a_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params generic_armv9_a_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_armv9_a_addrcost_table,
+  &generic_armv9_a_regmove_cost,
+  &generic_armv9_a_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_SCALABLE, /* sve_width  */
+  { 4, /* load_int.  */
+    1, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC_ARMV9_A.  */
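
The SVE reduction entries above all follow the arithmetic spelled out in the
per-entry comments.  As a quick cross-check, here is a small sketch that
reproduces the numbers (illustrative only: reduc_cost is not a GCC function,
and it assumes the 128-bit vector length implied by the 15/7/3/1 scalar-ADD
counts and the cycle estimates quoted in the comments):

  /* cost = scalar reduction-tree cost
	    + (vector reduction latency - scalar reduction-tree latency).  */
  constexpr int reduc_cost (int scalar_stmts, int stmt_cost,
			    int scalar_cycles, int vec_cycles)
  {
    return scalar_stmts * stmt_cost + (vec_cycles - scalar_cycles);
  }

  static_assert (reduc_cost (15, 1, 5, 11) == 21, "reduc_i8_cost");
  static_assert (reduc_cost (7, 1, 3, 9) == 13,   "reduc_i16_cost");
  static_assert (reduc_cost (3, 1, 2, 8) == 9,    "reduc_i32_cost");
  static_assert (reduc_cost (1, 1, 1, 2) == 2,    "reduc_i64_cost");
  static_assert (reduc_cost (7, 2, 8, 6) == 12,   "reduc_f16_cost");
  static_assert (reduc_cost (3, 2, 4, 4) == 6,    "reduc_f32_cost");
  static_assert (reduc_cost (1, 2, 2, 2) == 2,    "reduc_f64_cost");

(The 1 and 2 stmt_cost arguments are the scalar_int_stmt_cost and
scalar_fp_stmt_cost from generic_armv9_a_vector_cost.)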




-- 

[-- Attachment #2: rb17818.patch --]
[-- Type: text/plain, Size: 13566 bytes --]

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF
 AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
-AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9-a",       generic_armv9_a,   V9A  ,     9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a",     generic_armv9_a,   V9_1A,     9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a",     generic_armv9_a,   V9_2A,     9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a",     generic_armv9_a,   V9_3A,     9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
-AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv9-a",  generic_armv9_a, cortexa53, V9A, (), generic_armv9_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 08635e0df9cfa02286f3950383a32f6f93d1b4e0..5bed5f84cef242ec01f8510c76a450f81a985521 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -356,6 +356,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
 /* Tuning parameters.  */
 #include "tuning_models/generic.h"
 #include "tuning_models/generic_armv8_a.h"
+#include "tuning_models/generic_armv9_a.h"
 #include "tuning_models/cortexa35.h"
 #include "tuning_models/cortexa53.h"
 #include "tuning_models/cortexa57.h"
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
new file mode 100644
index 0000000000000000000000000000000000000000..c017468592a9dba74ddd432247aaf51a70bb34b5
--- /dev/null
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -0,0 +1,245 @@
+/* Tuning model description for AArch64 architecture.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_H_GENERIC_ARMV9_A
+#define GCC_AARCH64_H_GENERIC_ARMV9_A
+
+#include "generic.h"
+#include "generic_armv8_a.h"
+
+static const struct cpu_addrcost_table generic_armv9_a_addrcost_table =
+{
+    {
+      1, /* hi  */
+      0, /* si  */
+      0, /* di  */
+      1, /* ti  */
+    },
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
+static const struct cpu_regmove_cost generic_armv9_a_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+     realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
+static const advsimd_vec_cost generic_armv9_a_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+     where it comes from.  E.g. some constants take two dependent
+     instructions or a load, while others might be moved from a GPR.
+     4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+     vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost generic_armv9_a_sve_vector_cost =
+{
+  {
+    2, /* int_stmt_cost  */
+    2, /* fp_stmt_cost  */
+    3, /* ld2_st2_permute_cost  */
+    4, /* ld3_st3_permute_cost  */
+    4, /* ld4_st4_permute_cost  */
+    3, /* permute_cost  */
+    /* Theoretically, a reduction involving 15 scalar ADDs could
+       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
+       completes in 11 cycles, so give it a cost of 15 + 6.  */
+    21, /* reduc_i8_cost  */
+    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+    13, /* reduc_i16_cost  */
+    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+    9, /* reduc_i32_cost  */
+    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
+    2, /* reduc_i64_cost  */
+    /* Theoretically, a reduction involving 7 scalar FADDs could
+       complete in ~8 cycles and would have a cost of 14.  FADDV
+       completes in 6 cycles, so give it a cost of 14 - 2.  */
+    12, /* reduc_f16_cost  */
+    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
+    6, /* reduc_f32_cost  */
+    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
+    2, /* reduc_f64_cost  */
+    2, /* store_elt_extra_cost  */
+    /* This value is just inherited from the Cortex-A57 table.  */
+    8, /* vec_to_scalar_cost  */
+    /* See the comment above the Advanced SIMD versions.  */
+    4, /* scalar_to_vec_cost  */
+    4, /* align_load_cost  */
+    4, /* unalign_load_cost  */
+    /* Although stores have a latency of 2 and compete for the
+       vector pipes, in practice it's better not to model that.  */
+    1, /* unalign_store_cost  */
+    1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would take two parallel FP loads
+     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
+     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
+     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
+     (cost 2) to that, to avoid the difference being lost in rounding.
+
+     There is no easy comparison between a strided Advanced SIMD x32 load
+     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
+     operation more than a 64-bit gather.  */
+  14, /* gather_load_x32_cost  */
+  12, /* gather_load_x64_cost  */
+  3 /* scatter_store_elt_cost  */
+};
+
+static const aarch64_scalar_vec_issue_info generic_armv9_a_scalar_issue_info =
+{
+  3, /* loads_stores_per_cycle  */
+  2, /* stores_per_cycle  */
+  4, /* general_ops_per_cycle  */
+  0, /* fp_simd_load_general_ops  */
+  1 /* fp_simd_store_general_ops  */
+};
+
+static const aarch64_advsimd_vec_issue_info generic_armv9_a_advsimd_issue_info =
+{
+  {
+    3, /* loads_stores_per_cycle  */
+    2, /* stores_per_cycle  */
+    2, /* general_ops_per_cycle  */
+    0, /* fp_simd_load_general_ops  */
+    1 /* fp_simd_store_general_ops  */
+  },
+  2, /* ld2_st2_general_ops  */
+  2, /* ld3_st3_general_ops  */
+  3 /* ld4_st4_general_ops  */
+};
+
+static const aarch64_sve_vec_issue_info generic_armv9_a_sve_issue_info =
+{
+  {
+    {
+      3, /* loads_per_cycle  */
+      2, /* stores_per_cycle  */
+      2, /* general_ops_per_cycle  */
+      0, /* fp_simd_load_general_ops  */
+      1 /* fp_simd_store_general_ops  */
+    },
+    2, /* ld2_st2_general_ops  */
+    3, /* ld3_st3_general_ops  */
+    3 /* ld4_st4_general_ops  */
+  },
+  2, /* pred_ops_per_cycle  */
+  2, /* while_pred_ops  */
+  2, /* int_cmp_pred_ops  */
+  1, /* fp_cmp_pred_ops  */
+  1, /* gather_scatter_pair_general_ops  */
+  1 /* gather_scatter_pair_pred_ops  */
+};
+
+static const aarch64_vec_issue_info generic_armv9_a_vec_issue_info =
+{
+  &generic_armv9_a_scalar_issue_info,
+  &generic_armv9_a_advsimd_issue_info,
+  &generic_armv9_a_sve_issue_info
+};
+
+/* Costs for vector insn classes (based on the Neoverse N2 table).  */
+static const struct cpu_vector_cost generic_armv9_a_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  2, /* scalar_fp_stmt_cost  */
+  4, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* cond_taken_branch_cost  */
+  1, /* cond_not_taken_branch_cost  */
+  &generic_armv9_a_advsimd_vector_cost, /* advsimd  */
+  &generic_armv9_a_sve_vector_cost, /* sve  */
+  &generic_armv9_a_vec_issue_info /* issue_info  */
+};
+
+static const struct tune_params generic_armv9_a_tunings =
+{
+  &cortexa76_extra_costs,
+  &generic_armv9_a_addrcost_table,
+  &generic_armv9_a_regmove_cost,
+  &generic_armv9_a_vector_cost,
+  &generic_branch_cost,
+  &generic_approx_modes,
+  SVE_SCALABLE, /* sve_width  */
+  { 4, /* load_int.  */
+    1, /* store_int.  */
+    6, /* load_fp.  */
+    2, /* store_fp.  */
+    6, /* load_pred.  */
+    1 /* store_pred.  */
+  }, /* memmov_cost.  */
+  3, /* issue_rate  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  "32:16",	/* function_align.  */
+  "4",		/* jump_align.  */
+  "32:16",	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* fma_reassoc_width.  */
+  2,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
+   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
+  &generic_prefetch_tune,
+  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
+  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
+};
+
+#endif /* GCC_AARCH64_H_GENERIC_ARMV9_A.  */




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
                   ` (2 preceding siblings ...)
  2023-11-15 17:08 ` [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9 Tamar Christina
@ 2023-11-15 17:08 ` Tamar Christina
  2023-11-16  9:26   ` Richard Earnshaw
  2023-11-16 10:33   ` Richard Earnshaw
  2023-11-16  9:13 ` [PATCH 1/6]AArch64: Refactor costs models to different files Richard Earnshaw
  4 siblings, 2 replies; 14+ messages in thread
From: Tamar Christina @ 2023-11-15 17:08 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 4143 bytes --]

Hi All,

At the moment we emit a warning whenever you specify both -march and -mcpu and
their architectures differ.  The idea originally was that the user may not be
aware of the mismatch.

However this has a few problems:

1.  Architecture revisions are not an observable part of the architecture;
    extensions are.  Starting with GCC 14 we have therefore relaxed the rules
    so that any extension can be enabled at any architecture level.  As such
    it's incorrect, or at least not useful, to keep the check based on the
    architecture revision.

2.  It's problematic in Makefiles and other build systems where you want to
    enable CPU-specific builds for certain files.  E.g. you may be building
    with -march=armv8-a by default but want to build some files with
    -mcpu=neoverse-n1.  Since there's no easy way to remove the earlier option
    we end up warning, and there's no way to disable just this warning.  Build
    systems compiling with -Werror then find that compiling with GCC is
    needlessly hard.

3. It doesn't actually warn for cases that may lead to issues, so e.g.
   -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
   be disabled.

For this reason I have two alternative proposals:

1.  Just remove this warning all together.

2.  Rework the warning based on extensions and only warn when features would be
    disabled by the presence of the -mcpu.  This is the approach this patch has
    taken.

As examples:

> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added
	.arch armv8.2-a+crc+sve

> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2
<no warning>
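
The check itself is just a bitmask test.  A standalone sketch of the "would
anything be disabled" condition the patch uses (with made-up single-bit flags
rather than the real aarch64_feature_flags values):

  /* Hypothetical flag bits, for illustration only.  */
  constexpr unsigned CRC = 1u << 0, SVE = 1u << 1, RCPC = 1u << 2, DOTPROD = 1u << 3;

  constexpr unsigned arch_flags = SVE;                  /* e.g. -march=armv8.2-a+sve  */
  constexpr unsigned cpu_flags  = CRC | RCPC | DOTPROD; /* a CPU without SVE  */

  /* Warn iff some -march feature is missing from the -mcpu set, i.e. picking
     the CPU would disable an architecture feature.  */
  static_assert ((~cpu_flags & arch_flags) != 0, "SVE would be lost: warn");
  static_assert ((~(cpu_flags | SVE) & arch_flags) == 0, "nothing lost: no warning");

So the warning now only fires when the -mcpu feature set is not a superset of
the -march feature set.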

The one remaining issue here is that if both -march and -mcpu are specified we
pick the -march.  This is not particularly obvious, and to make the use case
above more useful I think it would make sense to pick the CPU's architecture
instead?

I did not make that change in the patch as it changes semantics.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note that I can't write a test for this because dg-warning expects warnings to
be at a particular line and doesn't support warnings at the "global" level.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16388,12 +16388,22 @@ aarch64_override_options (void)
   if (cpu && arch)
     {
       /* If both -mcpu and -march are specified, warn if they are not
-	 architecturally compatible and prefer the -march ISA flags.  */
-      if (arch->arch != cpu->arch)
-	{
-	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
+	 feature compatible.  Feature compatible means that the inclusion of the
+	 CPU features would not end up disabling an architecture feature.  In
+	 other words the CPU features need to be a superset of the arch
+	 features, and if so prefer the -march ISA flags.  */
+      auto full_arch_flags = arch->flags | arch_isa;
+      auto full_cpu_flags = cpu->flags | cpu_isa;
+      if (~full_cpu_flags & full_arch_flags)
+	{
+	  std::string ext_diff
+	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
+							  full_cpu_flags);
+	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
+		      "and resulted in options %s being added",
 		       aarch64_cpu_string,
-		       aarch64_arch_string);
+		       aarch64_arch_string,
+		       ext_diff.c_str ());
 	}
 
       selected_arch = arch->arch;




-- 

[-- Attachment #2: rb17820.patch --]
[-- Type: text/plain, Size: 1419 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16388,12 +16388,22 @@ aarch64_override_options (void)
   if (cpu && arch)
     {
       /* If both -mcpu and -march are specified, warn if they are not
-	 architecturally compatible and prefer the -march ISA flags.  */
-      if (arch->arch != cpu->arch)
-	{
-	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
+	 feature compatible.  Feature compatible means that the inclusion of the
+	 CPU features would not end up disabling an architecture feature.  In
+	 other words the CPU features need to be a superset of the arch
+	 features, and if so prefer the -march ISA flags.  */
+      auto full_arch_flags = arch->flags | arch_isa;
+      auto full_cpu_flags = cpu->flags | cpu_isa;
+      if (~full_cpu_flags & full_arch_flags)
+	{
+	  std::string ext_diff
+	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
+							  full_cpu_flags);
+	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
+		      "and resulted in options %s being added",
 		       aarch64_cpu_string,
-		       aarch64_arch_string);
+		       aarch64_arch_string,
+		       ext_diff.c_str ());
 	}
 
       selected_arch = arch->arch;




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6]AArch64: Refactor costs models to different files.
  2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
                   ` (3 preceding siblings ...)
  2023-11-15 17:08 ` [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled Tamar Christina
@ 2023-11-16  9:13 ` Richard Earnshaw
  4 siblings, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:13 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:06, Tamar Christina wrote:
> Hi All,
> 
> This patch series attempts to move the generic cost model in AArch64 to a new
> and modern generic standard.  The current standard is quite old and generates
> very suboptimal code out of the box for user of GCC.
> 
> The goal is for the new cost model to be beneficial on newer/current Arm
> Microarchitectures while not being too negative for older ones.
> 
> It does not change any core specific optimization.  The final changes reflect
> both performance optimizations and size optimizations.
> 
> This first patch just re-organizes the cost structures to their own files.
> The AArch64.cc file has gotten very big and it's hard to follow.
> 
> No functional changes are expected from this change.  Note that since all the
> structures have private visibility I've put them in header files instead.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR target/111370
> 	* config/aarch64/aarch64.cc (generic_addrcost_table,
> 	exynosm1_addrcost_table,
> 	xgene1_addrcost_table,
> 	thunderx2t99_addrcost_table,
> 	thunderx3t110_addrcost_table,
> 	tsv110_addrcost_table,
> 	qdf24xx_addrcost_table,
> 	a64fx_addrcost_table,
> 	neoversev1_addrcost_table,
> 	neoversen2_addrcost_table,
> 	neoversev2_addrcost_table,
> 	generic_regmove_cost,
> 	cortexa57_regmove_cost,
> 	cortexa53_regmove_cost,
> 	exynosm1_regmove_cost,
> 	thunderx_regmove_cost,
> 	xgene1_regmove_cost,
> 	qdf24xx_regmove_cost,
> 	thunderx2t99_regmove_cost,
> 	thunderx3t110_regmove_cost,
> 	tsv110_regmove_cost,
> 	a64fx_regmove_cost,
> 	neoversen2_regmove_cost,
> 	neoversev1_regmove_cost,
> 	neoversev2_regmove_cost,
> 	generic_vector_cost,
> 	a64fx_vector_cost,
> 	qdf24xx_vector_cost,
> 	thunderx_vector_cost,
> 	tsv110_vector_cost,
> 	cortexa57_vector_cost,
> 	exynosm1_vector_cost,
> 	xgene1_vector_cost,
> 	thunderx2t99_vector_cost,
> 	thunderx3t110_vector_cost,
> 	ampere1_vector_cost,
> 	generic_branch_cost,
> 	generic_tunings,
> 	cortexa35_tunings,
> 	cortexa53_tunings,
> 	cortexa57_tunings,
> 	cortexa72_tunings,
> 	cortexa73_tunings,
> 	exynosm1_tunings,
> 	thunderxt88_tunings,
> 	thunderx_tunings,
> 	tsv110_tunings,
> 	xgene1_tunings,
> 	emag_tunings,
> 	qdf24xx_tunings,
> 	saphira_tunings,
> 	thunderx2t99_tunings,
> 	thunderx3t110_tunings,
> 	neoversen1_tunings,
> 	ampere1_tunings,
> 	ampere1a_tunings,
> 	neoversev1_vector_cost,
> 	neoversev1_tunings,
> 	neoverse512tvb_vector_cost,
> 	neoverse512tvb_tunings,
> 	neoversen2_vector_cost,
> 	neoversen2_tunings,
> 	neoversev2_vector_cost,
> 	neoversev2_tunings
> 	a64fx_tunings): Split into own files.

I think the official way of writing this is

	* config/aarch64/aarch64.cc (generic_addrcost_table)
	(exynosm1_addrcost_table, xgene1_addrcost_table)
	(thunderx2t99_addrcost_table, thunderx3t110_addrcost_table)
         ...
	(a64fx_tunings): Split into own files.


> 	* config/aarch64/tuning_models/a64fx.h: New file.
> 	* config/aarch64/tuning_models/ampere1.h: New file.
> 	* config/aarch64/tuning_models/ampere1a.h: New file.
> 	* config/aarch64/tuning_models/cortexa35.h: New file.
> 	* config/aarch64/tuning_models/cortexa53.h: New file.
> 	* config/aarch64/tuning_models/cortexa57.h: New file.
> 	* config/aarch64/tuning_models/cortexa72.h: New file.
> 	* config/aarch64/tuning_models/cortexa73.h: New file.
> 	* config/aarch64/tuning_models/emag.h: New file.
> 	* config/aarch64/tuning_models/exynosm1.h: New file.
> 	* config/aarch64/tuning_models/generic.h: New file.
> 	* config/aarch64/tuning_models/neoverse512tvb.h: New file.
> 	* config/aarch64/tuning_models/neoversen1.h: New file.
> 	* config/aarch64/tuning_models/neoversen2.h: New file.
> 	* config/aarch64/tuning_models/neoversev1.h: New file.
> 	* config/aarch64/tuning_models/neoversev2.h: New file.
> 	* config/aarch64/tuning_models/qdf24xx.h: New file.
> 	* config/aarch64/tuning_models/saphira.h: New file.
> 	* config/aarch64/tuning_models/thunderx.h: New file.
> 	* config/aarch64/tuning_models/thunderx2t99.h: New file.
> 	* config/aarch64/tuning_models/thunderx3t110.h: New file.
> 	* config/aarch64/tuning_models/thunderxt88.h: New file.
> 	* config/aarch64/tuning_models/tsv110.h: New file.
> 	* config/aarch64/tuning_models/xgene1.h: New file.
> 

Otherwise, OK.

R.

> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9fbfc548a891f5d11940c6fd3c49a14bfbdec886..07b1cde39209f5c7740e336b499e9aed31e4c515 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -354,2405 +354,30 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
>   };
>   
>   /* Tuning parameters.  */
> -
> -static const struct cpu_addrcost_table generic_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  0, /* register_sextend  */
> -  0, /* register_zextend  */
> -  0 /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table exynosm1_addrcost_table =
> -{
> -    {
> -      0, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      2, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  1, /* register_offset  */
> -  1, /* register_sextend  */
> -  2, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table xgene1_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  1, /* pre_modify  */
> -  1, /* post_modify  */
> -  1, /* post_modify_ld3_st3  */
> -  1, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  1, /* register_sextend  */
> -  1, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      1, /* si  */
> -      1, /* di  */
> -      2, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  2, /* register_offset  */
> -  3, /* register_sextend  */
> -  3, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      1, /* si  */
> -      1, /* di  */
> -      2, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  2, /* register_offset  */
> -  3, /* register_sextend  */
> -  3, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table tsv110_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  1, /* register_sextend  */
> -  1, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table qdf24xx_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      1, /* si  */
> -      1, /* di  */
> -      2, /* ti  */
> -    },
> -  1, /* pre_modify  */
> -  1, /* post_modify  */
> -  1, /* post_modify_ld3_st3  */
> -  1, /* post_modify_ld4_st4  */
> -  3, /* register_offset  */
> -  3, /* register_sextend  */
> -  3, /* register_zextend  */
> -  2, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table a64fx_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      1, /* si  */
> -      1, /* di  */
> -      2, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  0, /* post_modify_ld3_st3  */
> -  0, /* post_modify_ld4_st4  */
> -  2, /* register_offset  */
> -  3, /* register_sextend  */
> -  3, /* register_zextend  */
> -  0, /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table neoversev1_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  3, /* post_modify_ld3_st3  */
> -  3, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  0, /* register_sextend  */
> -  0, /* register_zextend  */
> -  0 /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table neoversen2_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  2, /* post_modify_ld3_st3  */
> -  2, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  0, /* register_sextend  */
> -  0, /* register_zextend  */
> -  0 /* imm_offset  */
> -};
> -
> -static const struct cpu_addrcost_table neoversev2_addrcost_table =
> -{
> -    {
> -      1, /* hi  */
> -      0, /* si  */
> -      0, /* di  */
> -      1, /* ti  */
> -    },
> -  0, /* pre_modify  */
> -  0, /* post_modify  */
> -  2, /* post_modify_ld3_st3  */
> -  2, /* post_modify_ld4_st4  */
> -  0, /* register_offset  */
> -  0, /* register_sextend  */
> -  0, /* register_zextend  */
> -  0 /* imm_offset  */
> -};
> -
> -static const struct cpu_regmove_cost generic_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  5, /* GP2FP  */
> -  5, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost cortexa57_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  5, /* GP2FP  */
> -  5, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost cortexa53_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  5, /* GP2FP  */
> -  5, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost exynosm1_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost (actual, 4 and 9).  */
> -  9, /* GP2FP  */
> -  9, /* FP2GP  */
> -  1 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost thunderx_regmove_cost =
> -{
> -  2, /* GP2GP  */
> -  2, /* GP2FP  */
> -  6, /* FP2GP  */
> -  4 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost xgene1_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  8, /* GP2FP  */
> -  8, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost qdf24xx_regmove_cost =
> -{
> -  2, /* GP2GP  */
> -  /* Avoid the use of int<->fp moves for spilling.  */
> -  6, /* GP2FP  */
> -  6, /* FP2GP  */
> -  4 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of int<->fp moves for spilling.  */
> -  5, /* GP2FP  */
> -  6, /* FP2GP  */
> -  3, /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of int<->fp moves for spilling.  */
> -  4, /* GP2FP  */
> -  5, /* FP2GP  */
> -  4  /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost tsv110_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  2, /* GP2FP  */
> -  3, /* FP2GP  */
> -  2  /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost a64fx_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Avoid the use of slow int<->fp moves for spilling by setting
> -     their cost higher than memmov_cost.  */
> -  5, /* GP2FP  */
> -  7, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost neoversen2_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Spilling to int<->fp instead of memory is recommended so set
> -     realistic costs compared to memmov_cost.  */
> -  3, /* GP2FP  */
> -  2, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost neoversev1_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Spilling to int<->fp instead of memory is recommended so set
> -     realistic costs compared to memmov_cost.  */
> -  3, /* GP2FP  */
> -  2, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -static const struct cpu_regmove_cost neoversev2_regmove_cost =
> -{
> -  1, /* GP2GP  */
> -  /* Spilling to int<->fp instead of memory is recommended so set
> -     realistic costs compared to memmov_cost.  */
> -  3, /* GP2FP  */
> -  2, /* FP2GP  */
> -  2 /* FP2FP  */
> -};
> -
> -/* Generic costs for Advanced SIMD vector operations.   */
> -static const advsimd_vec_cost generic_advsimd_vector_cost =
> -{
> -  1, /* int_stmt_cost  */
> -  1, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  2, /* permute_cost  */
> -  2, /* reduc_i8_cost  */
> -  2, /* reduc_i16_cost  */
> -  2, /* reduc_i32_cost  */
> -  2, /* reduc_i64_cost  */
> -  2, /* reduc_f16_cost  */
> -  2, /* reduc_f32_cost  */
> -  2, /* reduc_f64_cost  */
> -  2, /* store_elt_extra_cost  */
> -  2, /* vec_to_scalar_cost  */
> -  1, /* scalar_to_vec_cost  */
> -  1, /* align_load_cost  */
> -  1, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* Generic costs for SVE vector operations.  */
> -static const sve_vec_cost generic_sve_vector_cost =
> -{
> -  {
> -    1, /* int_stmt_cost  */
> -    1, /* fp_stmt_cost  */
> -    0, /* ld2_st2_permute_cost  */
> -    0, /* ld3_st3_permute_cost  */
> -    0, /* ld4_st4_permute_cost  */
> -    2, /* permute_cost  */
> -    2, /* reduc_i8_cost  */
> -    2, /* reduc_i16_cost  */
> -    2, /* reduc_i32_cost  */
> -    2, /* reduc_i64_cost  */
> -    2, /* reduc_f16_cost  */
> -    2, /* reduc_f32_cost  */
> -    2, /* reduc_f64_cost  */
> -    2, /* store_elt_extra_cost  */
> -    2, /* vec_to_scalar_cost  */
> -    1, /* scalar_to_vec_cost  */
> -    1, /* align_load_cost  */
> -    1, /* unalign_load_cost  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  2, /* clast_cost  */
> -  2, /* fadda_f16_cost  */
> -  2, /* fadda_f32_cost  */
> -  2, /* fadda_f64_cost  */
> -  4, /* gather_load_x32_cost  */
> -  2, /* gather_load_x64_cost  */
> -  1 /* scatter_store_elt_cost  */
> -};
> -
> -/* Generic costs for vector insn classes.  */
> -static const struct cpu_vector_cost generic_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  1, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  3, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &generic_advsimd_vector_cost, /* advsimd  */
> -  &generic_sve_vector_cost, /* sve */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost a64fx_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  5, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  13, /* reduc_i8_cost  */
> -  13, /* reduc_i16_cost  */
> -  13, /* reduc_i32_cost  */
> -  13, /* reduc_i64_cost  */
> -  13, /* reduc_f16_cost  */
> -  13, /* reduc_f32_cost  */
> -  13, /* reduc_f64_cost  */
> -  13, /* store_elt_extra_cost  */
> -  13, /* vec_to_scalar_cost  */
> -  4, /* scalar_to_vec_cost  */
> -  6, /* align_load_cost  */
> -  6, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const sve_vec_cost a64fx_sve_vector_cost =
> -{
> -  {
> -    2, /* int_stmt_cost  */
> -    5, /* fp_stmt_cost  */
> -    0, /* ld2_st2_permute_cost  */
> -    0, /* ld3_st3_permute_cost  */
> -    0, /* ld4_st4_permute_cost  */
> -    3, /* permute_cost  */
> -    13, /* reduc_i8_cost  */
> -    13, /* reduc_i16_cost  */
> -    13, /* reduc_i32_cost  */
> -    13, /* reduc_i64_cost  */
> -    13, /* reduc_f16_cost  */
> -    13, /* reduc_f32_cost  */
> -    13, /* reduc_f64_cost  */
> -    13, /* store_elt_extra_cost  */
> -    13, /* vec_to_scalar_cost  */
> -    4, /* scalar_to_vec_cost  */
> -    6, /* align_load_cost  */
> -    6, /* unalign_load_cost  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  13, /* clast_cost  */
> -  13, /* fadda_f16_cost  */
> -  13, /* fadda_f32_cost  */
> -  13, /* fadda_f64_cost  */
> -  64, /* gather_load_x32_cost  */
> -  32, /* gather_load_x64_cost  */
> -  1 /* scatter_store_elt_cost  */
> -};
> -
> -static const struct cpu_vector_cost a64fx_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  5, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  3, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &a64fx_advsimd_vector_cost, /* advsimd  */
> -  &a64fx_sve_vector_cost, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
> -{
> -  1, /* int_stmt_cost  */
> -  3, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  2, /* permute_cost  */
> -  1, /* reduc_i8_cost  */
> -  1, /* reduc_i16_cost  */
> -  1, /* reduc_i32_cost  */
> -  1, /* reduc_i64_cost  */
> -  1, /* reduc_f16_cost  */
> -  1, /* reduc_f32_cost  */
> -  1, /* reduc_f64_cost  */
> -  1, /* store_elt_extra_cost  */
> -  1, /* vec_to_scalar_cost  */
> -  1, /* scalar_to_vec_cost  */
> -  1, /* align_load_cost  */
> -  1, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* QDF24XX costs for vector insn classes.  */
> -static const struct cpu_vector_cost qdf24xx_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  1, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  3, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &qdf24xx_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -
> -static const advsimd_vec_cost thunderx_advsimd_vector_cost =
> -{
> -  4, /* int_stmt_cost  */
> -  1, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  4, /* permute_cost  */
> -  2, /* reduc_i8_cost  */
> -  2, /* reduc_i16_cost  */
> -  2, /* reduc_i32_cost  */
> -  2, /* reduc_i64_cost  */
> -  2, /* reduc_f16_cost  */
> -  2, /* reduc_f32_cost  */
> -  2, /* reduc_f64_cost  */
> -  2, /* store_elt_extra_cost  */
> -  2, /* vec_to_scalar_cost  */
> -  2, /* scalar_to_vec_cost  */
> -  3, /* align_load_cost  */
> -  5, /* unalign_load_cost  */
> -  5, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* ThunderX costs for vector insn classes.  */
> -static const struct cpu_vector_cost thunderx_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  3, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  3, /* cond_taken_branch_cost  */
> -  3, /* cond_not_taken_branch_cost  */
> -  &thunderx_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost tsv110_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  2, /* permute_cost  */
> -  3, /* reduc_i8_cost  */
> -  3, /* reduc_i16_cost  */
> -  3, /* reduc_i32_cost  */
> -  3, /* reduc_i64_cost  */
> -  3, /* reduc_f16_cost  */
> -  3, /* reduc_f32_cost  */
> -  3, /* reduc_f64_cost  */
> -  3, /* store_elt_extra_cost  */
> -  3, /* vec_to_scalar_cost  */
> -  2, /* scalar_to_vec_cost  */
> -  5, /* align_load_cost  */
> -  5, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const struct cpu_vector_cost tsv110_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  5, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &tsv110_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  8, /* reduc_i8_cost  */
> -  8, /* reduc_i16_cost  */
> -  8, /* reduc_i32_cost  */
> -  8, /* reduc_i64_cost  */
> -  8, /* reduc_f16_cost  */
> -  8, /* reduc_f32_cost  */
> -  8, /* reduc_f64_cost  */
> -  8, /* store_elt_extra_cost  */
> -  8, /* vec_to_scalar_cost  */
> -  8, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* Cortex-A57 costs for vector insn classes.  */
> -static const struct cpu_vector_cost cortexa57_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &cortexa57_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
> -{
> -  3, /* int_stmt_cost  */
> -  3, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  3, /* reduc_i8_cost  */
> -  3, /* reduc_i16_cost  */
> -  3, /* reduc_i32_cost  */
> -  3, /* reduc_i64_cost  */
> -  3, /* reduc_f16_cost  */
> -  3, /* reduc_f32_cost  */
> -  3, /* reduc_f64_cost  */
> -  3, /* store_elt_extra_cost  */
> -  3, /* vec_to_scalar_cost  */
> -  3, /* scalar_to_vec_cost  */
> -  5, /* align_load_cost  */
> -  5, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const struct cpu_vector_cost exynosm1_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  5, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &exynosm1_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost xgene1_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  2, /* permute_cost  */
> -  4, /* reduc_i8_cost  */
> -  4, /* reduc_i16_cost  */
> -  4, /* reduc_i32_cost  */
> -  4, /* reduc_i64_cost  */
> -  4, /* reduc_f16_cost  */
> -  4, /* reduc_f32_cost  */
> -  4, /* reduc_f64_cost  */
> -  4, /* store_elt_extra_cost  */
> -  4, /* vec_to_scalar_cost  */
> -  4, /* scalar_to_vec_cost  */
> -  10, /* align_load_cost  */
> -  10, /* unalign_load_cost  */
> -  2, /* unalign_store_cost  */
> -  2  /* store_cost  */
> -};
> -
> -/* Generic costs for vector insn classes.  */
> -static const struct cpu_vector_cost xgene1_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> -  5, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  2, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &xgene1_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
> -{
> -  4, /* int_stmt_cost  */
> -  5, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  10, /* permute_cost  */
> -  6, /* reduc_i8_cost  */
> -  6, /* reduc_i16_cost  */
> -  6, /* reduc_i32_cost  */
> -  6, /* reduc_i64_cost  */
> -  6, /* reduc_f16_cost  */
> -  6, /* reduc_f32_cost  */
> -  6, /* reduc_f64_cost  */
> -  6, /* store_elt_extra_cost  */
> -  6, /* vec_to_scalar_cost  */
> -  5, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* Costs for vector insn classes for Vulcan.  */
> -static const struct cpu_vector_cost thunderx2t99_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  6, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  2, /* cond_taken_branch_cost  */
> -  1,  /* cond_not_taken_branch_cost  */
> -  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
> -{
> -  5, /* int_stmt_cost  */
> -  5, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  10, /* permute_cost  */
> -  5, /* reduc_i8_cost  */
> -  5, /* reduc_i16_cost  */
> -  5, /* reduc_i32_cost  */
> -  5, /* reduc_i64_cost  */
> -  5, /* reduc_f16_cost  */
> -  5, /* reduc_f32_cost  */
> -  5, /* reduc_f64_cost  */
> -  5, /* store_elt_extra_cost  */
> -  5, /* vec_to_scalar_cost  */
> -  5, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  4, /* unalign_store_cost  */
> -  4  /* store_cost  */
> -};
> -
> -static const struct cpu_vector_cost thunderx3t110_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  5, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  2, /* cond_taken_branch_cost  */
> -  1,  /* cond_not_taken_branch_cost  */
> -  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr /* issue_info  */
> -};
> -
> -static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> -{
> -  1, /* int_stmt_cost  */
> -  3, /* fp_stmt_cost  */
> -  0, /* ld2_st2_permute_cost  */
> -  0, /* ld3_st3_permute_cost  */
> -  0, /* ld4_st4_permute_cost  */
> -  2, /* permute_cost  */
> -  12, /* reduc_i8_cost  */
> -  9, /* reduc_i16_cost  */
> -  6, /* reduc_i32_cost  */
> -  5, /* reduc_i64_cost  */
> -  9, /* reduc_f16_cost  */
> -  6, /* reduc_f32_cost  */
> -  5, /* reduc_f64_cost  */
> -  8, /* store_elt_extra_cost  */
> -  6, /* vec_to_scalar_cost  */
> -  7, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -/* Ampere-1 costs for vector insn classes.  */
> -static const struct cpu_vector_cost ampere1_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  3, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &ampere1_advsimd_vector_cost, /* advsimd  */
> -  nullptr, /* sve  */
> -  nullptr  /* issue_info  */
> -};
> -
> -/* Generic costs for branch instructions.  */
> -static const struct cpu_branch_cost generic_branch_cost =
> -{
> -  1,  /* Predictable.  */
> -  3   /* Unpredictable.  */
> -};
> -
> -/* Generic approximation modes.  */
> -static const cpu_approx_modes generic_approx_modes =
> -{
> -  AARCH64_APPROX_NONE,	/* division  */
> -  AARCH64_APPROX_NONE,	/* sqrt  */
> -  AARCH64_APPROX_NONE	/* recip_sqrt  */
> -};
> -
> -/* Approximation modes for Exynos M1.  */
> -static const cpu_approx_modes exynosm1_approx_modes =
> -{
> -  AARCH64_APPROX_NONE,	/* division  */
> -  AARCH64_APPROX_ALL,	/* sqrt  */
> -  AARCH64_APPROX_ALL	/* recip_sqrt  */
> -};
> -
> -/* Approximation modes for X-Gene 1.  */
> -static const cpu_approx_modes xgene1_approx_modes =
> -{
> -  AARCH64_APPROX_NONE,	/* division  */
> -  AARCH64_APPROX_NONE,	/* sqrt  */
> -  AARCH64_APPROX_ALL	/* recip_sqrt  */
> -};
> -
> -/* Generic prefetch settings (which disable prefetch).  */
> -static const cpu_prefetch_tune generic_prefetch_tune =
> -{
> -  0,			/* num_slots  */
> -  -1,			/* l1_cache_size  */
> -  -1,			/* l1_cache_line_size  */
> -  -1,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune exynosm1_prefetch_tune =
> -{
> -  0,			/* num_slots  */
> -  -1,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  -1,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune qdf24xx_prefetch_tune =
> -{
> -  4,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  512,			/* l2_cache_size  */
> -  false,		/* prefetch_dynamic_strides */
> -  2048,			/* minimum_stride */
> -  3			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  128,			/* l1_cache_line_size  */
> -  16*1024,		/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  3			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune thunderx_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  128,			/* l1_cache_line_size  */
> -  -1,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  256,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  256,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune tsv110_prefetch_tune =
> -{
> -  0,                    /* num_slots  */
> -  64,                   /* l1_cache_size  */
> -  64,                   /* l1_cache_line_size  */
> -  512,                  /* l2_cache_size  */
> -  true,                 /* prefetch_dynamic_strides */
> -  -1,                   /* minimum_stride */
> -  -1                    /* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune xgene1_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  32,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  256,			/* l2_cache_size  */
> -  true,                 /* prefetch_dynamic_strides */
> -  -1,                   /* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune a64fx_prefetch_tune =
> -{
> -  8,			/* num_slots  */
> -  64,			/* l1_cache_size  */
> -  256,			/* l1_cache_line_size  */
> -  32768,		/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const cpu_prefetch_tune ampere1_prefetch_tune =
> -{
> -  0,			/* num_slots  */
> -  64,			/* l1_cache_size  */
> -  64,			/* l1_cache_line_size  */
> -  2048,			/* l2_cache_size  */
> -  true,			/* prefetch_dynamic_strides */
> -  -1,			/* minimum_stride */
> -  -1			/* default_opt_level  */
> -};
> -
> -static const struct tune_params generic_tunings =
> -{
> -  &cortexa57_extra_costs,
> -  &generic_addrcost_table,
> -  &generic_regmove_cost,
> -  &generic_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  2, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "16:12",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
> -     Neoverse V1.  It does not have a noticeable effect on A64FX and should
> -     have at most a very minor effect on SVE2 cores.  */
> -  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params cortexa35_tunings =
> -{
> -  &cortexa53_extra_costs,
> -  &generic_addrcost_table,
> -  &cortexa53_regmove_cost,
> -  &generic_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  1, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params cortexa53_tunings =
> -{
> -  &cortexa53_extra_costs,
> -  &generic_addrcost_table,
> -  &cortexa53_regmove_cost,
> -  &generic_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  2, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params cortexa57_tunings =
> -{
> -  &cortexa57_extra_costs,
> -  &generic_addrcost_table,
> -  &cortexa57_regmove_cost,
> -  &cortexa57_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params cortexa72_tunings =
> -{
> -  &cortexa57_extra_costs,
> -  &generic_addrcost_table,
> -  &cortexa57_regmove_cost,
> -  &cortexa57_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params cortexa73_tunings =
> -{
> -  &cortexa57_extra_costs,
> -  &generic_addrcost_table,
> -  &cortexa57_regmove_cost,
> -  &cortexa57_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  2, /* issue_rate.  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params exynosm1_tunings =
> -{
> -  &exynosm1_extra_costs,
> -  &exynosm1_addrcost_table,
> -  &exynosm1_regmove_cost,
> -  &exynosm1_vector_cost,
> -  &generic_branch_cost,
> -  &exynosm1_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3,	/* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
> -  "4",	/* function_align.  */
> -  "4",	/* jump_align.  */
> -  "4",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  48,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> -  &exynosm1_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params thunderxt88_tunings =
> -{
> -  &thunderx_extra_costs,
> -  &generic_addrcost_table,
> -  &thunderx_regmove_cost,
> -  &thunderx_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 6, /* load_int.  */
> -    6, /* store_int.  */
> -    6, /* load_fp.  */
> -    6, /* store_fp.  */
> -    6, /* load_pred.  */
> -    6 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  2, /* issue_rate  */
> -  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
> -  "8",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &thunderxt88_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params thunderx_tunings =
> -{
> -  &thunderx_extra_costs,
> -  &generic_addrcost_table,
> -  &thunderx_regmove_cost,
> -  &thunderx_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 6, /* load_int.  */
> -    6, /* store_int.  */
> -    6, /* load_fp.  */
> -    6, /* store_fp.  */
> -    6, /* load_pred.  */
> -    6 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  2, /* issue_rate  */
> -  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
> -  "8",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "8",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> -  &thunderx_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params tsv110_tunings =
> -{
> -  &tsv110_extra_costs,
> -  &tsv110_addrcost_table,
> -  &tsv110_regmove_cost,
> -  &tsv110_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4,    /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
> -   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> -  "16", /* function_align.  */
> -  "4",  /* jump_align.  */
> -  "8",  /* loop_align.  */
> -  2,    /* int_reassoc_width.  */
> -  4,    /* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,    /* vec_reassoc_width.  */
> -  2,    /* min_div_recip_mul_sf.  */
> -  2,    /* min_div_recip_mul_df.  */
> -  0,    /* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
> -  &tsv110_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params xgene1_tunings =
> -{
> -  &xgene1_extra_costs,
> -  &xgene1_addrcost_table,
> -  &xgene1_regmove_cost,
> -  &xgene1_vector_cost,
> -  &generic_branch_cost,
> -  &xgene1_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 6, /* load_int.  */
> -    6, /* store_int.  */
> -    6, /* load_fp.  */
> -    6, /* store_fp.  */
> -    6, /* load_pred.  */
> -    6 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "16",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  17,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
> -  &xgene1_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params emag_tunings =
> -{
> -  &xgene1_extra_costs,
> -  &xgene1_addrcost_table,
> -  &xgene1_regmove_cost,
> -  &xgene1_vector_cost,
> -  &generic_branch_cost,
> -  &xgene1_approx_modes,
> -  SVE_NOT_IMPLEMENTED,
> -  { 6, /* load_int.  */
> -    6, /* store_int.  */
> -    6, /* load_fp.  */
> -    6, /* store_fp.  */
> -    6, /* load_pred.  */
> -    6 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "16",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  17,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
> -  &xgene1_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params qdf24xx_tunings =
> -{
> -  &qdf24xx_extra_costs,
> -  &qdf24xx_addrcost_table,
> -  &qdf24xx_regmove_cost,
> -  &qdf24xx_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
> -  "16",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
> -  &qdf24xx_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
> -   for now.  */
> -static const struct tune_params saphira_tunings =
> -{
> -  &generic_extra_costs,
> -  &generic_addrcost_table,
> -  &generic_regmove_cost,
> -  &generic_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> -   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
> -  "16",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  1,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params thunderx2t99_tunings =
> -{
> -  &thunderx2t99_extra_costs,
> -  &thunderx2t99_addrcost_table,
> -  &thunderx2t99_regmove_cost,
> -  &thunderx2t99_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate.  */
> -  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
> -   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  3,	/* int_reassoc_width.  */
> -  2,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &thunderx2t99_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params thunderx3t110_tunings =
> -{
> -  &thunderx3t110_extra_costs,
> -  &thunderx3t110_addrcost_table,
> -  &thunderx3t110_regmove_cost,
> -  &thunderx3t110_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  6, /* issue_rate.  */
> -  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
> -   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> -  "16",	/* function_align.  */
> -  "8",	/* jump_align.  */
> -  "16",	/* loop_align.  */
> -  3,	/* int_reassoc_width.  */
> -  2,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &thunderx3t110_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params neoversen1_tunings =
> -{
> -  &cortexa76_extra_costs,
> -  &generic_addrcost_table,
> -  &generic_regmove_cost,
> -  &cortexa57_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    2, /* store_int.  */
> -    5, /* load_fp.  */
> -    2, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32:16",	/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params ampere1_tunings =
> -{
> -  &ampere1_extra_costs,
> -  &generic_addrcost_table,
> -  &generic_regmove_cost,
> -  &ampere1_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
> -   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
> -   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
> -   AARCH64_FUSE_CMP_BRANCH),
> -  /* fusible_ops  */
> -  "32",		/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  4,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &ampere1_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params ampere1a_tunings =
> -{
> -  &ampere1a_extra_costs,
> -  &generic_addrcost_table,
> -  &generic_regmove_cost,
> -  &ampere1_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_NOT_IMPLEMENTED, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  4, /* issue_rate  */
> -  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
> -   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
> -   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
> -   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
> -   AARCH64_FUSE_ADDSUB_2REG_CONST1),
> -  /* fusible_ops  */
> -  "32",		/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &ampere1_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> -};
> -
> -static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  4, /* ld2_st2_permute_cost */
> -  4, /* ld3_st3_permute_cost  */
> -  5, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  4, /* reduc_i8_cost  */
> -  4, /* reduc_i16_cost  */
> -  2, /* reduc_i32_cost  */
> -  2, /* reduc_i64_cost  */
> -  6, /* reduc_f16_cost  */
> -  3, /* reduc_f32_cost  */
> -  2, /* reduc_f64_cost  */
> -  2, /* store_elt_extra_cost  */
> -  /* This value is just inherited from the Cortex-A57 table.  */
> -  8, /* vec_to_scalar_cost  */
> -  /* This depends very much on what the scalar value is and
> -     where it comes from.  E.g. some constants take two dependent
> -     instructions or a load, while others might be moved from a GPR.
> -     4 seems to be a reasonable compromise in practice.  */
> -  4, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  /* Although stores have a latency of 2 and compete for the
> -     vector pipes, in practice it's better not to model that.  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const sve_vec_cost neoversev1_sve_vector_cost =
> -{
> -  {
> -    2, /* int_stmt_cost  */
> -    2, /* fp_stmt_cost  */
> -    4, /* ld2_st2_permute_cost  */
> -    7, /* ld3_st3_permute_cost  */
> -    8, /* ld4_st4_permute_cost  */
> -    3, /* permute_cost  */
> -    /* Theoretically, a reduction involving 31 scalar ADDs could
> -       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
> -       completes in 14 cycles, so give it a cost of 31 + 5.  */
> -    36, /* reduc_i8_cost  */
> -    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
> -    22, /* reduc_i16_cost  */
> -    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
> -    14, /* reduc_i32_cost  */
> -    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
> -    11, /* reduc_i64_cost  */
> -    /* Theoretically, a reduction involving 15 scalar FADDs could
> -       complete in ~9 cycles and would have a cost of 30.  FADDV
> -       completes in 13 cycles, so give it a cost of 30 + 4.  */
> -    34, /* reduc_f16_cost  */
> -    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
> -    19, /* reduc_f32_cost  */
> -    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
> -    11, /* reduc_f64_cost  */
> -    2, /* store_elt_extra_cost  */
> -    /* This value is just inherited from the Cortex-A57 table.  */
> -    8, /* vec_to_scalar_cost  */
> -    /* See the comment above the Advanced SIMD versions.  */
> -    4, /* scalar_to_vec_cost  */
> -    4, /* align_load_cost  */
> -    4, /* unalign_load_cost  */
> -    /* Although stores have a latency of 2 and compete for the
> -       vector pipes, in practice it's better not to model that.  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  3, /* clast_cost  */
> -  19, /* fadda_f16_cost  */
> -  11, /* fadda_f32_cost  */
> -  8, /* fadda_f64_cost  */
> -  32, /* gather_load_x32_cost  */
> -  16, /* gather_load_x64_cost  */
> -  3 /* scatter_store_elt_cost  */
> -};
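
(Sketch, not part of the patch.)  The Neoverse V1 SVE reduction costs above all
follow the recipe spelled out in the comments: the cost of the equivalent
scalar chain plus the latency gap between the reduction instruction and that
chain.  Re-stating only the numbers already quoted in the comments:

  /* Sanity-check sketch: cost = scalar-chain cost
     + (reduction latency - scalar-chain latency).  */
  static_assert (31 + (14 - 9) == 36, "reduc_i8_cost");
  static_assert (15 + (12 - 5) == 22, "reduc_i16_cost");
  static_assert (7 + (10 - 3) == 14, "reduc_i32_cost");
  static_assert (3 + (10 - 2) == 11, "reduc_i64_cost");
  static_assert (30 + (13 - 9) == 34, "reduc_f16_cost");
  static_assert (14 + (11 - 6) == 19, "reduc_f32_cost");
  static_assert (6 + (9 - 4) == 11, "reduc_f64_cost");
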
> -
> -static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
> -{
> -  3, /* loads_stores_per_cycle  */
> -  2, /* stores_per_cycle  */
> -  4, /* general_ops_per_cycle  */
> -  0, /* fp_simd_load_general_ops  */
> -  1 /* fp_simd_store_general_ops  */
> -};
> -
> -static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
> -{
> -  {
> -    3, /* loads_stores_per_cycle  */
> -    2, /* stores_per_cycle  */
> -    4, /* general_ops_per_cycle  */
> -    0, /* fp_simd_load_general_ops  */
> -    1 /* fp_simd_store_general_ops  */
> -  },
> -  2, /* ld2_st2_general_ops  */
> -  2, /* ld3_st3_general_ops  */
> -  3 /* ld4_st4_general_ops  */
> -};
> -
> -static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
> -{
> -  {
> -    {
> -      2, /* loads_per_cycle  */
> -      2, /* stores_per_cycle  */
> -      2, /* general_ops_per_cycle  */
> -      0, /* fp_simd_load_general_ops  */
> -      1 /* fp_simd_store_general_ops  */
> -    },
> -    2, /* ld2_st2_general_ops  */
> -    2, /* ld3_st3_general_ops  */
> -    3 /* ld4_st4_general_ops  */
> -  },
> -  1, /* pred_ops_per_cycle  */
> -  2, /* while_pred_ops  */
> -  2, /* int_cmp_pred_ops  */
> -  1, /* fp_cmp_pred_ops  */
> -  1, /* gather_scatter_pair_general_ops  */
> -  1 /* gather_scatter_pair_pred_ops  */
> -};
> -
> -static const aarch64_vec_issue_info neoversev1_vec_issue_info =
> -{
> -  &neoversev1_scalar_issue_info,
> -  &neoversev1_advsimd_issue_info,
> -  &neoversev1_sve_issue_info
> -};
> -
> -/* Neoverse V1 costs for vector insn classes.  */
> -static const struct cpu_vector_cost neoversev1_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  2, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &neoversev1_advsimd_vector_cost, /* advsimd  */
> -  &neoversev1_sve_vector_cost, /* sve  */
> -  &neoversev1_vec_issue_info /* issue_info  */
> -};
> -
> -static const struct tune_params neoversev1_tunings =
> -{
> -  &cortexa76_extra_costs,
> -  &neoversev1_addrcost_table,
> -  &neoversev1_regmove_cost,
> -  &neoversev1_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_256, /* sve_width  */
> -  { 4, /* load_int.  */
> -    2, /* store_int.  */
> -    6, /* load_fp.  */
> -    2, /* store_fp.  */
> -    6, /* load_pred.  */
> -    1 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32:16",	/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  4,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> -   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> -
> -static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> -{
> -  {
> -    2, /* int_stmt_cost  */
> -    2, /* fp_stmt_cost  */
> -    4, /* ld2_st2_permute_cost  */
> -    5, /* ld3_st3_permute_cost  */
> -    5, /* ld4_st4_permute_cost  */
> -    3, /* permute_cost  */
> -    /* Theoretically, a reduction involving 15 scalar ADDs could
> -       complete in ~5 cycles and would have a cost of 15.  Assume that
> -       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
> -    21, /* reduc_i8_cost  */
> -    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> -    13, /* reduc_i16_cost  */
> -    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> -    9, /* reduc_i32_cost  */
> -    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
> -    8, /* reduc_i64_cost  */
> -    /* Theoretically, a reduction involving 7 scalar FADDs could
> -       complete in ~6 cycles and would have a cost of 14.  Assume that
> -       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
> -    16, /* reduc_f16_cost  */
> -    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
> -    8, /* reduc_f32_cost  */
> -    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
> -    4, /* reduc_f64_cost  */
> -    2, /* store_elt_extra_cost  */
> -    /* This value is just inherited from the Cortex-A57 table.  */
> -    8, /* vec_to_scalar_cost  */
> -    /* This depends very much on what the scalar value is and
> -       where it comes from.  E.g. some constants take two dependent
> -       instructions or a load, while others might be moved from a GPR.
> -       4 seems to be a reasonable compromise in practice.  */
> -    4, /* scalar_to_vec_cost  */
> -    4, /* align_load_cost  */
> -    4, /* unalign_load_cost  */
> -    /* Although stores generally have a latency of 2 and compete for the
> -       vector pipes, in practice it's better not to model that.  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  3, /* clast_cost  */
> -  10, /* fadda_f16_cost  */
> -  6, /* fadda_f32_cost  */
> -  4, /* fadda_f64_cost  */
> -  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> -     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> -     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> -     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> -     (cost 2) to that, to avoid the difference being lost in rounding.
> -
> -     There is no easy comparison between a strided Advanced SIMD x32 load
> -     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> -     operation more than a 64-bit gather.  */
> -  14, /* gather_load_x32_cost  */
> -  12, /* gather_load_x64_cost  */
> -  3 /* scatter_store_elt_cost  */
> -};
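
(Sketch, not part of the patch.)  The gather costs in this table likewise just
restate the comment above them, assuming the scalar load cost of 4 used
elsewhere in this file: two scalar loads plus a vec_construct plus one full
vector operation for the 64-bit gather, and one more vector operation for the
32-bit gather.

  /* Sanity-check sketch of the gather numbers from the comment.  */
  static_assert (2 * 4 + 2 + 2 == 12, "gather_load_x64_cost");
  static_assert (12 + 2 == 14, "gather_load_x32_cost");
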
> -
> -static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
> -{
> -  {
> -    {
> -      3, /* loads_per_cycle  */
> -      2, /* stores_per_cycle  */
> -      4, /* general_ops_per_cycle  */
> -      0, /* fp_simd_load_general_ops  */
> -      1 /* fp_simd_store_general_ops  */
> -    },
> -    2, /* ld2_st2_general_ops  */
> -    2, /* ld3_st3_general_ops  */
> -    3 /* ld4_st4_general_ops  */
> -  },
> -  2, /* pred_ops_per_cycle  */
> -  2, /* while_pred_ops  */
> -  2, /* int_cmp_pred_ops  */
> -  1, /* fp_cmp_pred_ops  */
> -  1, /* gather_scatter_pair_general_ops  */
> -  1 /* gather_scatter_pair_pred_ops  */
> -};
> -
> -static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
> -{
> -  &neoversev1_scalar_issue_info,
> -  &neoversev1_advsimd_issue_info,
> -  &neoverse512tvb_sve_issue_info
> -};
> -
> -static const struct cpu_vector_cost neoverse512tvb_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  2, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &neoversev1_advsimd_vector_cost, /* advsimd  */
> -  &neoverse512tvb_sve_vector_cost, /* sve  */
> -  &neoverse512tvb_vec_issue_info /* issue_info  */
> -};
> -
> -static const struct tune_params neoverse512tvb_tunings =
> -{
> -  &cortexa76_extra_costs,
> -  &neoversev1_addrcost_table,
> -  &neoversev1_regmove_cost,
> -  &neoverse512tvb_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_128 | SVE_256, /* sve_width  */
> -  { 4, /* load_int.  */
> -    2, /* store_int.  */
> -    6, /* load_fp.  */
> -    2, /* store_fp.  */
> -    6, /* load_pred.  */
> -    1 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32:16",	/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  4,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> -};
> -
> -static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  2, /* ld2_st2_permute_cost */
> -  2, /* ld3_st3_permute_cost  */
> -  3, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  4, /* reduc_i8_cost  */
> -  4, /* reduc_i16_cost  */
> -  2, /* reduc_i32_cost  */
> -  2, /* reduc_i64_cost  */
> -  6, /* reduc_f16_cost  */
> -  4, /* reduc_f32_cost  */
> -  2, /* reduc_f64_cost  */
> -  2, /* store_elt_extra_cost  */
> -  /* This value is just inherited from the Cortex-A57 table.  */
> -  8, /* vec_to_scalar_cost  */
> -  /* This depends very much on what the scalar value is and
> -     where it comes from.  E.g. some constants take two dependent
> -     instructions or a load, while others might be moved from a GPR.
> -     4 seems to be a reasonable compromise in practice.  */
> -  4, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  /* Although stores have a latency of 2 and compete for the
> -     vector pipes, in practice it's better not to model that.  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const sve_vec_cost neoversen2_sve_vector_cost =
> -{
> -  {
> -    2, /* int_stmt_cost  */
> -    2, /* fp_stmt_cost  */
> -    3, /* ld2_st2_permute_cost  */
> -    4, /* ld3_st3_permute_cost  */
> -    4, /* ld4_st4_permute_cost  */
> -    3, /* permute_cost  */
> -    /* Theoretically, a reduction involving 15 scalar ADDs could
> -       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
> -       completes in 11 cycles, so give it a cost of 15 + 6.  */
> -    21, /* reduc_i8_cost  */
> -    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> -    13, /* reduc_i16_cost  */
> -    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> -    9, /* reduc_i32_cost  */
> -    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
> -    2, /* reduc_i64_cost  */
> -    /* Theoretically, a reduction involving 7 scalar FADDs could
> -       complete in ~8 cycles and would have a cost of 14.  FADDV
> -       completes in 6 cycles, so give it a cost of 14 - 2.  */
> -    12, /* reduc_f16_cost  */
> -    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
> -    6, /* reduc_f32_cost  */
> -    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
> -    2, /* reduc_f64_cost  */
> -    2, /* store_elt_extra_cost  */
> -    /* This value is just inherited from the Cortex-A57 table.  */
> -    8, /* vec_to_scalar_cost  */
> -    /* See the comment above the Advanced SIMD versions.  */
> -    4, /* scalar_to_vec_cost  */
> -    4, /* align_load_cost  */
> -    4, /* unalign_load_cost  */
> -    /* Although stores have a latency of 2 and compete for the
> -       vector pipes, in practice it's better not to model that.  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  3, /* clast_cost  */
> -  10, /* fadda_f16_cost  */
> -  6, /* fadda_f32_cost  */
> -  4, /* fadda_f64_cost  */
> -  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> -     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> -     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> -     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> -     (cost 2) to that, to avoid the difference being lost in rounding.
> -
> -     There is no easy comparison between a strided Advanced SIMD x32 load
> -     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> -     operation more than a 64-bit gather.  */
> -  14, /* gather_load_x32_cost  */
> -  12, /* gather_load_x64_cost  */
> -  3 /* scatter_store_elt_cost  */
> -};
> -
> -static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
> -{
> -  3, /* loads_stores_per_cycle  */
> -  2, /* stores_per_cycle  */
> -  4, /* general_ops_per_cycle  */
> -  0, /* fp_simd_load_general_ops  */
> -  1 /* fp_simd_store_general_ops  */
> -};
> -
> -static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
> -{
> -  {
> -    3, /* loads_stores_per_cycle  */
> -    2, /* stores_per_cycle  */
> -    2, /* general_ops_per_cycle  */
> -    0, /* fp_simd_load_general_ops  */
> -    1 /* fp_simd_store_general_ops  */
> -  },
> -  2, /* ld2_st2_general_ops  */
> -  2, /* ld3_st3_general_ops  */
> -  3 /* ld4_st4_general_ops  */
> -};
> -
> -static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
> -{
> -  {
> -    {
> -      3, /* loads_per_cycle  */
> -      2, /* stores_per_cycle  */
> -      2, /* general_ops_per_cycle  */
> -      0, /* fp_simd_load_general_ops  */
> -      1 /* fp_simd_store_general_ops  */
> -    },
> -    2, /* ld2_st2_general_ops  */
> -    3, /* ld3_st3_general_ops  */
> -    3 /* ld4_st4_general_ops  */
> -  },
> -  2, /* pred_ops_per_cycle  */
> -  2, /* while_pred_ops  */
> -  2, /* int_cmp_pred_ops  */
> -  1, /* fp_cmp_pred_ops  */
> -  1, /* gather_scatter_pair_general_ops  */
> -  1 /* gather_scatter_pair_pred_ops  */
> -};
> -
> -static const aarch64_vec_issue_info neoversen2_vec_issue_info =
> -{
> -  &neoversen2_scalar_issue_info,
> -  &neoversen2_advsimd_issue_info,
> -  &neoversen2_sve_issue_info
> -};
> -
> -/* Neoverse N2 costs for vector insn classes.  */
> -static const struct cpu_vector_cost neoversen2_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  2, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &neoversen2_advsimd_vector_cost, /* advsimd  */
> -  &neoversen2_sve_vector_cost, /* sve  */
> -  &neoversen2_vec_issue_info /* issue_info  */
> -};
> -
> -static const struct tune_params neoversen2_tunings =
> -{
> -  &cortexa76_extra_costs,
> -  &neoversen2_addrcost_table,
> -  &neoversen2_regmove_cost,
> -  &neoversen2_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_128, /* sve_width  */
> -  { 4, /* load_int.  */
> -    1, /* store_int.  */
> -    6, /* load_fp.  */
> -    2, /* store_fp.  */
> -    6, /* load_pred.  */
> -    1 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  3, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32:16",	/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  2,	/* int_reassoc_width.  */
> -  4,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> -   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> -};
> -
> -static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
> -{
> -  2, /* int_stmt_cost  */
> -  2, /* fp_stmt_cost  */
> -  2, /* ld2_st2_permute_cost */
> -  2, /* ld3_st3_permute_cost  */
> -  3, /* ld4_st4_permute_cost  */
> -  3, /* permute_cost  */
> -  4, /* reduc_i8_cost  */
> -  4, /* reduc_i16_cost  */
> -  2, /* reduc_i32_cost  */
> -  2, /* reduc_i64_cost  */
> -  6, /* reduc_f16_cost  */
> -  3, /* reduc_f32_cost  */
> -  2, /* reduc_f64_cost  */
> -  2, /* store_elt_extra_cost  */
> -  /* This value is just inherited from the Cortex-A57 table.  */
> -  8, /* vec_to_scalar_cost  */
> -  /* This depends very much on what the scalar value is and
> -     where it comes from.  E.g. some constants take two dependent
> -     instructions or a load, while others might be moved from a GPR.
> -     4 seems to be a reasonable compromise in practice.  */
> -  4, /* scalar_to_vec_cost  */
> -  4, /* align_load_cost  */
> -  4, /* unalign_load_cost  */
> -  /* Although stores have a latency of 2 and compete for the
> -     vector pipes, in practice it's better not to model that.  */
> -  1, /* unalign_store_cost  */
> -  1  /* store_cost  */
> -};
> -
> -static const sve_vec_cost neoversev2_sve_vector_cost =
> -{
> -  {
> -    2, /* int_stmt_cost  */
> -    2, /* fp_stmt_cost  */
> -    3, /* ld2_st2_permute_cost  */
> -    3, /* ld3_st3_permute_cost  */
> -    4, /* ld4_st4_permute_cost  */
> -    3, /* permute_cost  */
> -    /* Theoretically, a reduction involving 15 scalar ADDs could
> -       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
> -       completes in 11 cycles, so give it a cost of 15 + 8.  */
> -    21, /* reduc_i8_cost  */
> -    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
> -    14, /* reduc_i16_cost  */
> -    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
> -    7, /* reduc_i32_cost  */
> -    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
> -    2, /* reduc_i64_cost  */
> -    /* Theoretically, a reduction involving 7 scalar FADDs could
> -       complete in ~6 cycles and would have a cost of 14.  FADDV
> -       completes in 8 cycles, so give it a cost of 14 + 2.  */
> -    16, /* reduc_f16_cost  */
> -    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
> -    8, /* reduc_f32_cost  */
> -    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
> -    4, /* reduc_f64_cost  */
> -    2, /* store_elt_extra_cost  */
> -    /* This value is just inherited from the Cortex-A57 table.  */
> -    8, /* vec_to_scalar_cost  */
> -    /* See the comment above the Advanced SIMD versions.  */
> -    4, /* scalar_to_vec_cost  */
> -    4, /* align_load_cost  */
> -    4, /* unalign_load_cost  */
> -    /* Although stores have a latency of 2 and compete for the
> -       vector pipes, in practice it's better not to model that.  */
> -    1, /* unalign_store_cost  */
> -    1  /* store_cost  */
> -  },
> -  3, /* clast_cost  */
> -  10, /* fadda_f16_cost  */
> -  6, /* fadda_f32_cost  */
> -  4, /* fadda_f64_cost  */
> -  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> -     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> -     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> -     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> -     (cost 2) to that, to avoid the difference being lost in rounding.
> -
> -     There is no easy comparison between a strided Advanced SIMD x32 load
> -     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> -     operation more than a 64-bit gather.  */
> -  14, /* gather_load_x32_cost  */
> -  12, /* gather_load_x64_cost  */
> -  3 /* scatter_store_elt_cost  */
> -};
> -
> -static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
> -{
> -  3, /* loads_stores_per_cycle  */
> -  2, /* stores_per_cycle  */
> -  6, /* general_ops_per_cycle  */
> -  0, /* fp_simd_load_general_ops  */
> -  1 /* fp_simd_store_general_ops  */
> -};
> -
> -static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
> -{
> -  {
> -    3, /* loads_stores_per_cycle  */
> -    2, /* stores_per_cycle  */
> -    4, /* general_ops_per_cycle  */
> -    0, /* fp_simd_load_general_ops  */
> -    1 /* fp_simd_store_general_ops  */
> -  },
> -  2, /* ld2_st2_general_ops  */
> -  2, /* ld3_st3_general_ops  */
> -  3 /* ld4_st4_general_ops  */
> -};
> -
> -static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
> -{
> -  {
> -    {
> -      3, /* loads_per_cycle  */
> -      2, /* stores_per_cycle  */
> -      4, /* general_ops_per_cycle  */
> -      0, /* fp_simd_load_general_ops  */
> -      1 /* fp_simd_store_general_ops  */
> -    },
> -    2, /* ld2_st2_general_ops  */
> -    3, /* ld3_st3_general_ops  */
> -    3 /* ld4_st4_general_ops  */
> -  },
> -  2, /* pred_ops_per_cycle  */
> -  2, /* while_pred_ops  */
> -  2, /* int_cmp_pred_ops  */
> -  1, /* fp_cmp_pred_ops  */
> -  1, /* gather_scatter_pair_general_ops  */
> -  1 /* gather_scatter_pair_pred_ops  */
> -};
> -
> -static const aarch64_vec_issue_info neoversev2_vec_issue_info =
> -{
> -  &neoversev2_scalar_issue_info,
> -  &neoversev2_advsimd_issue_info,
> -  &neoversev2_sve_issue_info
> -};
> -
> -/* Demeter costs for vector insn classes.  */
> -static const struct cpu_vector_cost neoversev2_vector_cost =
> -{
> -  1, /* scalar_int_stmt_cost  */
> -  2, /* scalar_fp_stmt_cost  */
> -  4, /* scalar_load_cost  */
> -  1, /* scalar_store_cost  */
> -  1, /* cond_taken_branch_cost  */
> -  1, /* cond_not_taken_branch_cost  */
> -  &neoversev2_advsimd_vector_cost, /* advsimd  */
> -  &neoversev2_sve_vector_cost, /* sve  */
> -  &neoversev2_vec_issue_info /* issue_info  */
> -};
> -
> -static const struct tune_params neoversev2_tunings =
> -{
> -  &cortexa76_extra_costs,
> -  &neoversev2_addrcost_table,
> -  &neoversev2_regmove_cost,
> -  &neoversev2_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_128, /* sve_width  */
> -  { 4, /* load_int.  */
> -    2, /* store_int.  */
> -    6, /* load_fp.  */
> -    1, /* store_fp.  */
> -    6, /* load_pred.  */
> -    2 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  5, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32:16",	/* function_align.  */
> -  "4",		/* jump_align.  */
> -  "32:16",	/* loop_align.  */
> -  3,	/* int_reassoc_width.  */
> -  6,	/* fp_reassoc_width.  */
> -  4,	/* fma_reassoc_width.  */
> -  3,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> -   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> -  &generic_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> -};
> -
> -static const struct tune_params a64fx_tunings =
> -{
> -  &a64fx_extra_costs,
> -  &a64fx_addrcost_table,
> -  &a64fx_regmove_cost,
> -  &a64fx_vector_cost,
> -  &generic_branch_cost,
> -  &generic_approx_modes,
> -  SVE_512, /* sve_width  */
> -  { 4, /* load_int.  */
> -    4, /* store_int.  */
> -    4, /* load_fp.  */
> -    4, /* store_fp.  */
> -    4, /* load_pred.  */
> -    4 /* store_pred.  */
> -  }, /* memmov_cost.  */
> -  7, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> -  "32",	/* function_align.  */
> -  "16",	/* jump_align.  */
> -  "32",	/* loop_align.  */
> -  4,	/* int_reassoc_width.  */
> -  2,	/* fp_reassoc_width.  */
> -  1,	/* fma_reassoc_width.  */
> -  2,	/* vec_reassoc_width.  */
> -  2,	/* min_div_recip_mul_sf.  */
> -  2,	/* min_div_recip_mul_df.  */
> -  0,	/* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> -  &a64fx_prefetch_tune,
> -  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> -  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> -};
> +#include "tuning_models/generic.h"
> +#include "tuning_models/cortexa35.h"
> +#include "tuning_models/cortexa53.h"
> +#include "tuning_models/cortexa57.h"
> +#include "tuning_models/cortexa72.h"
> +#include "tuning_models/cortexa73.h"
> +#include "tuning_models/exynosm1.h"
> +#include "tuning_models/thunderxt88.h"
> +#include "tuning_models/thunderx.h"
> +#include "tuning_models/tsv110.h"
> +#include "tuning_models/xgene1.h"
> +#include "tuning_models/emag.h"
> +#include "tuning_models/qdf24xx.h"
> +#include "tuning_models/saphira.h"
> +#include "tuning_models/thunderx2t99.h"
> +#include "tuning_models/thunderx3t110.h"
> +#include "tuning_models/neoversen1.h"
> +#include "tuning_models/ampere1.h"
> +#include "tuning_models/ampere1a.h"
> +#include "tuning_models/neoversev1.h"
> +#include "tuning_models/neoverse512tvb.h"
> +#include "tuning_models/neoversen2.h"
> +#include "tuning_models/neoversev2.h"
> +#include "tuning_models/a64fx.h"
>   
>   /* Support for fine-grained override of the tuning structures.  */
>   struct aarch64_tuning_override_function
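
[Aside, not part of the patch: the reduction-cost comments in the Neoverse tables
above all follow the same derivation — the cost of the equivalent scalar sequence
plus the latency difference between the single reduction instruction and that
scalar sequence (the difference can be negative when the reduction instruction is
faster).  A minimal sketch, using a hypothetical helper name that does not exist
in GCC:]

    /* Hypothetical illustration of the derivation used in the cost-table
       comments above; reduc_cost is not a real GCC function.  */
    static int
    reduc_cost (int scalar_ops, int scalar_cycles, int vec_cycles)
    {
      /* One unit of cost per scalar op, adjusted by how much slower (or
         faster) the single vector reduction instruction is.  */
      return scalar_ops + (vec_cycles - scalar_cycles);
    }

    /* E.g. Neoverse N2 SVE reduc_i8: 15 scalar ADDs (~5 cycles) vs. [SU]ADDV
       (11 cycles) gives 15 + (11 - 5) = 21, matching the table entry.  */
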
> diff --git a/gcc/config/aarch64/tuning_models/a64fx.h b/gcc/config/aarch64/tuning_models/a64fx.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..7b06c27eba1e4de01738bdfdc077460f9135fb41
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/a64fx.h
> @@ -0,0 +1,169 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_A64FX
> +#define GCC_AARCH64_H_A64FX
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table a64fx_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      1, /* si  */
> +      1, /* di  */
> +      2, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  2, /* register_offset  */
> +  3, /* register_sextend  */
> +  3, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost a64fx_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  5, /* GP2FP  */
> +  7, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost a64fx_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  5, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  13, /* reduc_i8_cost  */
> +  13, /* reduc_i16_cost  */
> +  13, /* reduc_i32_cost  */
> +  13, /* reduc_i64_cost  */
> +  13, /* reduc_f16_cost  */
> +  13, /* reduc_f32_cost  */
> +  13, /* reduc_f64_cost  */
> +  13, /* store_elt_extra_cost  */
> +  13, /* vec_to_scalar_cost  */
> +  4, /* scalar_to_vec_cost  */
> +  6, /* align_load_cost  */
> +  6, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost a64fx_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    5, /* fp_stmt_cost  */
> +    0, /* ld2_st2_permute_cost  */
> +    0, /* ld3_st3_permute_cost  */
> +    0, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    13, /* reduc_i8_cost  */
> +    13, /* reduc_i16_cost  */
> +    13, /* reduc_i32_cost  */
> +    13, /* reduc_i64_cost  */
> +    13, /* reduc_f16_cost  */
> +    13, /* reduc_f32_cost  */
> +    13, /* reduc_f64_cost  */
> +    13, /* store_elt_extra_cost  */
> +    13, /* vec_to_scalar_cost  */
> +    4, /* scalar_to_vec_cost  */
> +    6, /* align_load_cost  */
> +    6, /* unalign_load_cost  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  13, /* clast_cost  */
> +  13, /* fadda_f16_cost  */
> +  13, /* fadda_f32_cost  */
> +  13, /* fadda_f64_cost  */
> +  64, /* gather_load_x32_cost  */
> +  32, /* gather_load_x64_cost  */
> +  1 /* scatter_store_elt_cost  */
> +};
> +
> +static const struct cpu_vector_cost a64fx_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  5, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &a64fx_advsimd_vector_cost, /* advsimd  */
> +  &a64fx_sve_vector_cost, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune a64fx_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  64,			/* l1_cache_size  */
> +  256,			/* l1_cache_line_size  */
> +  32768,		/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params a64fx_tunings =
> +{
> +  &a64fx_extra_costs,
> +  &a64fx_addrcost_table,
> +  &a64fx_regmove_cost,
> +  &a64fx_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_512, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  7, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32",	/* function_align.  */
> +  "16",	/* jump_align.  */
> +  "32",	/* loop_align.  */
> +  4,	/* int_reassoc_width.  */
> +  2,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &a64fx_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_A64FX.  */
> diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..8d2a1c696103259f23cf73df26cef9d4fa05ac73
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/ampere1.h
> @@ -0,0 +1,113 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_AMPERE1
> +#define GCC_AARCH64_H_AMPERE1
> +
> +#include "generic.h"
> +
> +static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> +{
> +  1, /* int_stmt_cost  */
> +  3, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  12, /* reduc_i8_cost  */
> +  9, /* reduc_i16_cost  */
> +  6, /* reduc_i32_cost  */
> +  5, /* reduc_i64_cost  */
> +  9, /* reduc_f16_cost  */
> +  6, /* reduc_f32_cost  */
> +  5, /* reduc_f64_cost  */
> +  8, /* store_elt_extra_cost  */
> +  6, /* vec_to_scalar_cost  */
> +  7, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* Ampere-1 costs for vector insn classes.  */
> +static const struct cpu_vector_cost ampere1_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  3, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &ampere1_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr  /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune ampere1_prefetch_tune =
> +{
> +  0,			/* num_slots  */
> +  64,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  2048,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params ampere1_tunings =
> +{
> +  &ampere1_extra_costs,
> +  &generic_addrcost_table,
> +  &generic_regmove_cost,
> +  &ampere1_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
> +   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
> +   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
> +   AARCH64_FUSE_CMP_BRANCH),
> +  /* fusible_ops  */
> +  "32",		/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  4,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &ampere1_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_AMPERE1.  */
> diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..c419ffb3c1a936a01690ad157c6c71dc645273c8
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/ampere1a.h
> @@ -0,0 +1,65 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_AMPERE1A
> +#define GCC_AARCH64_H_AMPERE1A
> +
> +#include "generic.h"
> +
> +static const struct tune_params ampere1a_tunings =
> +{
> +  &ampere1a_extra_costs,
> +  &generic_addrcost_table,
> +  &generic_regmove_cost,
> +  &ampere1_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
> +   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
> +   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
> +   AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_ALU_CBZ |
> +   AARCH64_FUSE_ADDSUB_2REG_CONST1),
> +  /* fusible_ops  */
> +  "32",		/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &ampere1_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_AMPERE1A.  */
> diff --git a/gcc/config/aarch64/tuning_models/cortexa35.h b/gcc/config/aarch64/tuning_models/cortexa35.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..5534335348db96cc57fc9eccd7ff79a624cb528a
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/cortexa35.h
> @@ -0,0 +1,62 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_CORTEXA35
> +#define GCC_AARCH64_H_CORTEXA35
> +
> +#include "generic.h"
> +#include "cortexa53.h"
> +
> +static const struct tune_params cortexa35_tunings =
> +{
> +  &cortexa53_extra_costs,
> +  &generic_addrcost_table,
> +  &cortexa53_regmove_cost,
> +  &generic_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  1, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_CORTEXA35.  */
> diff --git a/gcc/config/aarch64/tuning_models/cortexa53.h b/gcc/config/aarch64/tuning_models/cortexa53.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..9dfdccc5968e7f062af5c78f153bfe3838263b0a
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/cortexa53.h
> @@ -0,0 +1,71 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_CORTEXA53
> +#define GCC_AARCH64_H_CORTEXA53
> +
> +#include "generic.h"
> +
> +static const struct cpu_regmove_cost cortexa53_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  5, /* GP2FP  */
> +  5, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const struct tune_params cortexa53_tunings =
> +{
> +  &cortexa53_extra_costs,
> +  &generic_addrcost_table,
> +  &cortexa53_regmove_cost,
> +  &generic_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  2, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_CORTEXA53.  */
> diff --git a/gcc/config/aarch64/tuning_models/cortexa57.h b/gcc/config/aarch64/tuning_models/cortexa57.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..9c4789d57833a5879dda8e2fe454ac5f56cb0601
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/cortexa57.h
> @@ -0,0 +1,109 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_CORTEXA57
> +#define GCC_AARCH64_H_CORTEXA57
> +
> +#include "generic.h"
> +
> +static const struct cpu_regmove_cost cortexa57_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  5, /* GP2FP  */
> +  5, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  8, /* reduc_i8_cost  */
> +  8, /* reduc_i16_cost  */
> +  8, /* reduc_i32_cost  */
> +  8, /* reduc_i64_cost  */
> +  8, /* reduc_f16_cost  */
> +  8, /* reduc_f32_cost  */
> +  8, /* reduc_f64_cost  */
> +  8, /* store_elt_extra_cost  */
> +  8, /* vec_to_scalar_cost  */
> +  8, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* Cortex-A57 costs for vector insn classes.  */
> +static const struct cpu_vector_cost cortexa57_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &cortexa57_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const struct tune_params cortexa57_tunings =
> +{
> +  &cortexa57_extra_costs,
> +  &generic_addrcost_table,
> +  &cortexa57_regmove_cost,
> +  &cortexa57_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_CORTEXA57.  */
> diff --git a/gcc/config/aarch64/tuning_models/cortexa72.h b/gcc/config/aarch64/tuning_models/cortexa72.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..968171c9b2e898d7479dbcb462e33fe3905e183d
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/cortexa72.h
> @@ -0,0 +1,61 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_CORTEXA72
> +#define GCC_AARCH64_H_CORTEXA72
> +
> +#include "generic.h"
> +
> +static const struct tune_params cortexa72_tunings =
> +{
> +  &cortexa57_extra_costs,
> +  &generic_addrcost_table,
> +  &cortexa57_regmove_cost,
> +  &cortexa57_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_CORTEXA72.  */
> diff --git a/gcc/config/aarch64/tuning_models/cortexa73.h b/gcc/config/aarch64/tuning_models/cortexa73.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..8d1a504ddac39604dd193ce0f434fd2f5145c129
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/cortexa73.h
> @@ -0,0 +1,62 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_CORTEXA73
> +#define GCC_AARCH64_H_CORTEXA73
> +
> +#include "generic.h"
> +
> +static const struct tune_params cortexa73_tunings =
> +{
> +  &cortexa57_extra_costs,
> +  &generic_addrcost_table,
> +  &cortexa57_regmove_cost,
> +  &cortexa57_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  2, /* issue_rate.  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +
> +#endif /* GCC_AARCH64_H_CORTEXA73.  */
> diff --git a/gcc/config/aarch64/tuning_models/emag.h b/gcc/config/aarch64/tuning_models/emag.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..3f3402c3fc2a94704eeaf9223ecb0ca1c057cace
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/emag.h
> @@ -0,0 +1,60 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_EMAG
> +#define GCC_AARCH64_H_EMAG
> +
> +#include "generic.h"
> +
> +static const struct tune_params emag_tunings =
> +{
> +  &xgene1_extra_costs,
> +  &xgene1_addrcost_table,
> +  &xgene1_regmove_cost,
> +  &xgene1_vector_cost,
> +  &generic_branch_cost,
> +  &xgene1_approx_modes,
> +  SVE_NOT_IMPLEMENTED,
> +  { 6, /* load_int.  */
> +    6, /* store_int.  */
> +    6, /* load_fp.  */
> +    6, /* store_fp.  */
> +    6, /* load_pred.  */
> +    6 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "16",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  17,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
> +  &xgene1_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_EMAG.  */
> diff --git a/gcc/config/aarch64/tuning_models/exynosm1.h b/gcc/config/aarch64/tuning_models/exynosm1.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..a42ea4df97f3f048c41481c304fd3684a69d743b
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/exynosm1.h
> @@ -0,0 +1,144 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_EXYNOSM1
> +#define GCC_AARCH64_H_EXYNOSM1
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table exynosm1_addrcost_table =
> +{
> +    {
> +      0, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      2, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  1, /* register_offset  */
> +  1, /* register_sextend  */
> +  2, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost exynosm1_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost (actual, 4 and 9).  */
> +  9, /* GP2FP  */
> +  9, /* FP2GP  */
> +  1 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
> +{
> +  3, /* int_stmt_cost  */
> +  3, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  3, /* reduc_i8_cost  */
> +  3, /* reduc_i16_cost  */
> +  3, /* reduc_i32_cost  */
> +  3, /* reduc_i64_cost  */
> +  3, /* reduc_f16_cost  */
> +  3, /* reduc_f32_cost  */
> +  3, /* reduc_f64_cost  */
> +  3, /* store_elt_extra_cost  */
> +  3, /* vec_to_scalar_cost  */
> +  3, /* scalar_to_vec_cost  */
> +  5, /* align_load_cost  */
> +  5, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const struct cpu_vector_cost exynosm1_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  5, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &exynosm1_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +/* Approximation modes for Exynos M1.  */
> +static const cpu_approx_modes exynosm1_approx_modes =
> +{
> +  AARCH64_APPROX_NONE,	/* division  */
> +  AARCH64_APPROX_ALL,	/* sqrt  */
> +  AARCH64_APPROX_ALL	/* recip_sqrt  */
> +};
> +
> +static const cpu_prefetch_tune exynosm1_prefetch_tune =
> +{
> +  0,			/* num_slots  */
> +  -1,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  -1,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params exynosm1_tunings =
> +{
> +  &exynosm1_extra_costs,
> +  &exynosm1_addrcost_table,
> +  &exynosm1_regmove_cost,
> +  &exynosm1_vector_cost,
> +  &generic_branch_cost,
> +  &exynosm1_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3,	/* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
> +  "4",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "4",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  48,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  &exynosm1_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_EXYNOSM1.  */
> diff --git a/gcc/config/aarch64/tuning_models/generic.h b/gcc/config/aarch64/tuning_models/generic.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..deb2c1cffe255bddcb5be571b12086442782da60
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/generic.h
> @@ -0,0 +1,190 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_GENERIC
> +#define GCC_AARCH64_H_GENERIC
> +
> +static const struct cpu_addrcost_table generic_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost generic_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  5, /* GP2FP  */
> +  5, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +/* Generic costs for Advanced SIMD vector operations.   */
> +static const advsimd_vec_cost generic_advsimd_vector_cost =
> +{
> +  1, /* int_stmt_cost  */
> +  1, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  2, /* reduc_i8_cost  */
> +  2, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  2, /* reduc_f16_cost  */
> +  2, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  2, /* vec_to_scalar_cost  */
> +  1, /* scalar_to_vec_cost  */
> +  1, /* align_load_cost  */
> +  1, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* Generic costs for SVE vector operations.  */
> +static const sve_vec_cost generic_sve_vector_cost =
> +{
> +  {
> +    1, /* int_stmt_cost  */
> +    1, /* fp_stmt_cost  */
> +    0, /* ld2_st2_permute_cost  */
> +    0, /* ld3_st3_permute_cost  */
> +    0, /* ld4_st4_permute_cost  */
> +    2, /* permute_cost  */
> +    2, /* reduc_i8_cost  */
> +    2, /* reduc_i16_cost  */
> +    2, /* reduc_i32_cost  */
> +    2, /* reduc_i64_cost  */
> +    2, /* reduc_f16_cost  */
> +    2, /* reduc_f32_cost  */
> +    2, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    2, /* vec_to_scalar_cost  */
> +    1, /* scalar_to_vec_cost  */
> +    1, /* align_load_cost  */
> +    1, /* unalign_load_cost  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  2, /* clast_cost  */
> +  2, /* fadda_f16_cost  */
> +  2, /* fadda_f32_cost  */
> +  2, /* fadda_f64_cost  */
> +  4, /* gather_load_x32_cost  */
> +  2, /* gather_load_x64_cost  */
> +  1 /* scatter_store_elt_cost  */
> +};
> +
> +/* Generic costs for vector insn classes.  */
> +static const struct cpu_vector_cost generic_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  1, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &generic_advsimd_vector_cost, /* advsimd  */
> +  &generic_sve_vector_cost, /* sve */
> +  nullptr /* issue_info  */
> +};
> +
> +/* Generic costs for branch instructions.  */
> +static const struct cpu_branch_cost generic_branch_cost =
> +{
> +  1,  /* Predictable.  */
> +  3   /* Unpredictable.  */
> +};
> +
> +/* Generic approximation modes.  */
> +static const cpu_approx_modes generic_approx_modes =
> +{
> +  AARCH64_APPROX_NONE,	/* division  */
> +  AARCH64_APPROX_NONE,	/* sqrt  */
> +  AARCH64_APPROX_NONE	/* recip_sqrt  */
> +};
> +
> +/* Generic prefetch settings (which disable prefetch).  */
> +static const cpu_prefetch_tune generic_prefetch_tune =
> +{
> +  0,			/* num_slots  */
> +  -1,			/* l1_cache_size  */
> +  -1,			/* l1_cache_line_size  */
> +  -1,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params generic_tunings =
> +{
> +  &cortexa57_extra_costs,
> +  &generic_addrcost_table,
> +  &generic_regmove_cost,
> +  &generic_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  2, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "16:12",	/* function_align.  */
> +  "4",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  /* Enabling AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS significantly benefits
> +     Neoverse V1.  It does not have a noticeable effect on A64FX and should
> +     have at most a very minor effect on SVE2 cores.  */
> +  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_GENERIC.  */
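
[Aside, not part of the patch: every new header in this series has the same shape
as generic.h above — an include guard, an include of "generic.h" so the shared
tables are visible, any CPU-specific cost tables, and the tune_params struct that
aarch64.cc refers to.  A stripped-down sketch for a hypothetical CPU "examplecpu",
illustrative only; a minimal model that reuses the generic tables could look like
this (cortexa72.h in this patch shows the real-world variant of the pattern):]

    #ifndef GCC_AARCH64_H_EXAMPLECPU
    #define GCC_AARCH64_H_EXAMPLECPU

    #include "generic.h"

    /* CPU-specific cost tables would be defined here; a trivial model can
       simply reuse the generic tuning wholesale (valid in C++, which is
       what aarch64.cc is compiled as).  */
    static const struct tune_params examplecpu_tunings = generic_tunings;

    #endif /* GCC_AARCH64_H_EXAMPLECPU.  */
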
> diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..50d7b23712cc6a8be8f35246657ec5d86d6d4191
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> @@ -0,0 +1,164 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_NEOVERSE512TVB
> +#define GCC_AARCH64_H_NEOVERSE512TVB
> +
> +#include "generic.h"
> +
> +static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    2, /* fp_stmt_cost  */
> +    4, /* ld2_st2_permute_cost  */
> +    5, /* ld3_st3_permute_cost  */
> +    5, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    /* Theoretically, a reduction involving 15 scalar ADDs could
> +       complete in ~5 cycles and would have a cost of 15.  Assume that
> +       [SU]ADDV completes in 11 cycles and so give it a cost of 15 + 6.  */
> +    21, /* reduc_i8_cost  */
> +    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> +    13, /* reduc_i16_cost  */
> +    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> +    9, /* reduc_i32_cost  */
> +    /* Likewise for 1 scalar ADD (1 cycle) vs. 8: 1 + 7.  */
> +    8, /* reduc_i64_cost  */
> +    /* Theoretically, a reduction involving 7 scalar FADDs could
> +       complete in ~6 cycles and would have a cost of 14.  Assume that
> +       FADDV completes in 8 cycles and so give it a cost of 14 + 2.  */
> +    16, /* reduc_f16_cost  */
> +    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
> +    8, /* reduc_f32_cost  */
> +    /* Likewise for 1 scalar FADD (2 cycles) vs. 4: 2 + 2.  */
> +    4, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    /* This value is just inherited from the Cortex-A57 table.  */
> +    8, /* vec_to_scalar_cost  */
> +    /* This depends very much on what the scalar value is and
> +       where it comes from.  E.g. some constants take two dependent
> +       instructions or a load, while others might be moved from a GPR.
> +       4 seems to be a reasonable compromise in practice.  */
> +    4, /* scalar_to_vec_cost  */
> +    4, /* align_load_cost  */
> +    4, /* unalign_load_cost  */
> +    /* Although stores generally have a latency of 2 and compete for the
> +       vector pipes, in practice it's better not to model that.  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  3, /* clast_cost  */
> +  10, /* fadda_f16_cost  */
> +  6, /* fadda_f32_cost  */
> +  4, /* fadda_f64_cost  */
> +  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> +     (6 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> +     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> +     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> +     (cost 2) to that, to avoid the difference being lost in rounding.
> +
> +     There is no easy comparison between a strided Advanced SIMD x32 load
> +     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> +     operation more than a 64-bit gather.  */
> +  14, /* gather_load_x32_cost  */
> +  12, /* gather_load_x64_cost  */
> +  3 /* scatter_store_elt_cost  */
> +};
> +
> +static const aarch64_sve_vec_issue_info neoverse512tvb_sve_issue_info =
> +{
> +  {
> +    {
> +      3, /* loads_per_cycle  */
> +      2, /* stores_per_cycle  */
> +      4, /* general_ops_per_cycle  */
> +      0, /* fp_simd_load_general_ops  */
> +      1 /* fp_simd_store_general_ops  */
> +    },
> +    2, /* ld2_st2_general_ops  */
> +    2, /* ld3_st3_general_ops  */
> +    3 /* ld4_st4_general_ops  */
> +  },
> +  2, /* pred_ops_per_cycle  */
> +  2, /* while_pred_ops  */
> +  2, /* int_cmp_pred_ops  */
> +  1, /* fp_cmp_pred_ops  */
> +  1, /* gather_scatter_pair_general_ops  */
> +  1 /* gather_scatter_pair_pred_ops  */
> +};
> +
> +static const aarch64_vec_issue_info neoverse512tvb_vec_issue_info =
> +{
> +  &neoversev1_scalar_issue_info,
> +  &neoversev1_advsimd_issue_info,
> +  &neoverse512tvb_sve_issue_info
> +};
> +
> +static const struct cpu_vector_cost neoverse512tvb_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  2, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &neoversev1_advsimd_vector_cost, /* advsimd  */
> +  &neoverse512tvb_sve_vector_cost, /* sve  */
> +  &neoverse512tvb_vec_issue_info /* issue_info  */
> +};
> +
> +static const struct tune_params neoverse512tvb_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &neoversev1_addrcost_table,
> +  &neoversev1_regmove_cost,
> +  &neoverse512tvb_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_128 | SVE_256, /* sve_width  */
> +  { 4, /* load_int.  */
> +    2, /* store_int.  */
> +    6, /* load_fp.  */
> +    2, /* store_fp.  */
> +    6, /* load_pred.  */
> +    1 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  4,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_NEOVERSE512TVB.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..132166d3d06430b725e4448937332cc159c11cda
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/neoversen1.h
> @@ -0,0 +1,60 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_NEOVERSEN1
> +#define GCC_AARCH64_H_NEOVERSEN1
> +
> +#include "generic.h"
> +
> +static const struct tune_params neoversen1_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &generic_addrcost_table,
> +  &generic_regmove_cost,
> +  &cortexa57_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    2, /* store_int.  */
> +    5, /* load_fp.  */
> +    2, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_NEOVERSEN1.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..395a6d82b8403e586bf179cade055543cf9b9eb0
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/neoversen2.h
> @@ -0,0 +1,245 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_NEOVERSEN2
> +#define GCC_AARCH64_H_NEOVERSEN2
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table neoversen2_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  2, /* post_modify_ld3_st3  */
> +  2, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost neoversen2_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended, so set
> +     realistic costs compared to memmov_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  2, /* ld2_st2_permute_cost */
> +  2, /* ld3_st3_permute_cost  */
> +  3, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  4, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  /* This value is just inherited from the Cortex-A57 table.  */
> +  8, /* vec_to_scalar_cost  */
> +  /* This depends very much on what the scalar value is and
> +     where it comes from.  E.g. some constants take two dependent
> +     instructions or a load, while others might be moved from a GPR.
> +     4 seems to be a reasonable compromise in practice.  */
> +  4, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  /* Although stores have a latency of 2 and compete for the
> +     vector pipes, in practice it's better not to model that.  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost neoversen2_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    2, /* fp_stmt_cost  */
> +    3, /* ld2_st2_permute_cost  */
> +    4, /* ld3_st3_permute_cost  */
> +    4, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    /* Theoretically, a reduction involving 15 scalar ADDs could
> +       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
> +       completes in 11 cycles, so give it a cost of 15 + 6.  */
> +    21, /* reduc_i8_cost  */
> +    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> +    13, /* reduc_i16_cost  */
> +    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> +    9, /* reduc_i32_cost  */
> +    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
> +    2, /* reduc_i64_cost  */
> +    /* Theoretically, a reduction involving 7 scalar FADDs could
> +       complete in ~8 cycles and would have a cost of 14.  FADDV
> +       completes in 6 cycles, so give it a cost of 14 - 2.  */
> +    12, /* reduc_f16_cost  */
> +    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
> +    6, /* reduc_f32_cost  */
> +    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
> +    2, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    /* This value is just inherited from the Cortex-A57 table.  */
> +    8, /* vec_to_scalar_cost  */
> +    /* See the comment above the Advanced SIMD versions.  */
> +    4, /* scalar_to_vec_cost  */
> +    4, /* align_load_cost  */
> +    4, /* unalign_load_cost  */
> +    /* Although stores have a latency of 2 and compete for the
> +       vector pipes, in practice it's better not to model that.  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  3, /* clast_cost  */
> +  10, /* fadda_f16_cost  */
> +  6, /* fadda_f32_cost  */
> +  4, /* fadda_f64_cost  */
> +  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> +     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> +     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> +     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> +     (cost 2) to that, to avoid the difference being lost in rounding.
> +
> +     There is no easy comparison between a strided Advanced SIMD x32 load
> +     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> +     operation more than a 64-bit gather.  */
> +  14, /* gather_load_x32_cost  */
> +  12, /* gather_load_x64_cost  */
> +  3 /* scatter_store_elt_cost  */
> +};
> +
> +static const aarch64_scalar_vec_issue_info neoversen2_scalar_issue_info =
> +{
> +  3, /* loads_stores_per_cycle  */
> +  2, /* stores_per_cycle  */
> +  4, /* general_ops_per_cycle  */
> +  0, /* fp_simd_load_general_ops  */
> +  1 /* fp_simd_store_general_ops  */
> +};
> +
> +static const aarch64_advsimd_vec_issue_info neoversen2_advsimd_issue_info =
> +{
> +  {
> +    3, /* loads_stores_per_cycle  */
> +    2, /* stores_per_cycle  */
> +    2, /* general_ops_per_cycle  */
> +    0, /* fp_simd_load_general_ops  */
> +    1 /* fp_simd_store_general_ops  */
> +  },
> +  2, /* ld2_st2_general_ops  */
> +  2, /* ld3_st3_general_ops  */
> +  3 /* ld4_st4_general_ops  */
> +};
> +
> +static const aarch64_sve_vec_issue_info neoversen2_sve_issue_info =
> +{
> +  {
> +    {
> +      3, /* loads_per_cycle  */
> +      2, /* stores_per_cycle  */
> +      2, /* general_ops_per_cycle  */
> +      0, /* fp_simd_load_general_ops  */
> +      1 /* fp_simd_store_general_ops  */
> +    },
> +    2, /* ld2_st2_general_ops  */
> +    3, /* ld3_st3_general_ops  */
> +    3 /* ld4_st4_general_ops  */
> +  },
> +  2, /* pred_ops_per_cycle  */
> +  2, /* while_pred_ops  */
> +  2, /* int_cmp_pred_ops  */
> +  1, /* fp_cmp_pred_ops  */
> +  1, /* gather_scatter_pair_general_ops  */
> +  1 /* gather_scatter_pair_pred_ops  */
> +};
> +
> +static const aarch64_vec_issue_info neoversen2_vec_issue_info =
> +{
> +  &neoversen2_scalar_issue_info,
> +  &neoversen2_advsimd_issue_info,
> +  &neoversen2_sve_issue_info
> +};
> +
> +/* Neoverse N2 costs for vector insn classes.  */
> +static const struct cpu_vector_cost neoversen2_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  2, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &neoversen2_advsimd_vector_cost, /* advsimd  */
> +  &neoversen2_sve_vector_cost, /* sve  */
> +  &neoversen2_vec_issue_info /* issue_info  */
> +};
> +
> +static const struct tune_params neoversen2_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &neoversen2_addrcost_table,
> +  &neoversen2_regmove_cost,
> +  &neoversen2_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_128, /* sve_width  */
> +  { 4, /* load_int.  */
> +    1, /* store_int.  */
> +    6, /* load_fp.  */
> +    2, /* store_fp.  */
> +    6, /* load_pred.  */
> +    1 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> +   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_NEOVERSEN2.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h b/gcc/config/aarch64/tuning_models/neoversev1.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..584a5000e06f598dcdd3bcc533dc6dbc642223ca
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/neoversev1.h
> @@ -0,0 +1,237 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_NEOVERSEV1
> +#define GCC_AARCH64_H_NEOVERSEV1
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table neoversev1_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  3, /* post_modify_ld3_st3  */
> +  3, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost neoversev1_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended, so set
> +     realistic costs compared to memmov_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  4, /* ld2_st2_permute_cost */
> +  4, /* ld3_st3_permute_cost  */
> +  5, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  3, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  /* This value is just inherited from the Cortex-A57 table.  */
> +  8, /* vec_to_scalar_cost  */
> +  /* This depends very much on what the scalar value is and
> +     where it comes from.  E.g. some constants take two dependent
> +     instructions or a load, while others might be moved from a GPR.
> +     4 seems to be a reasonable compromise in practice.  */
> +  4, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  /* Although stores have a latency of 2 and compete for the
> +     vector pipes, in practice it's better not to model that.  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost neoversev1_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    2, /* fp_stmt_cost  */
> +    4, /* ld2_st2_permute_cost  */
> +    7, /* ld3_st3_permute_cost  */
> +    8, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    /* Theoretically, a reduction involving 31 scalar ADDs could
> +       complete in ~9 cycles and would have a cost of 31.  [SU]ADDV
> +       completes in 14 cycles, so give it a cost of 31 + 5.  */
> +    36, /* reduc_i8_cost  */
> +    /* Likewise for 15 scalar ADDs (~5 cycles) vs. 12: 15 + 7.  */
> +    22, /* reduc_i16_cost  */
> +    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 10: 7 + 7.  */
> +    14, /* reduc_i32_cost  */
> +    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 10: 3 + 8.  */
> +    11, /* reduc_i64_cost  */
> +    /* Theoretically, a reduction involving 15 scalar FADDs could
> +       complete in ~9 cycles and would have a cost of 30.  FADDV
> +       completes in 13 cycles, so give it a cost of 30 + 4.  */
> +    34, /* reduc_f16_cost  */
> +    /* Likewise for 7 scalar FADDs (~6 cycles) vs. 11: 14 + 5.  */
> +    19, /* reduc_f32_cost  */
> +    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 9: 6 + 5.  */
> +    11, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    /* This value is just inherited from the Cortex-A57 table.  */
> +    8, /* vec_to_scalar_cost  */
> +    /* See the comment above the Advanced SIMD versions.  */
> +    4, /* scalar_to_vec_cost  */
> +    4, /* align_load_cost  */
> +    4, /* unalign_load_cost  */
> +    /* Although stores have a latency of 2 and compete for the
> +       vector pipes, in practice it's better not to model that.  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  3, /* clast_cost  */
> +  19, /* fadda_f16_cost  */
> +  11, /* fadda_f32_cost  */
> +  8, /* fadda_f64_cost  */
> +  32, /* gather_load_x32_cost  */
> +  16, /* gather_load_x64_cost  */
> +  3 /* scatter_store_elt_cost  */
> +};
> +
> +static const aarch64_scalar_vec_issue_info neoversev1_scalar_issue_info =
> +{
> +  3, /* loads_stores_per_cycle  */
> +  2, /* stores_per_cycle  */
> +  4, /* general_ops_per_cycle  */
> +  0, /* fp_simd_load_general_ops  */
> +  1 /* fp_simd_store_general_ops  */
> +};
> +
> +static const aarch64_advsimd_vec_issue_info neoversev1_advsimd_issue_info =
> +{
> +  {
> +    3, /* loads_stores_per_cycle  */
> +    2, /* stores_per_cycle  */
> +    4, /* general_ops_per_cycle  */
> +    0, /* fp_simd_load_general_ops  */
> +    1 /* fp_simd_store_general_ops  */
> +  },
> +  2, /* ld2_st2_general_ops  */
> +  2, /* ld3_st3_general_ops  */
> +  3 /* ld4_st4_general_ops  */
> +};
> +
> +static const aarch64_sve_vec_issue_info neoversev1_sve_issue_info =
> +{
> +  {
> +    {
> +      2, /* loads_per_cycle  */
> +      2, /* stores_per_cycle  */
> +      2, /* general_ops_per_cycle  */
> +      0, /* fp_simd_load_general_ops  */
> +      1 /* fp_simd_store_general_ops  */
> +    },
> +    2, /* ld2_st2_general_ops  */
> +    2, /* ld3_st3_general_ops  */
> +    3 /* ld4_st4_general_ops  */
> +  },
> +  1, /* pred_ops_per_cycle  */
> +  2, /* while_pred_ops  */
> +  2, /* int_cmp_pred_ops  */
> +  1, /* fp_cmp_pred_ops  */
> +  1, /* gather_scatter_pair_general_ops  */
> +  1 /* gather_scatter_pair_pred_ops  */
> +};
> +
> +static const aarch64_vec_issue_info neoversev1_vec_issue_info =
> +{
> +  &neoversev1_scalar_issue_info,
> +  &neoversev1_advsimd_issue_info,
> +  &neoversev1_sve_issue_info
> +};
> +
> +/* Neoverse V1 costs for vector insn classes.  */
> +static const struct cpu_vector_cost neoversev1_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  2, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &neoversev1_advsimd_vector_cost, /* advsimd  */
> +  &neoversev1_sve_vector_cost, /* sve  */
> +  &neoversev1_vec_issue_info /* issue_info  */
> +};
> +
> +static const struct tune_params neoversev1_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &neoversev1_addrcost_table,
> +  &neoversev1_regmove_cost,
> +  &neoversev1_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_256, /* sve_width  */
> +  { 4, /* load_int.  */
> +    2, /* store_int.  */
> +    6, /* load_fp.  */
> +    2, /* store_fp.  */
> +    6, /* load_pred.  */
> +    1 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  4,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_NEOVERSEV1.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..28d4244ef4c99ecdffb7408e39dc21bc191223de
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/neoversev2.h
> @@ -0,0 +1,245 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_NEOVERSEV2
> +#define GCC_AARCH64_H_NEOVERSEV2
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table neoversev2_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  2, /* post_modify_ld3_st3  */
> +  2, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost neoversev2_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended, so set
> +     realistic costs compared to memmov_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  2, /* ld2_st2_permute_cost */
> +  2, /* ld3_st3_permute_cost  */
> +  3, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  3, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  /* This value is just inherited from the Cortex-A57 table.  */
> +  8, /* vec_to_scalar_cost  */
> +  /* This depends very much on what the scalar value is and
> +     where it comes from.  E.g. some constants take two dependent
> +     instructions or a load, while others might be moved from a GPR.
> +     4 seems to be a reasonable compromise in practice.  */
> +  4, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  /* Although stores have a latency of 2 and compete for the
> +     vector pipes, in practice it's better not to model that.  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost neoversev2_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    2, /* fp_stmt_cost  */
> +    3, /* ld2_st2_permute_cost  */
> +    3, /* ld3_st3_permute_cost  */
> +    4, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    /* Theoretically, a reduction involving 15 scalar ADDs could
> +       complete in ~3 cycles and would have a cost of 15.  [SU]ADDV
> +       completes in 11 cycles, so give it a cost of 15 + 8.  */
> +    21, /* reduc_i8_cost  */
> +    /* Likewise for 7 scalar ADDs (~2 cycles) vs. 9: 7 + 7.  */
> +    14, /* reduc_i16_cost  */
> +    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 4.  */
> +    7, /* reduc_i32_cost  */
> +    /* Likewise for 1 scalar ADD (~1 cycle) vs. 2: 1 + 1.  */
> +    2, /* reduc_i64_cost  */
> +    /* Theoretically, a reduction involving 7 scalar FADDs could
> +       complete in ~6 cycles and would have a cost of 14.  FADDV
> +       completes in 8 cycles, so give it a cost of 14 + 2.  */
> +    16, /* reduc_f16_cost  */
> +    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 6: 6 + 2.  */
> +    8, /* reduc_f32_cost  */
> +    /* Likewise for 1 scalar FADD (~2 cycles) vs. 4: 2 + 2.  */
> +    4, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    /* This value is just inherited from the Cortex-A57 table.  */
> +    8, /* vec_to_scalar_cost  */
> +    /* See the comment above the Advanced SIMD versions.  */
> +    4, /* scalar_to_vec_cost  */
> +    4, /* align_load_cost  */
> +    4, /* unalign_load_cost  */
> +    /* Although stores have a latency of 2 and compete for the
> +       vector pipes, in practice it's better not to model that.  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  3, /* clast_cost  */
> +  10, /* fadda_f16_cost  */
> +  6, /* fadda_f32_cost  */
> +  4, /* fadda_f64_cost  */
> +  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> +     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> +     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> +     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> +     (cost 2) to that, to avoid the difference being lost in rounding.
> +
> +     There is no easy comparison between a strided Advanced SIMD x32 load
> +     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> +     operation more than a 64-bit gather.  */
> +  14, /* gather_load_x32_cost  */
> +  12, /* gather_load_x64_cost  */
> +  3 /* scatter_store_elt_cost  */
> +};
> +
> +static const aarch64_scalar_vec_issue_info neoversev2_scalar_issue_info =
> +{
> +  3, /* loads_stores_per_cycle  */
> +  2, /* stores_per_cycle  */
> +  6, /* general_ops_per_cycle  */
> +  0, /* fp_simd_load_general_ops  */
> +  1 /* fp_simd_store_general_ops  */
> +};
> +
> +static const aarch64_advsimd_vec_issue_info neoversev2_advsimd_issue_info =
> +{
> +  {
> +    3, /* loads_stores_per_cycle  */
> +    2, /* stores_per_cycle  */
> +    4, /* general_ops_per_cycle  */
> +    0, /* fp_simd_load_general_ops  */
> +    1 /* fp_simd_store_general_ops  */
> +  },
> +  2, /* ld2_st2_general_ops  */
> +  2, /* ld3_st3_general_ops  */
> +  3 /* ld4_st4_general_ops  */
> +};
> +
> +static const aarch64_sve_vec_issue_info neoversev2_sve_issue_info =
> +{
> +  {
> +    {
> +      3, /* loads_per_cycle  */
> +      2, /* stores_per_cycle  */
> +      4, /* general_ops_per_cycle  */
> +      0, /* fp_simd_load_general_ops  */
> +      1 /* fp_simd_store_general_ops  */
> +    },
> +    2, /* ld2_st2_general_ops  */
> +    3, /* ld3_st3_general_ops  */
> +    3 /* ld4_st4_general_ops  */
> +  },
> +  2, /* pred_ops_per_cycle  */
> +  2, /* while_pred_ops  */
> +  2, /* int_cmp_pred_ops  */
> +  1, /* fp_cmp_pred_ops  */
> +  1, /* gather_scatter_pair_general_ops  */
> +  1 /* gather_scatter_pair_pred_ops  */
> +};
> +
> +static const aarch64_vec_issue_info neoversev2_vec_issue_info =
> +{
> +  &neoversev2_scalar_issue_info,
> +  &neoversev2_advsimd_issue_info,
> +  &neoversev2_sve_issue_info
> +};
> +
> +/* Neoverse V2 (Demeter) costs for vector insn classes.  */
> +static const struct cpu_vector_cost neoversev2_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  2, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &neoversev2_advsimd_vector_cost, /* advsimd  */
> +  &neoversev2_sve_vector_cost, /* sve  */
> +  &neoversev2_vec_issue_info /* issue_info  */
> +};
> +
> +static const struct tune_params neoversev2_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &neoversev2_addrcost_table,
> +  &neoversev2_regmove_cost,
> +  &neoversev2_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_128, /* sve_width  */
> +  { 4, /* load_int.  */
> +    2, /* store_int.  */
> +    6, /* load_fp.  */
> +    1, /* store_fp.  */
> +    6, /* load_pred.  */
> +    2 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  5, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  3,	/* int_reassoc_width.  */
> +  6,	/* fp_reassoc_width.  */
> +  4,	/* fma_reassoc_width.  */
> +  3,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> +   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_NEOVERSEV2.  */
> diff --git a/gcc/config/aarch64/tuning_models/qdf24xx.h b/gcc/config/aarch64/tuning_models/qdf24xx.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..29c9b9f5843acc15450a2492b141c02ee48a3f13
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/qdf24xx.h
> @@ -0,0 +1,137 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_QDF24XX
> +#define GCC_AARCH64_H_QDF24XX
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table qdf24xx_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      1, /* si  */
> +      1, /* di  */
> +      2, /* ti  */
> +    },
> +  1, /* pre_modify  */
> +  1, /* post_modify  */
> +  1, /* post_modify_ld3_st3  */
> +  1, /* post_modify_ld4_st4  */
> +  3, /* register_offset  */
> +  3, /* register_sextend  */
> +  3, /* register_zextend  */
> +  2, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost qdf24xx_regmove_cost =
> +{
> +  2, /* GP2GP  */
> +  /* Avoid the use of int<->fp moves for spilling.  */
> +  6, /* GP2FP  */
> +  6, /* FP2GP  */
> +  4 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
> +{
> +  1, /* int_stmt_cost  */
> +  3, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  1, /* reduc_i8_cost  */
> +  1, /* reduc_i16_cost  */
> +  1, /* reduc_i32_cost  */
> +  1, /* reduc_i64_cost  */
> +  1, /* reduc_f16_cost  */
> +  1, /* reduc_f32_cost  */
> +  1, /* reduc_f64_cost  */
> +  1, /* store_elt_extra_cost  */
> +  1, /* vec_to_scalar_cost  */
> +  1, /* scalar_to_vec_cost  */
> +  1, /* align_load_cost  */
> +  1, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* QDF24XX costs for vector insn classes.  */
> +static const struct cpu_vector_cost qdf24xx_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  1, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &qdf24xx_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune qdf24xx_prefetch_tune =
> +{
> +  4,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  512,			/* l2_cache_size  */
> +  false,		/* prefetch_dynamic_strides */
> +  2048,			/* minimum_stride */
> +  3			/* default_opt_level  */
> +};
> +
> +static const struct tune_params qdf24xx_tunings =
> +{
> +  &qdf24xx_extra_costs,
> +  &qdf24xx_addrcost_table,
> +  &qdf24xx_regmove_cost,
> +  &qdf24xx_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
> +  &qdf24xx_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_QDF24XX.  */
> diff --git a/gcc/config/aarch64/tuning_models/saphira.h b/gcc/config/aarch64/tuning_models/saphira.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..e584d316bb7c3c2d232cf7623a92100ad261f07d
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/saphira.h
> @@ -0,0 +1,63 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_SAPHIRA
> +#define GCC_AARCH64_H_SAPHIRA
> +
> +#include "generic.h"
> +
> +/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
> +   for now.  */
> +static const struct tune_params saphira_tunings =
> +{
> +  &generic_extra_costs,
> +  &generic_addrcost_table,
> +  &generic_regmove_cost,
> +  &generic_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> +   | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_SAPHIRA.  */
> diff --git a/gcc/config/aarch64/tuning_models/thunderx.h b/gcc/config/aarch64/tuning_models/thunderx.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..dd4b9d539fc5cf2bd20d84e91d6b72fa7237f99f
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/thunderx.h
> @@ -0,0 +1,117 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_THUNDERX
> +#define GCC_AARCH64_H_THUNDERX
> +
> +#include "generic.h"
> +
> +static const struct cpu_regmove_cost thunderx_regmove_cost =
> +{
> +  2, /* GP2GP  */
> +  2, /* GP2FP  */
> +  6, /* FP2GP  */
> +  4 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost thunderx_advsimd_vector_cost =
> +{
> +  4, /* int_stmt_cost  */
> +  1, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  4, /* permute_cost  */
> +  2, /* reduc_i8_cost  */
> +  2, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  2, /* reduc_f16_cost  */
> +  2, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  2, /* vec_to_scalar_cost  */
> +  2, /* scalar_to_vec_cost  */
> +  3, /* align_load_cost  */
> +  5, /* unalign_load_cost  */
> +  5, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* ThunderX costs for vector insn classes.  */
> +static const struct cpu_vector_cost thunderx_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  3, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  3, /* cond_not_taken_branch_cost  */
> +  &thunderx_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune thunderx_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  128,			/* l1_cache_line_size  */
> +  -1,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params thunderx_tunings =
> +{
> +  &thunderx_extra_costs,
> +  &generic_addrcost_table,
> +  &thunderx_regmove_cost,
> +  &thunderx_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 6, /* load_int.  */
> +    6, /* store_int.  */
> +    6, /* load_fp.  */
> +    6, /* store_fp.  */
> +    6, /* load_pred.  */
> +    6 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  2, /* issue_rate  */
> +  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
> +  "8",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
> +  &thunderx_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_THUNDERX.  */
> diff --git a/gcc/config/aarch64/tuning_models/thunderx2t99.h b/gcc/config/aarch64/tuning_models/thunderx2t99.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..0a376e0bab37b0b5bc1ea23de0e96a9245846fd7
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/thunderx2t99.h
> @@ -0,0 +1,137 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_THUNDERX2T99
> +#define GCC_AARCH64_H_THUNDERX2T99
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      1, /* si  */
> +      1, /* di  */
> +      2, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  2, /* register_offset  */
> +  3, /* register_sextend  */
> +  3, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of int<->fp moves for spilling.  */
> +  5, /* GP2FP  */
> +  6, /* FP2GP  */
> +  3, /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
> +{
> +  4, /* int_stmt_cost  */
> +  5, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  10, /* permute_cost  */
> +  6, /* reduc_i8_cost  */
> +  6, /* reduc_i16_cost  */
> +  6, /* reduc_i32_cost  */
> +  6, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  6, /* reduc_f32_cost  */
> +  6, /* reduc_f64_cost  */
> +  6, /* store_elt_extra_cost  */
> +  6, /* vec_to_scalar_cost  */
> +  5, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* ThunderX2 T99 (Vulcan) costs for vector insn classes.  */
> +static const struct cpu_vector_cost thunderx2t99_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  6, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  2, /* cond_taken_branch_cost  */
> +  1,  /* cond_not_taken_branch_cost  */
> +  &thunderx2t99_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  256,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params thunderx2t99_tunings =
> +{
> +  &thunderx2t99_extra_costs,
> +  &thunderx2t99_addrcost_table,
> +  &thunderx2t99_regmove_cost,
> +  &thunderx2t99_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate.  */
> +  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
> +   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  3,	/* int_reassoc_width.  */
> +  2,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &thunderx2t99_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_THUNDERX2T99.  */
> diff --git a/gcc/config/aarch64/tuning_models/thunderx3t110.h b/gcc/config/aarch64/tuning_models/thunderx3t110.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..65203b4af132e12e4994013fbab228bd3873b756
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/thunderx3t110.h
> @@ -0,0 +1,136 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_THUNDERX3T110
> +#define GCC_AARCH64_H_THUNDERX3T110
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table thunderx3t110_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      1, /* si  */
> +      1, /* di  */
> +      2, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  2, /* register_offset  */
> +  3, /* register_sextend  */
> +  3, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of int<->fp moves for spilling.  */
> +  4, /* GP2FP  */
> +  5, /* FP2GP  */
> +  4  /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
> +{
> +  5, /* int_stmt_cost  */
> +  5, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  10, /* permute_cost  */
> +  5, /* reduc_i8_cost  */
> +  5, /* reduc_i16_cost  */
> +  5, /* reduc_i32_cost  */
> +  5, /* reduc_i64_cost  */
> +  5, /* reduc_f16_cost  */
> +  5, /* reduc_f32_cost  */
> +  5, /* reduc_f64_cost  */
> +  5, /* store_elt_extra_cost  */
> +  5, /* vec_to_scalar_cost  */
> +  5, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  4, /* unalign_store_cost  */
> +  4  /* store_cost  */
> +};
> +
> +static const struct cpu_vector_cost thunderx3t110_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  5, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  2, /* cond_taken_branch_cost  */
> +  1,  /* cond_not_taken_branch_cost  */
> +  &thunderx3t110_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune thunderx3t110_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  256,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params thunderx3t110_tunings =
> +{
> +  &thunderx3t110_extra_costs,
> +  &thunderx3t110_addrcost_table,
> +  &thunderx3t110_regmove_cost,
> +  &thunderx3t110_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  6, /* issue_rate.  */
> +  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
> +   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  3,	/* int_reassoc_width.  */
> +  2,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &thunderx3t110_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_THUNDERX3T110.  */
> diff --git a/gcc/config/aarch64/tuning_models/thunderxt88.h b/gcc/config/aarch64/tuning_models/thunderxt88.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..dcc74d31484ee6b99d37920dbfe7b1d59377d074
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/thunderxt88.h
> @@ -0,0 +1,72 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_THUNDERXT88
> +#define GCC_AARCH64_H_THUNDERXT88
> +
> +#include "generic.h"
> +#include "thunderx.h"
> +
> +static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  128,			/* l1_cache_line_size  */
> +  16*1024,		/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  3			/* default_opt_level  */
> +};
> +
> +static const struct tune_params thunderxt88_tunings =
> +{
> +  &thunderx_extra_costs,
> +  &generic_addrcost_table,
> +  &thunderx_regmove_cost,
> +  &thunderx_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 6, /* load_int.  */
> +    6, /* store_int.  */
> +    6, /* load_fp.  */
> +    6, /* store_fp.  */
> +    6, /* load_pred.  */
> +    6 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  2, /* issue_rate  */
> +  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
> +  "8",	/* function_align.  */
> +  "8",	/* jump_align.  */
> +  "8",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
> +  &thunderxt88_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_THUNDERXT88.  */
> diff --git a/gcc/config/aarch64/tuning_models/tsv110.h b/gcc/config/aarch64/tuning_models/tsv110.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..42aeafce652fff34e3277194993dd4aa1f0383a1
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/tsv110.h
> @@ -0,0 +1,137 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_TSV110
> +#define GCC_AARCH64_H_TSV110
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table tsv110_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  1, /* register_sextend  */
> +  1, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost tsv110_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  2, /* GP2FP  */
> +  3, /* FP2GP  */
> +  2  /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost tsv110_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  3, /* reduc_i8_cost  */
> +  3, /* reduc_i16_cost  */
> +  3, /* reduc_i32_cost  */
> +  3, /* reduc_i64_cost  */
> +  3, /* reduc_f16_cost  */
> +  3, /* reduc_f32_cost  */
> +  3, /* reduc_f64_cost  */
> +  3, /* store_elt_extra_cost  */
> +  3, /* vec_to_scalar_cost  */
> +  2, /* scalar_to_vec_cost  */
> +  5, /* align_load_cost  */
> +  5, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const struct cpu_vector_cost tsv110_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  5, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &tsv110_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +static const cpu_prefetch_tune tsv110_prefetch_tune =
> +{
> +  0,                    /* num_slots  */
> +  64,                   /* l1_cache_size  */
> +  64,                   /* l1_cache_line_size  */
> +  512,                  /* l2_cache_size  */
> +  true,                 /* prefetch_dynamic_strides */
> +  -1,                   /* minimum_stride */
> +  -1                    /* default_opt_level  */
> +};
> +
> +static const struct tune_params tsv110_tunings =
> +{
> +  &tsv110_extra_costs,
> +  &tsv110_addrcost_table,
> +  &tsv110_regmove_cost,
> +  &tsv110_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    4, /* store_int.  */
> +    4, /* load_fp.  */
> +    4, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4,    /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
> +   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
> +  "16", /* function_align.  */
> +  "4",  /* jump_align.  */
> +  "8",  /* loop_align.  */
> +  2,    /* int_reassoc_width.  */
> +  4,    /* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,    /* vec_reassoc_width.  */
> +  2,    /* min_div_recip_mul_sf.  */
> +  2,    /* min_div_recip_mul_df.  */
> +  0,    /* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
> +  &tsv110_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_TSV110.  */
> diff --git a/gcc/config/aarch64/tuning_models/xgene1.h b/gcc/config/aarch64/tuning_models/xgene1.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..53a3eb0ddeb80a9735cc988e242a70e87dc90655
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/xgene1.h
> @@ -0,0 +1,145 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_XGENE1
> +#define GCC_AARCH64_H_XGENE1
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table xgene1_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  1, /* pre_modify  */
> +  1, /* post_modify  */
> +  1, /* post_modify_ld3_st3  */
> +  1, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  1, /* register_sextend  */
> +  1, /* register_zextend  */
> +  0, /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost xgene1_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  8, /* GP2FP  */
> +  8, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost xgene1_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  4, /* reduc_i32_cost  */
> +  4, /* reduc_i64_cost  */
> +  4, /* reduc_f16_cost  */
> +  4, /* reduc_f32_cost  */
> +  4, /* reduc_f64_cost  */
> +  4, /* store_elt_extra_cost  */
> +  4, /* vec_to_scalar_cost  */
> +  4, /* scalar_to_vec_cost  */
> +  10, /* align_load_cost  */
> +  10, /* unalign_load_cost  */
> +  2, /* unalign_store_cost  */
> +  2  /* store_cost  */
> +};
> +
> +/* Generic costs for vector insn classes.  */
> +static const struct cpu_vector_cost xgene1_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  5, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  2, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &xgene1_advsimd_vector_cost, /* advsimd  */
> +  nullptr, /* sve  */
> +  nullptr /* issue_info  */
> +};
> +
> +/* Approximation modes for X-Gene 1.  */
> +static const cpu_approx_modes xgene1_approx_modes =
> +{
> +  AARCH64_APPROX_NONE,	/* division  */
> +  AARCH64_APPROX_NONE,	/* sqrt  */
> +  AARCH64_APPROX_ALL	/* recip_sqrt  */
> +};
> +
> +static const cpu_prefetch_tune xgene1_prefetch_tune =
> +{
> +  8,			/* num_slots  */
> +  32,			/* l1_cache_size  */
> +  64,			/* l1_cache_line_size  */
> +  256,			/* l2_cache_size  */
> +  true,                 /* prefetch_dynamic_strides */
> +  -1,                   /* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params xgene1_tunings =
> +{
> +  &xgene1_extra_costs,
> +  &xgene1_addrcost_table,
> +  &xgene1_regmove_cost,
> +  &xgene1_vector_cost,
> +  &generic_branch_cost,
> +  &xgene1_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 6, /* load_int.  */
> +    6, /* store_int.  */
> +    6, /* load_fp.  */
> +    6, /* store_fp.  */
> +    6, /* load_pred.  */
> +    6 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  4, /* issue_rate  */
> +  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> +  "16",	/* function_align.  */
> +  "16",	/* jump_align.  */
> +  "16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  1,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  17,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
> +  &xgene1_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_XGENE1.  */
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6]AArch64: Remove special handling of generic cpu.
  2023-11-15 17:07 ` [PATCH 2/6]AArch64: Remove special handling of generic cpu Tamar Christina
@ 2023-11-16  9:14   ` Richard Earnshaw
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:14 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:07, Tamar Christina wrote:
> Hi All,
> 
> In anticipation of adding new generic tuning values this removes the hardcoding
> of the "generic" CPU and instead just specifies it as a normal CPU.
> 
> No change in behavior is expected.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR target/111370
> 	* config/aarch64/aarch64-cores.def: Add generic.
> 	* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
> 	* config/aarch64/aarch64-tune.md: Regenerate
> 	* config/aarch64/aarch64.cc (all_cores): Remove generic
> 	* config/aarch64/aarch64.h (enum target_cpus): Remove
> 	TARGET_CPU_generic.
> 

OK.

R.
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPER
>   AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
>   AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
>   
> +/* Generic Architecture Processors.  */
> +AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
> +
>   #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
> index 831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70 100644
> --- a/gcc/config/aarch64/aarch64-opts.h
> +++ b/gcc/config/aarch64/aarch64-opts.h
> @@ -32,8 +32,6 @@ enum aarch64_processor
>   #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
>     INTERNAL_IDENT,
>   #include "aarch64-cores.def"
> -  /* Used to indicate that no processor has been specified.  */
> -  generic,
>     /* Used to mark the end of the processor table.  */
>     aarch64_none
>   };
> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
> index c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>   ;; -*- buffer-read-only: t -*-
>   ;; Generated automatically by gentune.sh from aarch64-cores.def
>   (define_attr "tune"
> -	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
> +	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
>   	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -720,7 +720,6 @@ enum target_cpus
>   #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
>     TARGET_CPU_##INTERNAL_IDENT,
>   #include "aarch64-cores.def"
> -  TARGET_CPU_generic
>   };
>   
>   /* If there is no CPU defined at configure, use generic as default.  */
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -427,8 +427,6 @@ static const struct processor all_cores[] =
>     {NAME, IDENT, SCHED, AARCH64_ARCH_##ARCH, \
>      feature_deps::cpu_##IDENT, &COSTS##_tunings},
>   #include "aarch64-cores.def"
> -  {"generic", generic, cortexa53, AARCH64_ARCH_V8A,
> -   feature_deps::V8A ().enable, &generic_tunings},
>     {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
>   };
>   
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default.
  2023-11-15 17:07 ` [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default Tamar Christina
@ 2023-11-16  9:23   ` Richard Earnshaw
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:23 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:07, Tamar Christina wrote:
> Hi All,
> 
> This patch adds a new generic scheduling model "generic-armv8-a" and makes it
> the default for all Armv8 architectures.
> 
> -mcpu=generic and -mtune=generic are kept around for those that really want the
> deprecated cost model.

Rather than referring to generic as deprecated, I think we should update 
the documentation to make it clear that generic may change from release 
to release based on typical hardware availability at the time each 
version of GCC is released.  It should, however, mean the same thing for 
all minor updates of a single major version of GCC.

> 
> This shows on SPECCPU 2017 the following:
> 
> generic:  SPECINT 1.0% imporvement in geomean, SPECFP -0.6%.  The SPECFP is due

if this text is going in the commit log, there's a typo: improvement.

>            to fotonik3d_r where we vectorize an FP calculation that only ever
> 	  needs one lane of the result.  This I believe is a generic costing bug
> 	  but at the moment we can't change costs of FP and INT independently.
> 	  So will defer updating that cost to stage3 after Richard's other
> 	  costing updates land.
> 
> generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR target/111370
> 	* config/aarch64/aarch64-arches.def (armv8-r, armv8-a, armv8.1-a,
> 	armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
> 	armv8.8-a): Update to generic_armv8_a.
> 	* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
> 	* config/aarch64/aarch64-tune.md: Regenerate.
> 	* config/aarch64/aarch64.cc: Include generic_armv8_a.h
> 	* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
> 	TARGET_CPU_generic_armv8_a.
> 	* config/aarch64/tuning_models/generic_armv8_a.h: New file.

OK.

R.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
> index 7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc 100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -30,19 +30,19 @@
>      Due to the assumptions about the positions of these fields in config.gcc,
>      NAME should be kept as the first argument.  */
>   
> -AARCH64_ARCH("armv8-a",       generic,       V8A,       8,  (SIMD))
> -AARCH64_ARCH("armv8.1-a",     generic,       V8_1A,     8,  (V8A, LSE, CRC, RDMA))
> -AARCH64_ARCH("armv8.2-a",     generic,       V8_2A,     8,  (V8_1A))
> -AARCH64_ARCH("armv8.3-a",     generic,       V8_3A,     8,  (V8_2A, PAUTH, RCPC))
> -AARCH64_ARCH("armv8.4-a",     generic,       V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
> -AARCH64_ARCH("armv8.5-a",     generic,       V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
> -AARCH64_ARCH("armv8.6-a",     generic,       V8_6A,     8,  (V8_5A, I8MM, BF16))
> -AARCH64_ARCH("armv8.7-a",     generic,       V8_7A,     8,  (V8_6A, LS64))
> -AARCH64_ARCH("armv8.8-a",     generic,       V8_8A,     8,  (V8_7A, MOPS))
> -AARCH64_ARCH("armv8-r",       generic,       V8R  ,     8,  (V8_4A))
> -AARCH64_ARCH("armv9-a",       generic,       V9A  ,     9,  (V8_5A, SVE2))
> -AARCH64_ARCH("armv9.1-a",     generic,       V9_1A,     9,  (V8_6A, V9A))
> -AARCH64_ARCH("armv9.2-a",     generic,       V9_2A,     9,  (V8_7A, V9_1A))
> -AARCH64_ARCH("armv9.3-a",     generic,       V9_3A,     9,  (V8_8A, V9_2A))
> +AARCH64_ARCH("armv8-a",       generic_armv8_a,   V8A,       8,  (SIMD))
> +AARCH64_ARCH("armv8.1-a",     generic_armv8_a,   V8_1A,     8,  (V8A, LSE, CRC, RDMA))
> +AARCH64_ARCH("armv8.2-a",     generic_armv8_a,   V8_2A,     8,  (V8_1A))
> +AARCH64_ARCH("armv8.3-a",     generic_armv8_a,   V8_3A,     8,  (V8_2A, PAUTH, RCPC))
> +AARCH64_ARCH("armv8.4-a",     generic_armv8_a,   V8_4A,     8,  (V8_3A, F16FML, DOTPROD, FLAGM))
> +AARCH64_ARCH("armv8.5-a",     generic_armv8_a,   V8_5A,     8,  (V8_4A, SB, SSBS, PREDRES))
> +AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF16))
> +AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
> +AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
> +AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
> +AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
> +AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
> +AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
> +AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
>   
>   #undef AARCH64_ARCH
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index 3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -191,5 +191,6 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
>   
>   /* Generic Architecture Processors.  */
>   AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
> +AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
>   
>   #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
> index cd5d79ea9c221874578a4d5804e4f618e671ebcd..0a32056f255de455f47a0b7395dfef0af84c6b5e 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>   ;; -*- buffer-read-only: t -*-
>   ;; Generated automatically by gentune.sh from aarch64-cores.def
>   (define_attr "tune"
> -	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
> +	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
>   	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 145bf536c28fdef84246e16d8351f4b4e357d27c..1ac298926ce1606a87bcdcaf691f182ca416d600 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -724,7 +724,7 @@ enum target_cpus
>   
>   /* If there is no CPU defined at configure, use generic as default.  */
>   #ifndef TARGET_CPU_DEFAULT
> -# define TARGET_CPU_DEFAULT TARGET_CPU_generic
> +# define TARGET_CPU_DEFAULT TARGET_CPU_generic_armv8_a
>   #endif
>   
>   /* If inserting NOP before a mult-accumulate insn remember to adjust the
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9d59431d933021d71c5c202f0a61f807a2d2b0f1..1f5645e4886acd30ee5a437f60ffb53ee7b09436 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -355,6 +355,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
>   
>   /* Tuning parameters.  */
>   #include "tuning_models/generic.h"
> +#include "tuning_models/generic_armv8_a.h"
>   #include "tuning_models/cortexa35.h"
>   #include "tuning_models/cortexa53.h"
>   #include "tuning_models/cortexa57.h"
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..82abe172834756696a3905dbf92464f73a1ea3da
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> @@ -0,0 +1,191 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_GENERIC_ARMV8_A
> +#define GCC_AARCH64_H_GENERIC_ARMV8_A
> +
> +#include "generic.h"
> +
> +static const struct cpu_addrcost_table generic_armv8_a_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  0, /* post_modify_ld3_st3  */
> +  0, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost generic_armv8_a_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Avoid the use of slow int<->fp moves for spilling by setting
> +     their cost higher than memmov_cost.  */
> +  5, /* GP2FP  */
> +  5, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +/* Generic costs for Advanced SIMD vector operations.   */
> +static const advsimd_vec_cost generic_armv8_a_advsimd_vector_cost =
> +{
> +  1, /* int_stmt_cost  */
> +  1, /* fp_stmt_cost  */
> +  0, /* ld2_st2_permute_cost  */
> +  0, /* ld3_st3_permute_cost  */
> +  0, /* ld4_st4_permute_cost  */
> +  2, /* permute_cost  */
> +  2, /* reduc_i8_cost  */
> +  2, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  2, /* reduc_f16_cost  */
> +  2, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  2, /* vec_to_scalar_cost  */
> +  1, /* scalar_to_vec_cost  */
> +  1, /* align_load_cost  */
> +  1, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +/* Generic costs for SVE vector operations.  */
> +static const sve_vec_cost generic_armv8_a_sve_vector_cost =
> +{
> +  {
> +    1, /* int_stmt_cost  */
> +    1, /* fp_stmt_cost  */
> +    0, /* ld2_st2_permute_cost  */
> +    0, /* ld3_st3_permute_cost  */
> +    0, /* ld4_st4_permute_cost  */
> +    2, /* permute_cost  */
> +    2, /* reduc_i8_cost  */
> +    2, /* reduc_i16_cost  */
> +    2, /* reduc_i32_cost  */
> +    2, /* reduc_i64_cost  */
> +    2, /* reduc_f16_cost  */
> +    2, /* reduc_f32_cost  */
> +    2, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    2, /* vec_to_scalar_cost  */
> +    1, /* scalar_to_vec_cost  */
> +    1, /* align_load_cost  */
> +    1, /* unalign_load_cost  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  2, /* clast_cost  */
> +  2, /* fadda_f16_cost  */
> +  2, /* fadda_f32_cost  */
> +  2, /* fadda_f64_cost  */
> +  4, /* gather_load_x32_cost  */
> +  2, /* gather_load_x64_cost  */
> +  1 /* scatter_store_elt_cost  */
> +};
> +
> +/* Generic costs for vector insn classes.  */
> +static const struct cpu_vector_cost generic_armv8_a_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  1, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &generic_armv8_a_advsimd_vector_cost, /* advsimd  */
> +  &generic_armv8_a_sve_vector_cost, /* sve */
> +  nullptr /* issue_info  */
> +};
> +
> +/* Generic costs for branch instructions.  */
> +static const struct cpu_branch_cost generic_armv8_a_branch_cost =
> +{
> +  1,  /* Predictable.  */
> +  3   /* Unpredictable.  */
> +};
> +
> +/* Generic approximation modes.  */
> +static const cpu_approx_modes generic_armv8_a_approx_modes =
> +{
> +  AARCH64_APPROX_NONE,	/* division  */
> +  AARCH64_APPROX_NONE,	/* sqrt  */
> +  AARCH64_APPROX_NONE	/* recip_sqrt  */
> +};
> +
> +/* Generic prefetch settings (which disable prefetch).  */
> +static const cpu_prefetch_tune generic_armv8_a_prefetch_tune =
> +{
> +  0,			/* num_slots  */
> +  -1,			/* l1_cache_size  */
> +  -1,			/* l1_cache_line_size  */
> +  -1,			/* l2_cache_size  */
> +  true,			/* prefetch_dynamic_strides */
> +  -1,			/* minimum_stride */
> +  -1			/* default_opt_level  */
> +};
> +
> +static const struct tune_params generic_armv8_a_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &generic_armv8_a_addrcost_table,
> +  &generic_armv8_a_regmove_cost,
> +  &generic_armv8_a_vector_cost,
> +  &generic_armv8_a_branch_cost,
> +  &generic_armv8_a_approx_modes,
> +  SVE_NOT_IMPLEMENTED, /* sve_width  */
> +  { 4, /* load_int.  */
> +    2, /* store_int.  */
> +    5, /* load_fp.  */
> +    2, /* store_fp.  */
> +    4, /* load_pred.  */
> +    4 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> +   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_GENERIC_ARMV8_A.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> index aac06bd8093bed9e50928ee23f9a075888f14543..96e9935360100e25a4c01cceabc7aa840f520a3e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
>   
>   #include <stdint.h>
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> index f6278916e1afeb3f0cb8fdbff4e98782ad0a726e..6f969a829425960b414508a7e354a1f39426a0e4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
>   
>   #include <stdint.h>
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> index 03a6636f2d20b12f7e950a5bd6e43216139370fa..e6ec5157cd6dcc6b6dc24c5384432289b6dcdfba 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
>   
>   #include <stdint.h>
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> index 9a2bd8f152ff32e8da1c4e2a73a31a249e5991c7..7ed35921b6f914441dc463c4030fcc4663a6813c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 --param=aarch64-autovec-preference=2" } */
>   
>   #include <stdint.h>
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
> index d5bee3a7b900bf9348c9cbfd67f487c381b13bf6..4bdb167944cda1861dd0462d905149646be69693 100644
> --- a/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_13.c
> @@ -1,5 +1,5 @@
>   /* { dg-do assemble } */
> -/* { dg-options "-O2 -march=armv8-a+crc+crypto -mcpu=generic" } */
> +/* { dg-options "-O2 -mcpu=generic+crypto" } */
>   
>   #include "arm_acle.h"
>   
> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
> index 069a0010865334324a100bab358bb53369f122fb..e6f31ba72ee77d1129f3cfbe2d90216d6c355c57 100644
> --- a/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
> @@ -1,5 +1,5 @@
>   /* { dg-do assemble } */
> -/* { dg-options "-march=armv8-a+crypto -mcpu=generic -save-temps" } */
> +/* { dg-options "-mcpu=generic+crypto -save-temps" } */
>   
>   /* Check that "+nothing" clears the ISA flags.  */
>   
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9
  2023-11-15 17:08 ` [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9 Tamar Christina
@ 2023-11-16  9:23   ` Richard Earnshaw
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:23 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:08, Tamar Christina wrote:
> Hi All,
> 
> This patch adds a new generic scheduling model "generic-armv9-a" and makes it
> the default for all Armv9 architectures.
> 
> -mcpu=generic and -mtune=generic are kept around for those that really want the
> deprecated cost model.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR target/111370
> 	* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
> 	armv9.3-a): Update to generic-armv9-a.
> 	* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
> 	* config/aarch64/aarch64-tune.md: Regenerate.
> 	* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
> 	* config/aarch64/tuning_models/generic_armv9_a.h: New file.

OK, but see the comment on patch 3 about 'generic'.

R.

> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
> index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a",     generic_armv8_a,   V8_6A,     8,  (V8_5A, I8MM, BF
>   AARCH64_ARCH("armv8.7-a",     generic_armv8_a,   V8_7A,     8,  (V8_6A, LS64))
>   AARCH64_ARCH("armv8.8-a",     generic_armv8_a,   V8_8A,     8,  (V8_7A, MOPS))
>   AARCH64_ARCH("armv8-r",       generic_armv8_a,   V8R  ,     8,  (V8_4A))
> -AARCH64_ARCH("armv9-a",       generic,           V9A  ,     9,  (V8_5A, SVE2))
> -AARCH64_ARCH("armv9.1-a",     generic,           V9_1A,     9,  (V8_6A, V9A))
> -AARCH64_ARCH("armv9.2-a",     generic,           V9_2A,     9,  (V8_7A, V9_1A))
> -AARCH64_ARCH("armv9.3-a",     generic,           V9_3A,     9,  (V8_8A, V9_2A))
> +AARCH64_ARCH("armv9-a",       generic_armv9_a,   V9A  ,     9,  (V8_5A, SVE2))
> +AARCH64_ARCH("armv9.1-a",     generic_armv9_a,   V9_1A,     9,  (V8_6A, V9A))
> +AARCH64_ARCH("armv9.2-a",     generic_armv9_a,   V9_2A,     9,  (V8_7A, V9_1A))
> +AARCH64_ARCH("armv9.3-a",     generic_armv9_a,   V9_3A,     9,  (V8_8A, V9_2A))
>   
>   #undef AARCH64_ARCH
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index 30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
>   
>   /* Generic Architecture Processors.  */
>   AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
> -AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
> +AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1)
> +AARCH64_CORE("generic-armv9-a",  generic_armv9_a, cortexa53, V9A, (), generic_armv9_a, 0x0, 0x0, -1)
>   
>   #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
> index 0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>   ;; -*- buffer-read-only: t -*-
>   ;; Generated automatically by gentune.sh from aarch64-cores.def
>   (define_attr "tune"
> -	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
> +	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>   	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 08635e0df9cfa02286f3950383a32f6f93d1b4e0..5bed5f84cef242ec01f8510c76a450f81a985521 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -356,6 +356,7 @@ static const struct aarch64_flag_desc aarch64_tuning_flags[] =
>   /* Tuning parameters.  */
>   #include "tuning_models/generic.h"
>   #include "tuning_models/generic_armv8_a.h"
> +#include "tuning_models/generic_armv9_a.h"
>   #include "tuning_models/cortexa35.h"
>   #include "tuning_models/cortexa53.h"
>   #include "tuning_models/cortexa57.h"
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..c017468592a9dba74ddd432247aaf51a70bb34b5
> --- /dev/null
> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> @@ -0,0 +1,245 @@
> +/* Tuning model description for AArch64 architecture.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_AARCH64_H_GENERIC_ARMV9_A
> +#define GCC_AARCH64_H_GENERIC_ARMV9_A
> +
> +#include "generic.h"
> +#include "generic_armv8_a.h"
> +
> +static const struct cpu_addrcost_table generic_armv9_a_addrcost_table =
> +{
> +    {
> +      1, /* hi  */
> +      0, /* si  */
> +      0, /* di  */
> +      1, /* ti  */
> +    },
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  2, /* post_modify_ld3_st3  */
> +  2, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
> +static const struct cpu_regmove_cost generic_armv9_a_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended so set
> +     realistic costs compared to memmov_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
> +static const advsimd_vec_cost generic_armv9_a_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  2, /* ld2_st2_permute_cost */
> +  2, /* ld3_st3_permute_cost  */
> +  3, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  4, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  /* This value is just inherited from the Cortex-A57 table.  */
> +  8, /* vec_to_scalar_cost  */
> +  /* This depends very much on what the scalar value is and
> +     where it comes from.  E.g. some constants take two dependent
> +     instructions or a load, while others might be moved from a GPR.
> +     4 seems to be a reasonable compromise in practice.  */
> +  4, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  /* Although stores have a latency of 2 and compete for the
> +     vector pipes, in practice it's better not to model that.  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost generic_armv9_a_sve_vector_cost =
> +{
> +  {
> +    2, /* int_stmt_cost  */
> +    2, /* fp_stmt_cost  */
> +    3, /* ld2_st2_permute_cost  */
> +    4, /* ld3_st3_permute_cost  */
> +    4, /* ld4_st4_permute_cost  */
> +    3, /* permute_cost  */
> +    /* Theoretically, a reduction involving 15 scalar ADDs could
> +       complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
> +       completes in 11 cycles, so give it a cost of 15 + 6.  */
> +    21, /* reduc_i8_cost  */
> +    /* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> +    13, /* reduc_i16_cost  */
> +    /* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> +    9, /* reduc_i32_cost  */
> +    /* Likewise for 1 scalar ADD (~1 cycles) vs. 2: 1 + 1.  */
> +    2, /* reduc_i64_cost  */
> +    /* Theoretically, a reduction involving 7 scalar FADDs could
> +       complete in ~8 cycles and would have a cost of 14.  FADDV
> +       completes in 6 cycles, so give it a cost of 14 - 2.  */
> +    12, /* reduc_f16_cost  */
> +    /* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
> +    6, /* reduc_f32_cost  */
> +    /* Likewise for 1 scalar FADD (~2 cycles) vs. 2: 2 - 0.  */
> +    2, /* reduc_f64_cost  */
> +    2, /* store_elt_extra_cost  */
> +    /* This value is just inherited from the Cortex-A57 table.  */
> +    8, /* vec_to_scalar_cost  */
> +    /* See the comment above the Advanced SIMD versions.  */
> +    4, /* scalar_to_vec_cost  */
> +    4, /* align_load_cost  */
> +    4, /* unalign_load_cost  */
> +    /* Although stores have a latency of 2 and compete for the
> +       vector pipes, in practice it's better not to model that.  */
> +    1, /* unalign_store_cost  */
> +    1  /* store_cost  */
> +  },
> +  3, /* clast_cost  */
> +  10, /* fadda_f16_cost  */
> +  6, /* fadda_f32_cost  */
> +  4, /* fadda_f64_cost  */
> +  /* A strided Advanced SIMD x64 load would take two parallel FP loads
> +     (8 cycles) plus an insertion (2 cycles).  Assume a 64-bit SVE gather
> +     is 1 cycle more.  The Advanced SIMD version is costed as 2 scalar loads
> +     (cost 8) and a vec_construct (cost 2).  Add a full vector operation
> +     (cost 2) to that, to avoid the difference being lost in rounding.
> +
> +     There is no easy comparison between a strided Advanced SIMD x32 load
> +     and an SVE 32-bit gather, but cost an SVE 32-bit gather as 1 vector
> +     operation more than a 64-bit gather.  */
> +  14, /* gather_load_x32_cost  */
> +  12, /* gather_load_x64_cost  */
> +  3 /* scatter_store_elt_cost  */
> +};
> +
> +static const aarch64_scalar_vec_issue_info generic_armv9_a_scalar_issue_info =
> +{
> +  3, /* loads_stores_per_cycle  */
> +  2, /* stores_per_cycle  */
> +  4, /* general_ops_per_cycle  */
> +  0, /* fp_simd_load_general_ops  */
> +  1 /* fp_simd_store_general_ops  */
> +};
> +
> +static const aarch64_advsimd_vec_issue_info generic_armv9_a_advsimd_issue_info =
> +{
> +  {
> +    3, /* loads_stores_per_cycle  */
> +    2, /* stores_per_cycle  */
> +    2, /* general_ops_per_cycle  */
> +    0, /* fp_simd_load_general_ops  */
> +    1 /* fp_simd_store_general_ops  */
> +  },
> +  2, /* ld2_st2_general_ops  */
> +  2, /* ld3_st3_general_ops  */
> +  3 /* ld4_st4_general_ops  */
> +};
> +
> +static const aarch64_sve_vec_issue_info generic_armv9_a_sve_issue_info =
> +{
> +  {
> +    {
> +      3, /* loads_per_cycle  */
> +      2, /* stores_per_cycle  */
> +      2, /* general_ops_per_cycle  */
> +      0, /* fp_simd_load_general_ops  */
> +      1 /* fp_simd_store_general_ops  */
> +    },
> +    2, /* ld2_st2_general_ops  */
> +    3, /* ld3_st3_general_ops  */
> +    3 /* ld4_st4_general_ops  */
> +  },
> +  2, /* pred_ops_per_cycle  */
> +  2, /* while_pred_ops  */
> +  2, /* int_cmp_pred_ops  */
> +  1, /* fp_cmp_pred_ops  */
> +  1, /* gather_scatter_pair_general_ops  */
> +  1 /* gather_scatter_pair_pred_ops  */
> +};
> +
> +static const aarch64_vec_issue_info generic_armv9_a_vec_issue_info =
> +{
> +  &generic_armv9_a_scalar_issue_info,
> +  &generic_armv9_a_advsimd_issue_info,
> +  &generic_armv9_a_sve_issue_info
> +};
> +
> +/* Generic Armv9-A costs for vector insn classes.  */
> +static const struct cpu_vector_cost generic_armv9_a_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  2, /* scalar_fp_stmt_cost  */
> +  4, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* cond_taken_branch_cost  */
> +  1, /* cond_not_taken_branch_cost  */
> +  &generic_armv9_a_advsimd_vector_cost, /* advsimd  */
> +  &generic_armv9_a_sve_vector_cost, /* sve  */
> +  &generic_armv9_a_vec_issue_info /* issue_info  */
> +};
> +
> +static const struct tune_params generic_armv9_a_tunings =
> +{
> +  &cortexa76_extra_costs,
> +  &generic_armv9_a_addrcost_table,
> +  &generic_armv9_a_regmove_cost,
> +  &generic_armv9_a_vector_cost,
> +  &generic_branch_cost,
> +  &generic_approx_modes,
> +  SVE_SCALABLE, /* sve_width  */
> +  { 4, /* load_int.  */
> +    1, /* store_int.  */
> +    6, /* load_fp.  */
> +    2, /* store_fp.  */
> +    6, /* load_pred.  */
> +    1 /* store_pred.  */
> +  }, /* memmov_cost.  */
> +  3, /* issue_rate  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  "32:16",	/* function_align.  */
> +  "4",		/* jump_align.  */
> +  "32:16",	/* loop_align.  */
> +  2,	/* int_reassoc_width.  */
> +  4,	/* fp_reassoc_width.  */
> +  1,	/* fma_reassoc_width.  */
> +  2,	/* vec_reassoc_width.  */
> +  2,	/* min_div_recip_mul_sf.  */
> +  2,	/* min_div_recip_mul_df.  */
> +  0,	/* max_case_values.  */
> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> +   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
> +  &generic_prefetch_tune,
> +  AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> +  AARCH64_LDP_STP_POLICY_ALWAYS	   /* stp_policy_model.  */
> +};
> +
> +#endif /* GCC_AARCH64_H_GENERIC_ARMV9_A.  */
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-15 17:08 ` [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled Tamar Christina
@ 2023-11-16  9:26   ` Richard Earnshaw
  2023-11-16  9:33     ` Tamar Christina
  2023-11-16 10:33   ` Richard Earnshaw
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:26 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:08, Tamar Christina wrote:
> Hi All,
> 
> At the moment we emit a warning whenever you specify both -march and -mcpu
> and their architectures differ.  The idea originally was that the user may
> not be aware of this change.
> 
> However this has a few problems:
> 
> 1.  Architecture revisions are not an observable part of the architecture,
>      extensions are.  Starting with GCC 14 we have therefore relaxed the rules so
>      that all extensions can be enabled at any architecture level.  Therefore it's
>      incorrect, or at least not useful, to keep the check on architecture.
> 
> 2.  It's problematic in Makefiles and other build systems, where you want to
>      enable CPU-specific builds for certain files.  i.e. you may by default be
>      building for -march=armv8-a but for some file for -mcpu=neoverse-n1.  Since
>      there's no easy way to remove the earlier options we end up warning, and
>      there's no way to disable just this warning.  Build systems compiling with
>      -Werror thus face the issue that compiling with GCC is needlessly
>      hard.
> 
> 3. It doesn't actually warn for cases that may lead to issues, so e.g.
>     -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
>     be disabled.
> 
> For this reason I have one of two proposals:
> 
> 1.  Just remove this warning altogether.
> 
> 2.  Rework the warning based on extensions and only warn when features would be
>      disabled by the presence of the -mcpu.  This is the approach this patch has
>      taken.

There's a third option here, which is what I plan to add for the Arm port:

3. Add -mcpu=unset and -march=unset support in the driver, which has the 
effect of suppressing any earlier option that sets that flag.
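
For illustration, that would let the Makefile case from the description be
expressed without triggering any warning.  A minimal sketch, assuming the
"unset" spelling above (hypothetical: it is not an option in any released
GCC, and the file name here is only for illustration):

   # Project-wide baseline.
   CFLAGS := -O2 -march=armv8-a

   # CPU-specific build for one file (GNU Make target-specific variable):
   # clear the earlier -march, then pick the CPU, so only -mcpu remains
   # and there is no mismatch left to warn about.
   hot_loop.o: CFLAGS += -march=unset -mcpu=neoverse-n1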

[BTW: patch 5 seems to be missing so I'm holding off on approving this now.]

R.

> 
> As examples:
> 
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
> cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added
> 	.arch armv8.2-a+crc+sve
> 
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2
> <no warning>
> 
> The one remaining issue here is that if both -march and -mcpu are specified we
> pick the -march.  This is not particularly obvious and for the use case to be
> more useful I think it makes sense to pick the CPU's arch?
> 
> I did not make that change in the patch as it changes semantics.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note that I can't write a test for this because dg-warning expects warnings to
> be at a particular line and doesn't support warnings at the "global" level.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16388,12 +16388,22 @@ aarch64_override_options (void)
>     if (cpu && arch)
>       {
>         /* If both -mcpu and -march are specified, warn if they are not
> -	 architecturally compatible and prefer the -march ISA flags.  */
> -      if (arch->arch != cpu->arch)
> -	{
> -	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
> +	 feature compatible.  Feature compatible means that the inclusion of the
> +	 cpu features would not end up disabling an architecture feature.  In
> +	 other words the cpu features need to be a strict superset of the arch
> +	 features and if so prefer the -march ISA flags.  */
> +      auto full_arch_flags = arch->flags | arch_isa;
> +      auto full_cpu_flags = cpu->flags | cpu_isa;
> +      if (~full_cpu_flags & full_arch_flags)
> +	{
> +	  std::string ext_diff
> +	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
> +							  full_cpu_flags);
> +	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
> +		      "and resulted in options %s being added",
>   		       aarch64_cpu_string,
> -		       aarch64_arch_string);
> +		       aarch64_arch_string,
> +		       ext_diff.c_str ());
>   	}
>   
>         selected_arch = arch->arch;
> 
> 
> 
> 


* RE: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-16  9:26   ` Richard Earnshaw
@ 2023-11-16  9:33     ` Tamar Christina
  2023-11-16  9:41       ` Richard Earnshaw
  0 siblings, 1 reply; 14+ messages in thread
From: Tamar Christina @ 2023-11-16  9:33 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches
  Cc: nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov,
	Richard Sandiford

> -----Original Message-----
> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
> Sent: Thursday, November 16, 2023 9:27 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 6/6]AArch64: only emit mismatch error when features
> would be disabled.
> 
> 
> 
> On 15/11/2023 17:08, Tamar Christina wrote:
> > Hi All,
> >
> > At the moment we emit a warning whenever you specify both -march and
> > -mcpu and the architecture of them differ.  The idea originally was
> > that the user may not be aware of this change.
> >
> > However this has a few problems:
> >
> > 1.  Architecture revisions is not an observable part of the architecture,
> >      extensions are.  Starting with GCC 14 we have therefore relaxed the rule
> that
> >      all extensions can be enabled at any architecture level.  Therefore it's
> >      incorrect, or at least not useful to keep the check on architecture.
> >
> > 2.  It's problematic in Makefiles and other build systems, where you want to
> >      for certain files enable CPU specific builds.  i.e. you may be by default
> >      building for -march=armv8-a but for some file for -mcpu=neoverse-n1.
> Since
> >      there's no easy way to remove the earlier options we end up warning and
> >      there's no way to disable just this warning.  Build systems compiling with
> >      -Werror face an issue in this case that compiling with GCC is needlessly
> >      hard.
> >
> > 3. It doesn't actually warn for cases that may lead to issues, so e.g.
> >     -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that
> SVE would
> >     be disabled.
> >
> > For this reason I have one of two proposals:
> >
> > 1.  Just remove this warning all together.
> >
> > 2.  Rework the warning based on extensions and only warn when features
> would be
> >      disabled by the presence of the -mcpu.  This is the approach this patch has
> >      taken.
> 
> There's a third option here, which is what I plan to add for the Arm port:
> 
> 3. Add -mcpu=unset and -march=unset support in the driver, which has the
> effect of suppressing any earlier option that sets that flag.
> 
> [BTW: patch 5 seems to be missing so I'm holding off on approving this now.]
> 

Ah sorry, I should have re-numbered this series. Patch 5 was sent earlier to unblock
an internal team. It was https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632802.html

Thanks,
Tamar
> R.
> 
> >
> > As examples:
> >
> >> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
> > cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-
> a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being
> added
> .arch armv8.2-a+crc+sve
> >
> >> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> >> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-
> n1
> >> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-
> n2
> > <no warning>
> >
> > The one remaining issue here is that if both -march and -mcpu are
> > specified we pick the -march.  This is not particularly obvious and
> > for the use case to be more useful I think it makes sense to pick the CPU's
> arch?
> >
> > I did not make that change in the patch as it changes semantics.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Note that I can't write a test for this because dg-warning expects
> > warnings to be at a particular line and doesn't support warnings at the
> "global" level.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64.cc (aarch64_override_options): Rework
> warnings.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index
> >
> caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc01
> 0dcc0b
> > 138db29caf7f 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -16388,12 +16388,22 @@ aarch64_override_options (void)
> >     if (cpu && arch)
> >       {
> >         /* If both -mcpu and -march are specified, warn if they are not
> > -	 architecturally compatible and prefer the -march ISA flags.  */
> > -      if (arch->arch != cpu->arch)
> > -	{
> > -	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
> switch",
> > +	 feature compatible.  feature compatible means that the inclusion of
> the
> > +	 cpu features would end up disabling an achitecture feature.  In
> > +	 otherwords the cpu features need to be a strict superset of the arch
> > +	 features and if so prefer the -march ISA flags.  */
> > +      auto full_arch_flags = arch->flags | arch_isa;
> > +      auto full_cpu_flags = cpu->flags | cpu_isa;
> > +      if (~full_cpu_flags & full_arch_flags)
> > +	{
> > +	  std::string ext_diff
> > +	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
> > +							  full_cpu_flags);
> > +	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
> switch "
> > +		      "and resulted in options %s being added",
> >   		       aarch64_cpu_string,
> > -		       aarch64_arch_string);
> > +		       aarch64_arch_string,
> > +		       ext_diff.c_str ());
> >   	}
> >
> >         selected_arch = arch->arch;
> >
> >
> >
> >


* Re: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-16  9:33     ` Tamar Christina
@ 2023-11-16  9:41       ` Richard Earnshaw
  2023-11-16  9:50         ` Tamar Christina
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16  9:41 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov,
	Richard Sandiford



On 16/11/2023 09:33, Tamar Christina wrote:
>> -----Original Message-----
>> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
>> Sent: Thursday, November 16, 2023 9:27 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
>> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Subject: Re: [PATCH 6/6]AArch64: only emit mismatch error when features
>> would be disabled.
>>
>>
>>
>> On 15/11/2023 17:08, Tamar Christina wrote:
>>> Hi All,
>>>
>>> At the moment we emit a warning whenever you specify both -march and
>>> -mcpu and the architecture of them differ.  The idea originally was
>>> that the user may not be aware of this change.
>>>
>>> However this has a few problems:
>>>
>>> 1.  Architecture revisions is not an observable part of the architecture,
>>>       extensions are.  Starting with GCC 14 we have therefore relaxed the rule
>> that
>>>       all extensions can be enabled at any architecture level.  Therefore it's
>>>       incorrect, or at least not useful to keep the check on architecture.
>>>
>>> 2.  It's problematic in Makefiles and other build systems, where you want to
>>>       for certain files enable CPU specific builds.  i.e. you may be by default
>>>       building for -march=armv8-a but for some file for -mcpu=neoverse-n1.
>> Since
>>>       there's no easy way to remove the earlier options we end up warning and
>>>       there's no way to disable just this warning.  Build systems compiling with
>>>       -Werror face an issue in this case that compiling with GCC is needlessly
>>>       hard.
>>>
>>> 3. It doesn't actually warn for cases that may lead to issues, so e.g.
>>>      -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that
>> SVE would
>>>      be disabled.
>>>
>>> For this reason I have one of two proposals:
>>>
>>> 1.  Just remove this warning all together.
>>>
>>> 2.  Rework the warning based on extensions and only warn when features
>> would be
>>>       disabled by the presence of the -mcpu.  This is the approach this patch has
>>>       taken.
>>
>> There's a third option here, which is what I plan to add for the Arm port:
>>
>> 3. Add -mcpu=unset and -march=unset support in the driver, which has the
>> effect of suppressing any earlier option that sets that flag.
>>
>> [BTW: patch 5 seems to be missing so I'm holding off on approving this now.]
>>
> 
> Ah sorry, I should have re-numbered this series. Patch 5 was sent earlier to unblock
> an internal team. It was https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632802.html

Ah, OK.

So going back to your option 2.  What should happen if the user 
specified -mcpu=cortex-r82 and then specifies an extension that doesn't 
exist in the R profile?

R.

> 
> Thanks,
> Tamar
>> R.
>>
>>>
>>> As examples:
>>>
>>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
>>> cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-
>> a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being
>> added
>> .arch armv8.2-a+crc+sve
>>>
>>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
>>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-
>> n1
>>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-
>> n2
>>> <no warning>
>>>
>>> The one remaining issue here is that if both -march and -mcpu are
>>> specified we pick the -march.  This is not particularly obvious and
>>> for the use case to be more useful I think it makes sense to pick the CPU's
>> arch?
>>>
>>> I did not make that change in the patch as it changes semantics.
>>>
>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>>
>>> Note that I can't write a test for this because dg-warning expects
>>> warnings to be at a particular line and doesn't support warnings at the
>> "global" level.
>>>
>>> Ok for master?
>>>
>>> Thanks,
>>> Tamar
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/aarch64/aarch64.cc (aarch64_override_options): Rework
>> warnings.
>>>
>>> --- inline copy of patch --
>>> diff --git a/gcc/config/aarch64/aarch64.cc
>>> b/gcc/config/aarch64/aarch64.cc index
>>>
>> caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc01
>> 0dcc0b
>>> 138db29caf7f 100644
>>> --- a/gcc/config/aarch64/aarch64.cc
>>> +++ b/gcc/config/aarch64/aarch64.cc
>>> @@ -16388,12 +16388,22 @@ aarch64_override_options (void)
>>>      if (cpu && arch)
>>>        {
>>>          /* If both -mcpu and -march are specified, warn if they are not
>>> -	 architecturally compatible and prefer the -march ISA flags.  */
>>> -      if (arch->arch != cpu->arch)
>>> -	{
>>> -	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
>> switch",
>>> +	 feature compatible.  feature compatible means that the inclusion of
>> the
>>> +	 cpu features would end up disabling an achitecture feature.  In
>>> +	 otherwords the cpu features need to be a strict superset of the arch
>>> +	 features and if so prefer the -march ISA flags.  */
>>> +      auto full_arch_flags = arch->flags | arch_isa;
>>> +      auto full_cpu_flags = cpu->flags | cpu_isa;
>>> +      if (~full_cpu_flags & full_arch_flags)
>>> +	{
>>> +	  std::string ext_diff
>>> +	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
>>> +							  full_cpu_flags);
>>> +	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
>> switch "
>>> +		      "and resulted in options %s being added",
>>>    		       aarch64_cpu_string,
>>> -		       aarch64_arch_string);
>>> +		       aarch64_arch_string,
>>> +		       ext_diff.c_str ());
>>>    	}
>>>
>>>          selected_arch = arch->arch;
>>>
>>>
>>>
>>>


* RE: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-16  9:41       ` Richard Earnshaw
@ 2023-11-16  9:50         ` Tamar Christina
  0 siblings, 0 replies; 14+ messages in thread
From: Tamar Christina @ 2023-11-16  9:50 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches
  Cc: nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov,
	Richard Sandiford

> -----Original Message-----
> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
> Sent: Thursday, November 16, 2023 9:42 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 6/6]AArch64: only emit mismatch error when features
> would be disabled.
> 
> 
> 
> On 16/11/2023 09:33, Tamar Christina wrote:
> >> -----Original Message-----
> >> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
> >> Sent: Thursday, November 16, 2023 9:27 AM
> >> To: Tamar Christina <Tamar.Christina@arm.com>;
> >> gcc-patches@gcc.gnu.org
> >> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> >> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> >> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> >> <Richard.Sandiford@arm.com>
> >> Subject: Re: [PATCH 6/6]AArch64: only emit mismatch error when
> >> features would be disabled.
> >>
> >>
> >>
> >> On 15/11/2023 17:08, Tamar Christina wrote:
> >>> Hi All,
> >>>
> >>> At the moment we emit a warning whenever you specify both -march and
> >>> -mcpu and the architecture of them differ.  The idea originally was
> >>> that the user may not be aware of this change.
> >>>
> >>> However this has a few problems:
> >>>
> >>> 1.  Architecture revisions is not an observable part of the architecture,
> >>>       extensions are.  Starting with GCC 14 we have therefore
> >>> relaxed the rule
> >> that
> >>>       all extensions can be enabled at any architecture level.  Therefore it's
> >>>       incorrect, or at least not useful to keep the check on architecture.
> >>>
> >>> 2.  It's problematic in Makefiles and other build systems, where you want
> to
> >>>       for certain files enable CPU specific builds.  i.e. you may be by default
> >>>       building for -march=armv8-a but for some file for -mcpu=neoverse-n1.
> >> Since
> >>>       there's no easy way to remove the earlier options we end up warning
> and
> >>>       there's no way to disable just this warning.  Build systems compiling
> with
> >>>       -Werror face an issue in this case that compiling with GCC is needlessly
> >>>       hard.
> >>>
> >>> 3. It doesn't actually warn for cases that may lead to issues, so e.g.
> >>>      -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning
> >>> that
> >> SVE would
> >>>      be disabled.
> >>>
> >>> For this reason I have one of two proposals:
> >>>
> >>> 1.  Just remove this warning all together.
> >>>
> >>> 2.  Rework the warning based on extensions and only warn when
> >>> features
> >> would be
> >>>       disabled by the presence of the -mcpu.  This is the approach this patch
> has
> >>>       taken.
> >>
> >> There's a third option here, which is what I plan to add for the Arm port:
> >>
> >> 3. Add -mcpu=unset and -march=unset support in the driver, which has
> >> the effect of suppressing any earlier option that sets that flag.
> >>
> >> [BTW: patch 5 seems to be missing so I'm holding off on approving
> >> this now.]
> >>
> >
> > Ah sorry, I should have re-numbered this series. Patch 5 was sent
> > earlier to unblock an internal team. It was
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632802.html
> 
> Ah, OK.
> 
> So going back to your option 2.  What should happen if the user specified -
> mcpu=cortex-r82 and then specifies an extension that doesn't exist in the R
> profile?
> 

AArch64 in general does not validate extensions to architectures.  So basically
we would allow it.

e.g. 
> aarch64-none-elf-gcc -O3 ./gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c -S -o - -march=armv8.2-a+sve -mcpu=cortex-r82                                         
cc1: warning: switch '-mcpu=cortex-r82' conflicts with '-march=armv8.2-a+sve' switch and would result in options +sve+norcpc+nodotprod+nofp16fml+noflagm+nopauth being added
        .arch armv8.2-a+crc+sve

The new warning only tells you exactly what the compiler will be doing to your options, but doesn't change the behavior
the compiler exhibits today since we always take -march over -mcpu.

The difference is today we just say "there's a conflict" and don't specify what the conflict is.
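
To make the superset check concrete, here is a minimal standalone sketch of the
comparison the patch performs (illustration only; the feature bit names and
values below are made up and are not the real aarch64 feature flags):

  #include <cstdint>
  #include <cstdio>

  int main ()
  {
    /* Hypothetical feature bits standing in for the real aarch64 flags.  */
    constexpr uint64_t FEAT_CRC     = 1u << 0;
    constexpr uint64_t FEAT_SVE     = 1u << 1;
    constexpr uint64_t FEAT_DOTPROD = 1u << 2;
    constexpr uint64_t FEAT_RCPC    = 1u << 3;

    /* -march=armv8.2-a+sve asks for CRC and SVE; -mcpu=neoverse-n1 brings
       CRC, DOTPROD and RCPC but not SVE.  */
    uint64_t full_arch_flags = FEAT_CRC | FEAT_SVE;
    uint64_t full_cpu_flags  = FEAT_CRC | FEAT_DOTPROD | FEAT_RCPC;

    /* Warn only when the CPU's features are not a superset of what -march
       enables, i.e. when some requested arch feature is missing.  */
    uint64_t missing = ~full_cpu_flags & full_arch_flags;
    if (missing)
      printf ("would warn: arch feature mask 0x%llx not provided by the CPU\n",
              (unsigned long long) missing);
    return 0;
  }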

Regards,
Tamar
> R.
> 
> >
> > Thanks,
> > Tamar
> >> R.
> >>
> >>>
> >>> As examples:
> >>>
> >>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
> >>> cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with
> >>> ‘-march=armv8.2-
> >> a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being
> >> added
> >> .arch armv8.2-a+crc+sve
> >>>
> >>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> >>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -
> mcpu=neoverse-
> >> n1
> >>>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -
> mcpu=neoverse-
> >> n2
> >>> <no warning>
> >>>
> >>> The one remaining issue here is that if both -march and -mcpu are
> >>> specified we pick the -march.  This is not particularly obvious and
> >>> for the use case to be more useful I think it makes sense to pick
> >>> the CPU's
> >> arch?
> >>>
> >>> I did not make that change in the patch as it changes semantics.
> >>>
> >>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>>
> >>> Note that I can't write a test for this because dg-warning expects
> >>> warnings to be at a particular line and doesn't support warnings at
> >>> the
> >> "global" level.
> >>>
> >>> Ok for master?
> >>>
> >>> Thanks,
> >>> Tamar
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 	* config/aarch64/aarch64.cc (aarch64_override_options): Rework
> >> warnings.
> >>>
> >>> --- inline copy of patch --
> >>> diff --git a/gcc/config/aarch64/aarch64.cc
> >>> b/gcc/config/aarch64/aarch64.cc index
> >>>
> >>
> caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc01
> >> 0dcc0b
> >>> 138db29caf7f 100644
> >>> --- a/gcc/config/aarch64/aarch64.cc
> >>> +++ b/gcc/config/aarch64/aarch64.cc
> >>> @@ -16388,12 +16388,22 @@ aarch64_override_options (void)
> >>>      if (cpu && arch)
> >>>        {
> >>>          /* If both -mcpu and -march are specified, warn if they are not
> >>> -	 architecturally compatible and prefer the -march ISA flags.  */
> >>> -      if (arch->arch != cpu->arch)
> >>> -	{
> >>> -	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
> >> switch",
> >>> +	 feature compatible.  feature compatible means that the inclusion
> >>> +of
> >> the
> >>> +	 cpu features would end up disabling an achitecture feature.  In
> >>> +	 otherwords the cpu features need to be a strict superset of the arch
> >>> +	 features and if so prefer the -march ISA flags.  */
> >>> +      auto full_arch_flags = arch->flags | arch_isa;
> >>> +      auto full_cpu_flags = cpu->flags | cpu_isa;
> >>> +      if (~full_cpu_flags & full_arch_flags)
> >>> +	{
> >>> +	  std::string ext_diff
> >>> +	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
> >>> +							  full_cpu_flags);
> >>> +	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%>
> >> switch "
> >>> +		      "and resulted in options %s being added",
> >>>    		       aarch64_cpu_string,
> >>> -		       aarch64_arch_string);
> >>> +		       aarch64_arch_string,
> >>> +		       ext_diff.c_str ());
> >>>    	}
> >>>
> >>>          selected_arch = arch->arch;
> >>>
> >>>
> >>>
> >>>


* Re: [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.
  2023-11-15 17:08 ` [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled Tamar Christina
  2023-11-16  9:26   ` Richard Earnshaw
@ 2023-11-16 10:33   ` Richard Earnshaw
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2023-11-16 10:33 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford



On 15/11/2023 17:08, Tamar Christina wrote:
> Hi All,
> 
> At the moment we emit a warning whenever you specify both -march and -mcpu
> and the architecture of them differ.  The idea originally was that the user may
> not be aware of this change.
> 
> However this has a few problems:
> 
> 1.  Architecture revisions is not an observable part of the architecture,
>      extensions are.  Starting with GCC 14 we have therefore relaxed the rule that
>      all extensions can be enabled at any architecture level.  Therefore it's
>      incorrect, or at least not useful to keep the check on architecture.
> 
> 2.  It's problematic in Makefiles and other build systems, where you want to
>      for certain files enable CPU specific builds.  i.e. you may be by default
>      building for -march=armv8-a but for some file for -mcpu=neoverse-n1.  Since
>      there's no easy way to remove the earlier options we end up warning and
>      there's no way to disable just this warning.  Build systems compiling with
>      -Werror face an issue in this case that compiling with GCC is needlessly
>      hard.
> 
> 3. It doesn't actually warn for cases that may lead to issues, so e.g.
>     -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
>     be disabled.
> 
> For this reason I have one of two proposals:
> 
> 1.  Just remove this warning all together.
> 
> 2.  Rework the warning based on extensions and only warn when features would be
>      disabled by the presence of the -mcpu.  This is the approach this patch has
>      taken.
> 
> As examples:
> 
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
> cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added
> .arch armv8.2-a+crc+sve
> 
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
>> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2
> <no warning>
> 
> The one remaining issue here is that if both -march and -mcpu are specified we
> pick the -march.  This is not particularly obvious and for the use case to be
> more useful I think it makes sense to pick the CPU's arch?

The intent was always that users would either specify -march (with 
-mtune) or they would specify just -mcpu, not that they should mix both 
on the command line.  -mcpu=<cpu> is supposed to be the equivalent 
of -march=<arch-of(cpu)> -mtune=<cpu>.  Both the Arm and AArch64 
compilers implement the rule that -march dominates any -mcpu setting and 
this is (or at least used to be) documented in the manual.

Part of the problem is that there's no clear way to recover positional 
information from the parameter list, so that it's not possible in the 
specs files to determine whether the user wrote

-mcpu=x -march=y

or

-march=y -mcpu=x

Now if a single source of rules for the cpu/arch conflates things in 
this way, that is pilot error, but we can't currently distinguish that 
case from the one where, say, the user adds -mcpu to CFLAGS in a 
makefile, but the make rules themselves need specific architecture 
features in order to build a specific file.

This is where unset becomes useful as it will provide a clean(er) way to 
say ignore any conflict from an earlier option and use the following flags.

Hence

(-mcpu=x) (-mcpu=unset -march=y)

(parentheses indicate different sources of flags) will cause the 
compiler to forget any earlier -mcpu and just look at -march.  Conversely,

(-march=y) (-march=unset -mcpu=x)

would clear any earlier -march flag and tell the compiler to just use 
-mcpu from now on.
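
As a concrete (hypothetical) illustration of how a build system could use this
once unset exists -- this is a sketch of the proposal, not an option GCC accepts
today.  With a project-wide CFLAGS of -mcpu=neoverse-n1, a file that needs SVE
could append its own flags, giving effective command lines like:

> aarch64-none-linux-gnu-gcc -mcpu=neoverse-n1 -c generic.c
> aarch64-none-linux-gnu-gcc -mcpu=neoverse-n1 -mcpu=unset -march=armv8.2-a+sve -c needs_sve.c

The trailing -mcpu=unset makes the driver drop the earlier -mcpu, so only -march
applies to the second file and no conflict warning is needed.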

> 
> I did not make that change in the patch as it changes semantics.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note that I can't write a test for this because dg-warning expects warnings to
> be at a particular line and doesn't support warnings at the "global" level.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16388,12 +16388,22 @@ aarch64_override_options (void)
>     if (cpu && arch)
>       {
>         /* If both -mcpu and -march are specified, warn if they are not
> -	 architecturally compatible and prefer the -march ISA flags.  */
> -      if (arch->arch != cpu->arch)
> -	{
> -	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
> +	 feature compatible.  feature compatible means that the inclusion of the
> +	 cpu features would end up disabling an achitecture feature.  In
"CPU" and "architecture"

> +	 otherwords the cpu features need to be a strict superset of the arch
"other words" and "CPU".

> +	 features and if so prefer the -march ISA flags.  */
> +      auto full_arch_flags = arch->flags | arch_isa;
> +      auto full_cpu_flags = cpu->flags | cpu_isa;
> +      if (~full_cpu_flags & full_arch_flags)
> +	{
> +	  std::string ext_diff
> +	    = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
> +							  full_cpu_flags);
> +	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
> +		      "and resulted in options %s being added",

Please check the convention here: should %s be surrounded in %<..%>?  It 
is part of what the user effectively specified on the command line.
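
For instance, one possible form (just a sketch, assuming the usual GCC
diagnostic %qs directive is the right fit here rather than a bare %s):

  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
	      "and resulted in options %qs being added",
	   aarch64_cpu_string, aarch64_arch_string, ext_diff.c_str ());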

>   		       aarch64_cpu_string,
> -		       aarch64_arch_string);
> +		       aarch64_arch_string,
> +		       ext_diff.c_str ());
>   	}
>   
>         selected_arch = arch->arch;
> 
> 


Otherwise OK.

R.


end of thread, other threads:[~2023-11-16 10:33 UTC | newest]

Thread overview: 14+ messages
2023-11-15 17:06 [PATCH 1/6]AArch64: Refactor costs models to different files Tamar Christina
2023-11-15 17:07 ` [PATCH 2/6]AArch64: Remove special handling of generic cpu Tamar Christina
2023-11-16  9:14   ` Richard Earnshaw
2023-11-15 17:07 ` [PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default Tamar Christina
2023-11-16  9:23   ` Richard Earnshaw
2023-11-15 17:08 ` [PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9 Tamar Christina
2023-11-16  9:23   ` Richard Earnshaw
2023-11-15 17:08 ` [PATCH 6/6]AArch64: only emit mismatch error when features would be disabled Tamar Christina
2023-11-16  9:26   ` Richard Earnshaw
2023-11-16  9:33     ` Tamar Christina
2023-11-16  9:41       ` Richard Earnshaw
2023-11-16  9:50         ` Tamar Christina
2023-11-16 10:33   ` Richard Earnshaw
2023-11-16  9:13 ` [PATCH 1/6]AArch64: Refactor costs models to different files Richard Earnshaw
