From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
To: Manos Anagnostakis <manos.anagnostakis@vrull.eu>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Philipp Tomsich <philipp.tomsich@vrull.eu>
Subject: RE: [PATCH] aarch64: Fine-grained ldp and stp policies with test-cases.
Date: Mon, 25 Sep 2023 10:59:10 +0000
Message-ID: <PAXPR08MB6926FB8C20CE6AD74EE11F1C93FCA@PAXPR08MB6926.eurprd08.prod.outlook.com>
In-Reply-To: <20230818074943.41754-1-manos.anagnostakis@vrull.eu>
Hi Manos,
Apologies for the long delay.
> -----Original Message-----
> From: Manos Anagnostakis <manos.anagnostakis@vrull.eu>
> Sent: Friday, August 18, 2023 8:50 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Philipp Tomsich
> <philipp.tomsich@vrull.eu>; Manos Anagnostakis
> <manos.anagnostakis@vrull.eu>
> Subject: [PATCH] aarch64: Fine-grained ldp and stp policies with test-cases.
>
> This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> to provide the requested behaviour for handling ldp and stp:
>
> /* Allow the tuning structure to disable LDP instruction formation
> from combining instructions (e.g., in peephole2).
> TODO: Implement fine-grained tuning control for LDP and STP:
> 1. control policies for load and store separately;
> 2. support the following policies:
> - default (use what is in the tuning structure)
> - always
> - never
> - aligned (only if the compiler can prove that the
> load will be aligned to 2 * element_size) */
>
> It provides two new and concrete command-line options -mldp-policy and
> -mstp-policy to give the ability to control load and store policies
> separately as stated in part 1 of the TODO.
>
> The accepted values for both options are:
> - default: Use the ldp/stp policy defined in the corresponding tuning
> structure.
> - always: Emit ldp/stp regardless of alignment.
> - never: Do not emit ldp/stp.
> - aligned: In order to emit ldp/stp, first check if the load/store will
> be aligned to 2 * element_size.
>
> gcc/ChangeLog:
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> appropriate enums for the policies.
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> options.
> * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> function to parse ldp-policy option.
> (aarch64_parse_stp_policy): New function to parse stp-policy option.
> (aarch64_override_options_internal): Call parsing functions.
> (aarch64_operands_ok_for_ldpstp): Add option-value check and
> alignment check and remove superseded ones.
> (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
> alignment check and remove superseded ones.
> * config/aarch64/aarch64.opt: Add options.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/ldp_aligned.c: New test.
> * gcc.target/aarch64/ldp_always.c: New test.
> * gcc.target/aarch64/ldp_never.c: New test.
> * gcc.target/aarch64/stp_aligned.c: New test.
> * gcc.target/aarch64/stp_always.c: New test.
> * gcc.target/aarch64/stp_never.c: New test.
>
> Signed-off-by: Manos Anagnostakis <manos.anagnostakis@vrull.eu>
> ---
>
> gcc/config/aarch64/aarch64-protos.h | 24 ++
> gcc/config/aarch64/aarch64-tuning-flags.def | 8 -
> gcc/config/aarch64/aarch64.cc | 229 ++++++++++++++----
> gcc/config/aarch64/aarch64.opt | 8 +
> .../gcc.target/aarch64/ldp_aligned.c | 64 +++++
> gcc/testsuite/gcc.target/aarch64/ldp_always.c | 64 +++++
> gcc/testsuite/gcc.target/aarch64/ldp_never.c | 64 +++++
> .../gcc.target/aarch64/stp_aligned.c | 60 +++++
> gcc/testsuite/gcc.target/aarch64/stp_always.c | 60 +++++
> gcc/testsuite/gcc.target/aarch64/stp_never.c | 60 +++++
> 10 files changed, 580 insertions(+), 61 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd95..be1d73490ed 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -568,6 +568,30 @@ struct tune_params
> /* Place prefetch struct pointer at the end to enable type checking
> errors when tune_params misses elements (e.g., from erroneous merges).
> */
> const struct cpu_prefetch_tune *prefetch;
> +/* An enum specifying how to handle load pairs using a fine-grained policy:
> + - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> + to at least double the alignment of the type.
> + - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> + - LDP_POLICY_NEVER: Do not emit ldp. */
> +
> + enum aarch64_ldp_policy_model
> + {
> + LDP_POLICY_ALIGNED,
> + LDP_POLICY_ALWAYS,
> + LDP_POLICY_NEVER
> + } ldp_policy_model;
> +/* An enum specifying how to handle store pairs using a fine-grained policy:
> + - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> + to at least double the alignment of the type.
> + - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
> + - STP_POLICY_NEVER: Do not emit stp. */
> +
> + enum aarch64_stp_policy_model
> + {
> + STP_POLICY_ALIGNED,
> + STP_POLICY_ALWAYS,
> + STP_POLICY_NEVER
> + } stp_policy_model;
> };
>
> /* Classifies an address.
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 52112ba7c48..774568e9106 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -30,11 +30,6 @@
>
> AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs",
> RENAME_FMA_REGS)
>
> -/* Don't create non-8 byte aligned load/store pair. That is if the
> -two load/stores are not at least 8 byte aligned don't create load/store
> -pairs. */
> -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw",
> SLOW_UNALIGNED_LDPW)
> -
> /* Some of the optional shift to some arthematic instructions are
> considered cheap. Logical shift left <=4 with or without a
> zero extend are considered cheap. Sign extend; non logical shift left
> @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION
> ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> /* Disallow load/store pair instructions on Q-registers. */
> AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> NO_LDP_STP_QREGS)
>
> -/* Disallow load-pair instructions to be formed in combine/peephole. */
> -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
> -
> AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> RENAME_LOAD_REGS)
>
> AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> CSE_SVE_VL_CONSTANTS)
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 560e5431636..51c94804f12 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1356,7 +1356,9 @@ static const struct tune_params generic_tunings =
> Neoverse V1. It does not have a noticeable effect on A64FX and should
> have at most a very minor effect on SVE2 cores. */
> (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params cortexa35_tunings =
> @@ -1390,7 +1392,9 @@ static const struct tune_params cortexa35_tunings
> =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params cortexa53_tunings =
> @@ -1424,7 +1428,9 @@ static const struct tune_params cortexa53_tunings
> =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params cortexa57_tunings =
> @@ -1458,7 +1464,9 @@ static const struct tune_params cortexa57_tunings
> =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params cortexa72_tunings =
> @@ -1492,7 +1500,9 @@ static const struct tune_params cortexa72_tunings
> =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params cortexa73_tunings =
> @@ -1526,7 +1536,9 @@ static const struct tune_params cortexa73_tunings
> =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
>
> @@ -1561,7 +1573,9 @@ static const struct tune_params exynosm1_tunings
> =
> 48, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &exynosm1_prefetch_tune
> + &exynosm1_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params thunderxt88_tunings =
> @@ -1593,8 +1607,10 @@ static const struct tune_params
> thunderxt88_tunings =
> 2, /* min_div_recip_mul_df. */
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
> - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW), /* tune_flags. */
> - &thunderxt88_prefetch_tune
> + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> + &thunderxt88_prefetch_tune,
> + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> };
>
> static const struct tune_params thunderx_tunings =
> @@ -1626,9 +1642,10 @@ static const struct tune_params thunderx_tunings
> =
> 2, /* min_div_recip_mul_df. */
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
> - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
> - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
> - &thunderx_prefetch_tune
> + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
> + &thunderx_prefetch_tune,
> + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> };
>
> static const struct tune_params tsv110_tunings =
> @@ -1662,7 +1679,9 @@ static const struct tune_params tsv110_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &tsv110_prefetch_tune
> + &tsv110_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params xgene1_tunings =
> @@ -1695,7 +1714,9 @@ static const struct tune_params xgene1_tunings =
> 17, /* max_case_values. */
> tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */
> - &xgene1_prefetch_tune
> + &xgene1_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params emag_tunings =
> @@ -1728,7 +1749,9 @@ static const struct tune_params emag_tunings =
> 17, /* max_case_values. */
> tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */
> - &xgene1_prefetch_tune
> + &xgene1_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params qdf24xx_tunings =
> @@ -1762,7 +1785,9 @@ static const struct tune_params qdf24xx_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */
> - &qdf24xx_prefetch_tune
> + &qdf24xx_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> /* Tuning structure for the Qualcomm Saphira core. Default to falkor values
> @@ -1798,7 +1823,9 @@ static const struct tune_params saphira_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params thunderx2t99_tunings =
> @@ -1832,7 +1859,9 @@ static const struct tune_params
> thunderx2t99_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &thunderx2t99_prefetch_tune
> + &thunderx2t99_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params thunderx3t110_tunings =
> @@ -1866,7 +1895,9 @@ static const struct tune_params
> thunderx3t110_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &thunderx3t110_prefetch_tune
> + &thunderx3t110_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params neoversen1_tunings =
> @@ -1899,7 +1930,9 @@ static const struct tune_params
> neoversen1_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params ampere1_tunings =
> @@ -1935,8 +1968,10 @@ static const struct tune_params ampere1_tunings
> =
> 2, /* min_div_recip_mul_df. */
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
> - &ampere1_prefetch_tune
> + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> + &ampere1_prefetch_tune,
> + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> };
>
> static const struct tune_params ampere1a_tunings =
> @@ -1973,8 +2008,10 @@ static const struct tune_params ampere1a_tunings
> =
> 2, /* min_div_recip_mul_df. */
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
> - &ampere1_prefetch_tune
> + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> + &ampere1_prefetch_tune,
> + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> };
>
> static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> @@ -2155,7 +2192,9 @@ static const struct tune_params
> neoversev1_tunings =
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> @@ -2292,7 +2331,9 @@ static const struct tune_params
> neoverse512tvb_tunings =
> (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /*
> tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> @@ -2482,7 +2523,9 @@ static const struct tune_params
> neoversen2_tunings =
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /*
> tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
> @@ -2672,7 +2715,9 @@ static const struct tune_params
> neoversev2_tunings =
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /*
> tune_flags. */
> - &generic_prefetch_tune
> + &generic_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> static const struct tune_params a64fx_tunings =
> @@ -2705,7 +2750,9 @@ static const struct tune_params a64fx_tunings =
> 0, /* max_case_values. */
> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> - &a64fx_prefetch_tune
> + &a64fx_prefetch_tune,
> + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> };
>
> /* Support for fine-grained override of the tuning structures. */
> @@ -17645,6 +17692,50 @@ aarch64_parse_tune (const char *to_parse,
> const struct processor **res)
> return AARCH_PARSE_INVALID_ARG;
> }
>
> +/* Validate a command-line -mldp-policy option. Parse the policy
> + specified in STR and throw errors if appropriate. */
> +
> +static bool
> +aarch64_parse_ldp_policy (const char *str, struct tune_params* tune)
> +{
> + /* Check the value of the option to be one of the accepted. */
> + if (strcmp (str, "always") == 0)
> + tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS;
> + else if (strcmp (str, "never") == 0)
> + tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER;
> + else if (strcmp (str, "aligned") == 0)
> + tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED;
> + else if (strcmp (str, "default") != 0)
> + {
> + error ("unknown value %qs for %<-mldp-policy%>", str);
> + return false;
> + }
> +
> + return true;
> +}
> +
> +/* Validate a command-line -mstp-policy option. Parse the policy
> + specified in STR and throw errors if appropriate. */
> +
> +static bool
> +aarch64_parse_stp_policy (const char *str, struct tune_params* tune)
> +{
> + /* Check the value of the option to be one of the accepted. */
> + if (strcmp (str, "always") == 0)
> + tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS;
> + else if (strcmp (str, "never") == 0)
> + tune->stp_policy_model = tune_params::STP_POLICY_NEVER;
> + else if (strcmp (str, "aligned") == 0)
> + tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED;
> + else if (strcmp (str, "default") != 0)
> + {
> + error ("unknown value %qs for %<-mstp-policy%>", str);
> + return false;
> + }
> +
> + return true;
> +}
> +
> /* Parse TOKEN, which has length LENGTH to see if it is an option
> described in FLAG. If it is, return the index bit for that fusion type.
> If not, error (printing OPTION_NAME) and return zero. */
> @@ -17993,6 +18084,14 @@ aarch64_override_options_internal (struct
> gcc_options *opts)
> aarch64_parse_override_string (opts->x_aarch64_override_tune_string,
> &aarch64_tune_params);
>
> + if (opts->x_aarch64_ldp_policy_string)
> + aarch64_parse_ldp_policy (opts->x_aarch64_ldp_policy_string,
> + &aarch64_tune_params);
> +
> + if (opts->x_aarch64_stp_policy_string)
> + aarch64_parse_stp_policy (opts->x_aarch64_stp_policy_string,
> + &aarch64_tune_params);
> +
> /* This target defaults to strict volatile bitfields. */
> if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
> opts->x_flag_strict_volatile_bitfields = 1;
> @@ -26301,18 +26400,14 @@ aarch64_operands_ok_for_ldpstp (rtx
> *operands, bool load,
> enum reg_class rclass_1, rclass_2;
> rtx mem_1, mem_2, reg_1, reg_2;
>
> - /* Allow the tuning structure to disable LDP instruction formation
> - from combining instructions (e.g., in peephole2).
> - TODO: Implement fine-grained tuning control for LDP and STP:
> - 1. control policies for load and store separately;
> - 2. support the following policies:
> - - default (use what is in the tuning structure)
> - - always
> - - never
> - - aligned (only if the compiler can prove that the
> - load will be aligned to 2 * element_size) */
> - if (load && (aarch64_tune_params.extra_tuning_flags
> - & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
> + /* If we have LDP_POLICY_NEVER, reject the load pair. */
> + if (load
> + && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_NEVER)
> + return false;
> +
> + /* If we have STP_POLICY_NEVER, reject the store pair. */
> + if (!load
> + && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_NEVER)
> return false;
>
> if (load)
> @@ -26339,13 +26434,22 @@ aarch64_operands_ok_for_ldpstp (rtx
> *operands, bool load,
> if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
> return false;
>
> - /* If we have SImode and slow unaligned ldp,
> - check the alignment to be at least 8 byte. */
> - if (mode == SImode
> - && (aarch64_tune_params.extra_tuning_flags
> - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> + /* If we have LDP_POLICY_ALIGNED,
> + do not emit the load pair unless the alignment is checked to be
> + at least double the alignment of the type. */
> + if (load
> + && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_ALIGNED
> && !optimize_size
> - && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
> + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
> + return false;
> +
> + /* If we have STP_POLICY_ALIGNED,
> + do not emit the store pair unless the alignment is checked to be
> + at least double the alignment of the type. */
> + if (!load
> + && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_ALIGNED
> + && !optimize_size
> + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
> return false;
I appreciate there is an existing use of optimize_size above, but the recommended way of checking this is optimize_function_for_size_p (cfun).
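To make the shape of the check concrete, here is a minimal standalone model in C. It is only a sketch and not GCC code: the names pair_policy and pair_allowed are hypothetical, and the optimizing_for_size flag is an assumed stand-in for what optimize_function_for_size_p (cfun) would return inside the compiler. Alignments are expressed in bits, as MEM_ALIGN and GET_MODE_ALIGNMENT do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the LDP/STP policy enums added by the patch.  */
enum pair_policy { POLICY_ALIGNED, POLICY_ALWAYS, POLICY_NEVER };

/* Model of the decision taken for one candidate pair.
   MEM_ALIGN_BITS plays the role of MEM_ALIGN (mem), MODE_ALIGN_BITS the
   role of GET_MODE_ALIGNMENT (mode).  OPTIMIZING_FOR_SIZE is an assumed
   per-function query result, not a global flag.  */
static bool
pair_allowed (enum pair_policy policy, unsigned mem_align_bits,
              unsigned mode_align_bits, bool optimizing_for_size)
{
  assert (mode_align_bits > 0);

  /* "never": reject the pair outright.  */
  if (policy == POLICY_NEVER)
    return false;

  /* "aligned": when not optimizing for size, require the known alignment
     to be at least twice the alignment of the mode.  */
  if (policy == POLICY_ALIGNED
      && !optimizing_for_size
      && mem_align_bits < 2 * mode_align_bits)
    return false;

  /* "always", or "aligned" with the requirement met.  */
  return true;
}
```

With a 32-bit mode alignment, this model rejects a pair whose memory is only known to be 32-bit aligned under the "aligned" policy, and accepts it once 64-bit alignment is known or the function is being optimized for size.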
>
> /* Check if the addresses are in the form of [base+offset]. */
> @@ -26475,6 +26579,16 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
> *operands, bool load,
> HOST_WIDE_INT offvals[num_insns], msize;
> rtx mem[num_insns], reg[num_insns], base[num_insns],
> offset[num_insns];
>
> + /* If we have LDP_POLICY_NEVER, reject the load pair. */
> + if (load
> + && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_NEVER)
> + return false;
> +
> + /* If we have STP_POLICY_NEVER, reject the store pair. */
> + if (!load
> + && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_NEVER)
> + return false;
> +
> if (load)
> {
> for (int i = 0; i < num_insns; i++)
> @@ -26564,13 +26678,22 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
> *operands, bool load,
> if (offvals[0] % msize != offvals[2] % msize)
> return false;
>
> - /* If we have SImode and slow unaligned ldp,
> - check the alignment to be at least 8 byte. */
> - if (mode == SImode
> - && (aarch64_tune_params.extra_tuning_flags
> - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
> + /* If we have LDP_POLICY_ALIGNED,
> + do not emit the load pair unless the alignment is checked to be
> + at least double the alignment of the type. */
> + if (load
> + && aarch64_tune_params.ldp_policy_model ==
> tune_params::LDP_POLICY_ALIGNED
> + && !optimize_size
> + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
> + return false;
> +
> + /* If we have STP_POLICY_ALIGNED,
> + do not emit the store pair unless the alignment is checked to be
> + at least double the alignment of the type. */
> + if (!load
> + && aarch64_tune_params.stp_policy_model ==
> tune_params::STP_POLICY_ALIGNED
> && !optimize_size
> - && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
> + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
> return false;
>
> return true;
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 4a0580435a8..e5302947ce7 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -205,6 +205,14 @@ msign-return-address=
> Target WarnRemoved RejectNegative Joined Enum(aarch_ra_sign_scope_t)
> Var(aarch_ra_sign_scope) Init(AARCH_FUNCTION_NONE) Save
> Select return address signing scope.
>
> +mldp-policy=
> +Target RejectNegative Joined Var(aarch64_ldp_policy_string) Save
> +Fine-grained policy for load pairs.
> +
> +mstp-policy=
> +Target RejectNegative Joined Var(aarch64_stp_policy_string) Save
> +Fine-grained policy for store pairs.
I'd like to avoid having -m* options for such low-level codegen tweaks. -m* options should be used for options that enable/disable user-visible features, ABI things etc.
We have target-specific params these days so I'd recommend you implement this in a similar way to -param=aarch64-autovec-preference=.
It will have to take a number rather than a string but that should be okay, as long as the right values are documented in invoke.texi.
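As a rough illustration of what a numeric parameter could look like on the parsing side, here is a standalone C sketch. Everything in it is an assumption made for illustration: the function policy_from_param, the pair_policy enum, and in particular the encoding 0 = default, 1 = always, 2 = never, 3 = aligned. The real values would be whatever ends up documented in invoke.texi.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the LDP/STP policy enums added by the patch.  */
enum pair_policy { POLICY_ALIGNED, POLICY_ALWAYS, POLICY_NEVER };

/* Map a numeric parameter VALUE to a policy.  The encoding is an
   assumption of this sketch only:
     0 = default (keep what the tuning structure says)
     1 = always
     2 = never
     3 = aligned
   Returns false for any value outside that range.  */
static bool
policy_from_param (int value, enum pair_policy tuning_default,
                   enum pair_policy *out)
{
  assert (out != 0);

  switch (value)
    {
    case 0:
      *out = tuning_default;
      return true;
    case 1:
      *out = POLICY_ALWAYS;
      return true;
    case 2:
      *out = POLICY_NEVER;
      return true;
    case 3:
      *out = POLICY_ALIGNED;
      return true;
    default:
      return false;
    }
}
```

In the compiler itself the range check would more likely be expressed in the option definition than in hand-written code, so the default branch here only stands in for that validation.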
Otherwise the approach looks good.
Thanks,
Kyrill
> +
> Enum
> Name(aarch_ra_sign_scope_t) Type(enum aarch_function_type)
> Supported AArch64 return address signing scope (for use with -msign-return-
> address= option):
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> new file mode 100644
> index 00000000000..895018f6b53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=aligned" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[0]; \
> + a_1 = arr[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[0]; \
> + a_1 = a[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[100]; \
> + a_1 = arr[101]; \
> + a_2 = arr[102]; \
> + a_3 = arr[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[100]; \
> + a_1 = a[101]; \
> + a_2 = a[102]; \
> + a_3 = a[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> new file mode 100644
> index 00000000000..ead4fe41891
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=always" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[0]; \
> + a_1 = arr[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[0]; \
> + a_1 = a[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[100]; \
> + a_1 = arr[101]; \
> + a_2 = arr[102]; \
> + a_3 = arr[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[100]; \
> + a_1 = a[101]; \
> + a_2 = a[102]; \
> + a_3 = a[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> new file mode 100644
> index 00000000000..aae2f087241
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> @@ -0,0 +1,64 @@
> +/* { dg-options "-O3 -mldp-policy=never" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define LDP_TEST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[0]; \
> + a_1 = arr[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> + TYPE a_0, a_1; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[0]; \
> + a_1 = a[1]; \
> + return a_0 + a_1; \
> +}
> +
> +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + a_0 = arr[100]; \
> + a_1 = arr[101]; \
> + a_2 = arr[102]; \
> + a_3 = arr[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> + TYPE a_0, a_1, a_2, a_3; \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a_0 = a[100]; \
> + a_1 = a[101]; \
> + a_2 = a[102]; \
> + a_3 = a[103]; \
> + return a_0 + a_1 + a_2 + a_3; \
> +}
> +
> +LDP_TEST_ALIGNED(int32_t);
> +LDP_TEST_ALIGNED(int64_t);
> +LDP_TEST_ALIGNED(v4si);
> +LDP_TEST_UNALIGNED(int32_t);
> +LDP_TEST_UNALIGNED(int64_t);
> +LDP_TEST_UNALIGNED(v4si);
> +LDP_TEST_ADJUST_ALIGNED(int32_t);
> +LDP_TEST_ADJUST_ALIGNED(int64_t);
> +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> new file mode 100644
> index 00000000000..07b49629292
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=aligned" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[0] = x; \
> + arr[1] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[0] = x; \
> + a[1] = x; \
> + return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[100] = x; \
> + arr[101] = x; \
> + arr[102] = x; \
> + arr[103] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[100] = x; \
> + a[101] = x; \
> + a[102] = x; \
> + a[103] = x; \
> + return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c
> b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> new file mode 100644
> index 00000000000..6a1c671f02c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=always" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[0] = x; \
> + arr[1] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[0] = x; \
> + a[1] = x; \
> + return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[100] = x; \
> + arr[101] = x; \
> + arr[102] = x; \
> + arr[103] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[100] = x; \
> + a[101] = x; \
> + a[102] = x; \
> + a[103] = x; \
> + return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c
> b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> new file mode 100644
> index 00000000000..9cd703995b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> @@ -0,0 +1,60 @@
> +/* { dg-options "-O3 -mstp-policy=never" } */
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +#define STP_TEST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[0] = x; \
> + arr[1] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[0] = x; \
> + a[1] = x; \
> + return a; \
> +}
> +
> +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + arr[100] = x; \
> + arr[101] = x; \
> + arr[102] = x; \
> + arr[103] = x; \
> + return arr; \
> +}
> +
> +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> + TYPE *a = arr+1; \
> + a[100] = x; \
> + a[101] = x; \
> + a[102] = x; \
> + a[103] = x; \
> + return a; \
> +}
> +
> +STP_TEST_ALIGNED(int32_t);
> +STP_TEST_ALIGNED(int64_t);
> +STP_TEST_ALIGNED(v4si);
> +STP_TEST_UNALIGNED(int32_t);
> +STP_TEST_UNALIGNED(int64_t);
> +STP_TEST_UNALIGNED(v4si);
> +STP_TEST_ADJUST_ALIGNED(int32_t);
> +STP_TEST_ADJUST_ALIGNED(int64_t);
> +STP_TEST_ADJUST_UNALIGNED(int32_t);
> +STP_TEST_ADJUST_UNALIGNED(int64_t);
> +
> +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> +
> --
> 2.40.1
Thread overview: 3+ messages
2023-08-18 7:49 Manos Anagnostakis
2023-09-25 10:59 ` Kyrylo Tkachov [this message]
2023-09-25 11:37 ` Manos Anagnostakis