On 26/05/16 11:17, James Greenhalgh wrote: > On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote: >> Hi all, >> >> The recent -frename-registers change exposed a deficiency in the way we fuse >> AESE/AESMC instruction pairs in aarch64. >> >> Basically we want to enforce: >> AESE Vn, _ >> AESMC Vn, Vn >> >> to enable the fusion, but regrename comes along and renames the output Vn >> register in AESMC to something else, killing the fusion in the hardware. >> >> The solution in this patch is to add an alternative that ties the input and >> output registers in the AESMC pattern and enable that alternative when the >> fusion is enabled. >> >> With this patch I've confirmed that the above preferred register sequence is >> kept even with -frename-registers when tuning for a cpu that enables the >> fusion and that the chain is broken by regrename otherwise and have seen the >> appropriate improvement in a proprietary benchmark (that I cannot name) that >> exercises this sequence. >> >> Bootstrapped and tested on aarch64-none-linux-gnu. >> >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> 2016-05-20 Kyrylo Tkachov >> >> * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function. >> * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare >> prototype. >> * config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi): >> Add "=w,0" alternative. Enable it when AES/AESMC fusion is enabled. >> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h >> index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644 >> --- a/gcc/config/aarch64/aarch64-protos.h >> +++ b/gcc/config/aarch64/aarch64-protos.h >> @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx); >> bool aarch64_expand_movmem (rtx *); >> bool aarch64_float_const_zero_rtx_p (rtx); >> bool aarch64_function_arg_regno_p (unsigned); >> +bool aarch64_fusion_enabled_p (unsigned int); > This argument type should be "enum aarch64_fusion_pairs". > >> bool aarch64_gen_movmemqi (rtx *); >> bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *); >> bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx); >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) >> return false; >> } >> >> +/* Return true iff the instruction fusion described by OP is enabled. */ >> + >> +bool >> +aarch64_fusion_enabled_p (unsigned int op) >> +{ >> + return (aarch64_tune_params.fusible_ops & op) != 0; >> +} >> + > A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your > new function would be nice. > > OK with the change to argument type. Thanks, here is what I'm committing. The patch to use the new function in aarch_macro_fusion_pair_p is in testing. Kyrill 2016-05-27 Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function. * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare prototype. * config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi): Add "=w,0" alternative. Enable it when AES/AESMC fusion is enabled. > Thanks, > James >