public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled
@ 2016-05-20 10:04 Kyrill Tkachov
  2016-05-26 14:06 ` James Greenhalgh
  0 siblings, 1 reply; 3+ messages in thread
From: Kyrill Tkachov @ 2016-05-20 10:04 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

Hi all,

The recent -frename-registers change exposed a deficiency in the way we fuse AESE/AESMC instruction
pairs in aarch64.

Basically we want to enforce:
     AESE Vn, _
     AESMC Vn, Vn

to enable the fusion, but regrename comes along and renames the output Vn register in AESMC to something
else, killing the fusion in the hardware.

The solution in this patch is to add an alternative that ties the input and output registers in the AESMC pattern
and enable that alternative when the fusion is enabled.

With this patch I've confirmed that the above preferred register sequence is kept even with -frename-registers
when tuning for a cpu that enables the fusion and that the chain is broken by regrename otherwise and have
seen the appropriate improvement in a proprietary benchmark (that I cannot name) that exercises this sequence.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-05-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
     * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
     prototype.
     * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi):
     Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.

[-- Attachment #2: aarch64-fuse.patch --]
[-- Type: text/x-patch, Size: 2625 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
+bool aarch64_fusion_enabled_p (unsigned int);
 bool aarch64_gen_movmemqi (rtx *);
 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
 bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce64414e8bc01732d14311d742cf28f4586..a66948a28e99f4437824a8640b092f7be1c917f6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5424,13 +5424,25 @@ (define_insn "aarch64_crypto_aes<aes_op>v16qi"
   [(set_attr "type" "crypto_aese")]
 )
 
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;    AESE Vn, _
+;;    AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "aarch64_crypto_aes<aesmc_op>v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "w")]
+  [(set (match_operand:V16QI 0 "register_operand" "=w,w")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0,w")]
 	 CRYPTO_AESMC))]
   "TARGET_SIMD && TARGET_CRYPTO"
   "aes<aesmc_op>\\t%0.16b, %1.16b"
-  [(set_attr "type" "crypto_aesmc")]
+  [(set_attr "type" "crypto_aesmc")
+   (set_attr_alternative "enabled"
+     [(if_then_else (match_test
+		       "aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)")
+		     (const_string "yes" )
+		     (const_string "no"))
+      (const_string "yes")])]
 )
 
 ;; sha1
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
   return false;
 }
 
+/* Return true iff the instruction fusion described by OP is enabled.  */
+
+bool
+aarch64_fusion_enabled_p (unsigned int op)
+{
+  return (aarch64_tune_params.fusible_ops & op) != 0;
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
    of address and set to BASE and OFFSET, otherwise return false
    after clearing BASE and OFFSET.  */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled
  2016-05-20 10:04 [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled Kyrill Tkachov
@ 2016-05-26 14:06 ` James Greenhalgh
  2016-05-27 14:43   ` Kyrill Tkachov
  0 siblings, 1 reply; 3+ messages in thread
From: James Greenhalgh @ 2016-05-26 14:06 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, nd

On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The recent -frename-registers change exposed a deficiency in the way we fuse
> AESE/AESMC instruction pairs in aarch64.
> 
> Basically we want to enforce:
>     AESE Vn, _
>     AESMC Vn, Vn
> 
> to enable the fusion, but regrename comes along and renames the output Vn
> register in AESMC to something else, killing the fusion in the hardware.
> 
> The solution in this patch is to add an alternative that ties the input and
> output registers in the AESMC pattern and enable that alternative when the
> fusion is enabled.
> 
> With this patch I've confirmed that the above preferred register sequence is
> kept even with -frename-registers when tuning for a cpu that enables the
> fusion and that the chain is broken by regrename otherwise and have seen the
> appropriate improvement in a proprietary benchmark (that I cannot name) that
> exercises this sequence.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-05-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
>     * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
>     prototype.
>     * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi):
>     Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
>  bool aarch64_expand_movmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_function_arg_regno_p (unsigned);
> +bool aarch64_fusion_enabled_p (unsigned int);

This argument type should be "enum aarch64_fusion_pairs".

>  bool aarch64_gen_movmemqi (rtx *);
>  bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
>  bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
>    return false;
>  }
>  
> +/* Return true iff the instruction fusion described by OP is enabled.  */
> +
> +bool
> +aarch64_fusion_enabled_p (unsigned int op)
> +{
> +  return (aarch64_tune_params.fusible_ops & op) != 0;
> +}
> +

A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your
new function would be nice.

OK with the change to argument type.

Thanks,
James

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled
  2016-05-26 14:06 ` James Greenhalgh
@ 2016-05-27 14:43   ` Kyrill Tkachov
  0 siblings, 0 replies; 3+ messages in thread
From: Kyrill Tkachov @ 2016-05-27 14:43 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 3515 bytes --]


On 26/05/16 11:17, James Greenhalgh wrote:
> On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> The recent -frename-registers change exposed a deficiency in the way we fuse
>> AESE/AESMC instruction pairs in aarch64.
>>
>> Basically we want to enforce:
>>      AESE Vn, _
>>      AESMC Vn, Vn
>>
>> to enable the fusion, but regrename comes along and renames the output Vn
>> register in AESMC to something else, killing the fusion in the hardware.
>>
>> The solution in this patch is to add an alternative that ties the input and
>> output registers in the AESMC pattern and enable that alternative when the
>> fusion is enabled.
>>
>> With this patch I've confirmed that the above preferred register sequence is
>> kept even with -frename-registers when tuning for a cpu that enables the
>> fusion and that the chain is broken by regrename otherwise and have seen the
>> appropriate improvement in a proprietary benchmark (that I cannot name) that
>> exercises this sequence.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-05-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
>>      * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
>>      prototype.
>>      * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi):
>>      Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
>>   bool aarch64_expand_movmem (rtx *);
>>   bool aarch64_float_const_zero_rtx_p (rtx);
>>   bool aarch64_function_arg_regno_p (unsigned);
>> +bool aarch64_fusion_enabled_p (unsigned int);
> This argument type should be "enum aarch64_fusion_pairs".
>
>>   bool aarch64_gen_movmemqi (rtx *);
>>   bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
>>   bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
>>     return false;
>>   }
>>   
>> +/* Return true iff the instruction fusion described by OP is enabled.  */
>> +
>> +bool
>> +aarch64_fusion_enabled_p (unsigned int op)
>> +{
>> +  return (aarch64_tune_params.fusible_ops & op) != 0;
>> +}
>> +
> A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your
> new function would be nice.
>
> OK with the change to argument type.

Thanks, here is what I'm committing.
The patch to use the new function in aarch_macro_fusion_pair_p is in testing.

Kyrill


2016-05-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
     * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
     prototype.
     * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi):
     Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.

> Thanks,
> James
>


[-- Attachment #2: aarch64-tie.patch --]
[-- Type: text/x-patch, Size: 2651 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 98c81baad827f3c0b0c7054dd97d3d0989af3796..6e97c01cf768ec5ca2f18e795a8085b8a247f5b6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
+bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
 bool aarch64_gen_movmemqi (rtx *);
 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
 bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 59a578f5937a240b325af22021bbd662230ed404..8be69357eb086e288ae838dc536be8a2ebe0463b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5401,13 +5401,25 @@ (define_insn "aarch64_crypto_aes<aes_op>v16qi"
   [(set_attr "type" "crypto_aese")]
 )
 
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;    AESE Vn, _
+;;    AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "aarch64_crypto_aes<aesmc_op>v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "w")]
+  [(set (match_operand:V16QI 0 "register_operand" "=w,w")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0,w")]
 	 CRYPTO_AESMC))]
   "TARGET_SIMD && TARGET_CRYPTO"
   "aes<aesmc_op>\\t%0.16b, %1.16b"
-  [(set_attr "type" "crypto_aesmc")]
+  [(set_attr "type" "crypto_aesmc")
+   (set_attr_alternative "enabled"
+     [(if_then_else (match_test
+		       "aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)")
+		     (const_string "yes" )
+		     (const_string "no"))
+      (const_string "yes")])]
 )
 
 ;; sha1
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1784b9215b3ada0dd944838b7ca9f9dbfaf750ec..a19855e605d77cfa2221f99759486432963b6f98 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13235,6 +13235,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
   return false;
 }
 
+/* Return true iff the instruction fusion described by OP is enabled.  */
+
+bool
+aarch64_fusion_enabled_p (enum aarch64_fusion_pairs op)
+{
+  return (aarch64_tune_params.fusible_ops & op) != 0;
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
    of address and set to BASE and OFFSET, otherwise return false
    after clearing BASE and OFFSET.  */

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-05-27 13:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20 10:04 [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled Kyrill Tkachov
2016-05-26 14:06 ` James Greenhalgh
2016-05-27 14:43   ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).