* [PATCH][ARM] Update max_cond_insns settings
@ 2017-04-12 13:02 Wilco Dijkstra
2017-04-20 15:40 ` Wilco Dijkstra
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-04-12 13:02 UTC (permalink / raw)
To: GCC Patches; +Cc: nd, Kyrylo Tkachov
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.
Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57. All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53. Given that using 5 is worse, set it to 2. This also has the
advantage of producing more uniform code.
Bootstrap and regress OK on arm-none-linux-gnueabihf.
OK for stage 1?
ChangeLog:
2017-04-12 Wilco Dijkstra <wdijkstr@arm.com>
* gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
(arm_cortex_a35_tune): Likewise.
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
@ 2017-04-20 15:40 ` Wilco Dijkstra
2017-05-04 10:24 ` Richard Earnshaw (lists)
2018-11-09 14:13 ` Wilco Dijkstra
2 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-04-20 15:40 UTC (permalink / raw)
To: GCC Patches, Kyrylo Tkachov; +Cc: nd
ping
From: Wilco Dijkstra
Sent: 12 April 2017 14:02
To: GCC Patches
Cc: nd; Kyrylo Tkachov
Subject: [PATCH][ARM] Update max_cond_insns settings
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.
Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57. All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53. Given that using 5 is worse, set it to 2. This also has the
advantage of producing more uniform code.
Bootstrap and regress OK on arm-none-linux-gnueabihf.
OK for stage 1?
ChangeLog:
2017-04-12 Wilco Dijkstra <wdijkstr@arm.com>
* gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
(arm_cortex_a35_tune): Likewise.
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
2017-04-20 15:40 ` Wilco Dijkstra
@ 2017-05-04 10:24 ` Richard Earnshaw (lists)
2017-05-04 17:39 ` Wilco Dijkstra
2018-11-09 14:13 ` Wilco Dijkstra
2 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-04 10:24 UTC (permalink / raw)
To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov
On 12/04/17 14:02, Wilco Dijkstra wrote:
> The existing setting of max_cond_insns for most cores is non-optimal.
> Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
> Also such long sequences of conditional instructions can increase the number
> of executed instructions significantly, so using 5 for max_cond_insns is
> non-optimal.
>
> Previous benchmarking showed that setting max_cond_insn to 2 was the best value
> for Cortex-A15 and Cortex-A57. All ARMv8-A cores use 2 - apart from Cortex-A35
> and Cortex-A53. Given that using 5 is worse, set it to 2. This also has the
> advantage of producing more uniform code.
>
> Bootstrap and regress OK on arm-none-linux-gnueabihf.
>
> OK for stage 1?
>
> ChangeLog:
> 2017-04-12 Wilco Dijkstra <wdijkstr@arm.com>
>
> * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
> (arm_cortex_a35_tune): Likewise.
> ---
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
> arm_default_branch_cost,
> &arm_default_vec_cost,
> 1, /* Constant limit. */
> - 5, /* Max cond insns. */
> + 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 1, /* Issue rate. */
> ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
> arm_default_branch_cost,
> &arm_default_vec_cost,
> 1, /* Constant limit. */
> - 5, /* Max cond insns. */
> + 2, /* Max cond insns. */
> 8, /* Memset max inline. */
> 2, /* Issue rate. */
> ARM_PREFETCH_NOT_BENEFICIAL,
>
This parameter is also used for A32 code. Is that really the right
number there as well?
I do wonder if the code in arm_option_params_internal should be tweaked
to hard-limit the number of skipped insns for Thumb2 to one IT block. So
/* When -mrestrict-it is in use tone down the if-conversion. */
max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
? 1
: (TARGET_THUMB2 ? MIN (current_tune->max_insns_skipped, 4)
| current_tune->max_insns_skipped);
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-04 10:24 ` Richard Earnshaw (lists)
@ 2017-05-04 17:39 ` Wilco Dijkstra
2017-05-05 12:42 ` Richard Earnshaw (lists)
0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-04 17:39 UTC (permalink / raw)
To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov
Richard Earnshaw wrote:
> - 5, /* Max cond insns. */
> + 2, /* Max cond insns. */
> This parameter is also used for A32 code. Is that really the right
> number there as well?
Yes, this parameter has always been the same for ARM and Thumb-2.
> I do wonder if the code in arm_option_params_internal should be tweaked
> to hard-limit the number of skipped insns for Thumb2 to one IT block. So
You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)
Wilco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-04 17:39 ` Wilco Dijkstra
@ 2017-05-05 12:42 ` Richard Earnshaw (lists)
2017-05-05 12:49 ` Wilco Dijkstra
0 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 12:42 UTC (permalink / raw)
To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov
On 04/05/17 18:38, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>
>> - 5, /* Max cond insns. */
>> + 2, /* Max cond insns. */
>
>> This parameter is also used for A32 code. Is that really the right
>> number there as well?
>
> Yes, this parameter has always been the same for ARM and Thumb-2.
I know that. I'm questioning whether that number (2) is right when on
ARM. It seems very low to me, especially when branches are unpredictable.
>
>> I do wonder if the code in arm_option_params_internal should be tweaked
>> to hard-limit the number of skipped insns for Thumb2 to one IT block. So
>
> You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)
>
Haven't got as far as that one yet.
R.
> Wilco
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-05 12:42 ` Richard Earnshaw (lists)
@ 2017-05-05 12:49 ` Wilco Dijkstra
2017-05-05 12:56 ` Richard Earnshaw (lists)
0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-05 12:49 UTC (permalink / raw)
To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov
Richard Earnshaw (lists) wrote:
> On 04/05/17 18:38, Wilco Dijkstra wrote:
> > Richard Earnshaw wrote:
> >
>>> - 5, /* Max cond insns. */
>>> + 2, /* Max cond insns. */
>>
>>> This parameter is also used for A32 code. Is that really the right
>>> number there as well?
>>
>> Yes, this parameter has always been the same for ARM and Thumb-2.
>
> I know that. I'm questioning whether that number (2) is right when on
> ARM. It seems very low to me, especially when branches are unpredictable.
Why does it seem low? Benchmarking showed 2 was the best value for modern
cores. The same branch predictor is used, so the same settings should be used
for ARM and Thumb-2.
Wilco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-05 12:49 ` Wilco Dijkstra
@ 2017-05-05 12:56 ` Richard Earnshaw (lists)
2017-05-05 16:32 ` Wilco Dijkstra
0 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 12:56 UTC (permalink / raw)
To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov
On 05/05/17 13:42, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>> > Richard Earnshaw wrote:
>> >
>>>> - 5, /* Max cond insns. */
>>>> + 2, /* Max cond insns. */
>>>
>>>> This parameter is also used for A32 code. Is that really the right
>>>> number there as well?
>>>
>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>
>> I know that. I'm questioning whether that number (2) is right when on
>> ARM. It seems very low to me, especially when branches are unpredictable.
>
> Why does it seem low? Benchmarking showed 2 was the best value for modern
> cores. The same branch predictor is used, so the same settings should be
> used
> for ARM and Thumb-2.
>
> Wilco
>
>
Thumb2 code has to execute an additional instruction to start an IT
sequence. It might therefore seem reasonable for the ARM sequence to be
one instruction longer.
R.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-05 12:56 ` Richard Earnshaw (lists)
@ 2017-05-05 16:32 ` Wilco Dijkstra
2017-06-13 13:56 ` Wilco Dijkstra
0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-05 16:32 UTC (permalink / raw)
To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> - 5, /* Max cond insns. */
>>>>> + 2, /* Max cond insns. */
>>>>
>>>>> This parameter is also used for A32 code. Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that. I'm questioning whether that number (2) is right when on
>>> ARM. It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence. It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.
The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.
Wilco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-05-05 16:32 ` Wilco Dijkstra
@ 2017-06-13 13:56 ` Wilco Dijkstra
2017-06-27 15:41 ` Wilco Dijkstra
0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-06-13 13:56 UTC (permalink / raw)
To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov
ping
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> - 5, /* Max cond insns. */
>>>>> + 2, /* Max cond insns. */
>>>>
>>>>> This parameter is also used for A32 code. Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that. I'm questioning whether that number (2) is right when on
>>> ARM. It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence. It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.
The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.
Wilco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH][ARM] Update max_cond_insns settings
2017-06-13 13:56 ` Wilco Dijkstra
@ 2017-06-27 15:41 ` Wilco Dijkstra
0 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-06-27 15:41 UTC (permalink / raw)
To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov
ping
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> - 5, /* Max cond insns. */
>>>>> + 2, /* Max cond insns. */
>>>>
>>>>> This parameter is also used for A32 code. Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that. I'm questioning whether that number (2) is right when on
>>> ARM. It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence. It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.
The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.
Wilco
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH][ARM] Update max_cond_insns settings
2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
2017-04-20 15:40 ` Wilco Dijkstra
2017-05-04 10:24 ` Richard Earnshaw (lists)
@ 2018-11-09 14:13 ` Wilco Dijkstra
2 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2018-11-09 14:13 UTC (permalink / raw)
To: GCC Patches; +Cc: nd, Kyrylo Tkachov
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.
Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57. All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53. Given that using 5 is worse, set it to 2. This also has the
advantage of producing more uniform code.
Bootstrap and regress OK on arm-none-linux-gnueabihf.
OK for stage 1?
ChangeLog:
2017-04-12 Wilco Dijkstra <wdijkstr@arm.com>
* gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
(arm_cortex_a35_tune): Likewise.
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
1, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
arm_default_branch_cost,
&arm_default_vec_cost,
1, /* Constant limit. */
- 5, /* Max cond insns. */
+ 2, /* Max cond insns. */
8, /* Memset max inline. */
2, /* Issue rate. */
ARM_PREFETCH_NOT_BENEFICIAL,
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-11-09 14:13 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
2017-04-20 15:40 ` Wilco Dijkstra
2017-05-04 10:24 ` Richard Earnshaw (lists)
2017-05-04 17:39 ` Wilco Dijkstra
2017-05-05 12:42 ` Richard Earnshaw (lists)
2017-05-05 12:49 ` Wilco Dijkstra
2017-05-05 12:56 ` Richard Earnshaw (lists)
2017-05-05 16:32 ` Wilco Dijkstra
2017-06-13 13:56 ` Wilco Dijkstra
2017-06-27 15:41 ` Wilco Dijkstra
2018-11-09 14:13 ` Wilco Dijkstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).