public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM] Update max_cond_insns settings
@ 2017-04-12 13:02 Wilco Dijkstra
  2017-04-20 15:40 ` Wilco Dijkstra
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-04-12 13:02 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Kyrylo Tkachov

The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <wdijkstr@arm.com>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,						/* Constant limit.  */
-  5,						/* Max cond insns.  */
+  2,						/* Max cond insns.  */
   8,						/* Memset max inline.  */
   1,						/* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,						/* Constant limit.  */
-  5,						/* Max cond insns.  */
+  2,						/* Max cond insns.  */
   8,						/* Memset max inline.  */
   2,						/* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
@ 2017-04-20 15:40 ` Wilco Dijkstra
  2017-05-04 10:24 ` Richard Earnshaw (lists)
  2018-11-09 14:13 ` Wilco Dijkstra
  2 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-04-20 15:40 UTC (permalink / raw)
  To: GCC Patches, Kyrylo Tkachov; +Cc: nd


ping


From: Wilco Dijkstra
Sent: 12 April 2017 14:02
To: GCC Patches
Cc: nd; Kyrylo Tkachov
Subject: [PATCH][ARM] Update max_cond_insns settings
    
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <wdijkstr@arm.com>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   1,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   2,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,    

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
  2017-04-20 15:40 ` Wilco Dijkstra
@ 2017-05-04 10:24 ` Richard Earnshaw (lists)
  2017-05-04 17:39   ` Wilco Dijkstra
  2018-11-09 14:13 ` Wilco Dijkstra
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-04 10:24 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov

On 12/04/17 14:02, Wilco Dijkstra wrote:
> The existing setting of max_cond_insns for most cores is non-optimal.
> Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
> Also such long sequences of conditional instructions can increase the number
> of executed instructions significantly, so using 5 for max_cond_insns is
> non-optimal.
> 
> Previous benchmarking showed that setting max_cond_insn to 2 was the best value
> for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
> and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
> advantage of producing more uniform code.
> 
> Bootstrap and regress OK on arm-none-linux-gnueabihf.
> 
> OK for stage 1?
> 
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  <wdijkstr@arm.com>
> 
>         * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
>         (arm_cortex_a35_tune): Likewise.
> ---
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1,						/* Constant limit.  */
> -  5,						/* Max cond insns.  */
> +  2,						/* Max cond insns.  */
>    8,						/* Memset max inline.  */
>    1,						/* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1,						/* Constant limit.  */
> -  5,						/* Max cond insns.  */
> +  2,						/* Max cond insns.  */
>    8,						/* Memset max inline.  */
>    2,						/* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> 


This parameter is also used for A32 code.  Is that really the right
number there as well?

I do wonder if the code in arm_option_params_internal should be tweaked
to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

    /* When -mrestrict-it is in use tone down the if-conversion.  */
    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
      ? 1
      : (TARGET_THUMB2 ? MIN (current_tune->max_insns_skipped, 4)
         | current_tune->max_insns_skipped);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-04 10:24 ` Richard Earnshaw (lists)
@ 2017-05-04 17:39   ` Wilco Dijkstra
  2017-05-05 12:42     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-04 17:39 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov

Richard Earnshaw wrote:

> -  5,                                         /* Max cond insns.  */
> +  2,                                         /* Max cond insns.  */

> This parameter is also used for A32 code.  Is that really the right
> number there as well?

Yes, this parameter has always been the same for ARM and Thumb-2.

> I do wonder if the code in arm_option_params_internal should be tweaked
> to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)

Wilco

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-04 17:39   ` Wilco Dijkstra
@ 2017-05-05 12:42     ` Richard Earnshaw (lists)
  2017-05-05 12:49       ` Wilco Dijkstra
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 12:42 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov

On 04/05/17 18:38, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
> 
>> -  5,                                         /* Max cond insns.  */
>> +  2,                                         /* Max cond insns.  */
> 
>> This parameter is also used for A32 code.  Is that really the right
>> number there as well?
> 
> Yes, this parameter has always been the same for ARM and Thumb-2.

I know that.  I'm questioning whether that number (2) is right when on
ARM.  It seems very low to me, especially when branches are unpredictable.

> 
>> I do wonder if the code in arm_option_params_internal should be tweaked
>> to hard-limit the number of skipped insns for Thumb2 to one IT block.  So
> 
> You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)
> 

Haven't got as far as that one yet.

R.

> Wilco
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-05 12:42     ` Richard Earnshaw (lists)
@ 2017-05-05 12:49       ` Wilco Dijkstra
  2017-05-05 12:56         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-05 12:49 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov

Richard Earnshaw (lists) wrote:
> On 04/05/17 18:38, Wilco Dijkstra wrote:
> > Richard Earnshaw wrote:
> > 
>>> -  5,                                         /* Max cond insns.  */
>>> +  2,                                         /* Max cond insns.  */
>> 
>>> This parameter is also used for A32 code.  Is that really the right
>>> number there as well?
>> 
>> Yes, this parameter has always been the same for ARM and Thumb-2.
>
> I know that.  I'm questioning whether that number (2) is right when on
> ARM.  It seems very low to me, especially when branches are unpredictable.

Why does it seem low? Benchmarking showed 2 was the best value for modern
cores. The same branch predictor is used, so the same settings should be used
for ARM and Thumb-2.

Wilco

    

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-05 12:49       ` Wilco Dijkstra
@ 2017-05-05 12:56         ` Richard Earnshaw (lists)
  2017-05-05 16:32           ` Wilco Dijkstra
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 12:56 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrylo Tkachov

On 05/05/17 13:42, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>> > Richard Earnshaw wrote:
>> > 
>>>> -  5,                                         /* Max cond insns.  */
>>>> +  2,                                         /* Max cond insns.  */
>>> 
>>>> This parameter is also used for A32 code.  Is that really the right
>>>> number there as well?
>>> 
>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>
>> I know that.  I'm questioning whether that number (2) is right when on
>> ARM.  It seems very low to me, especially when branches are unpredictable.
> 
> Why does it seem low? Benchmarking showed 2 was the best value for modern
> cores. The same branch predictor is used, so the same settings should be
> used
> for ARM and Thumb-2.
> 
> Wilco
> 
>    

Thumb2 code has to execute an additional instruction to start an IT
sequence.  It might therefore seem reasonable for the ARM sequence to be
one instruction longer.

R.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-05 12:56         ` Richard Earnshaw (lists)
@ 2017-05-05 16:32           ` Wilco Dijkstra
  2017-06-13 13:56             ` Wilco Dijkstra
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-05-05 16:32 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov

Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> > 
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>> 
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>> 
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>> 
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-05-05 16:32           ` Wilco Dijkstra
@ 2017-06-13 13:56             ` Wilco Dijkstra
  2017-06-27 15:41               ` Wilco Dijkstra
  0 siblings, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-06-13 13:56 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov

ping
    
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> > 
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>> 
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>> 
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>> 
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco    

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM] Update max_cond_insns settings
  2017-06-13 13:56             ` Wilco Dijkstra
@ 2017-06-27 15:41               ` Wilco Dijkstra
  0 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-06-27 15:41 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches; +Cc: nd, Kyrylo Tkachov


    
ping
    
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> > 
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>> 
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>> 
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>> 
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco        

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH][ARM] Update max_cond_insns settings
  2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
  2017-04-20 15:40 ` Wilco Dijkstra
  2017-05-04 10:24 ` Richard Earnshaw (lists)
@ 2018-11-09 14:13 ` Wilco Dijkstra
  2 siblings, 0 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2018-11-09 14:13 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Kyrylo Tkachov

The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <wdijkstr@arm.com>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   1,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   2,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,    

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-11-09 14:13 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-12 13:02 [PATCH][ARM] Update max_cond_insns settings Wilco Dijkstra
2017-04-20 15:40 ` Wilco Dijkstra
2017-05-04 10:24 ` Richard Earnshaw (lists)
2017-05-04 17:39   ` Wilco Dijkstra
2017-05-05 12:42     ` Richard Earnshaw (lists)
2017-05-05 12:49       ` Wilco Dijkstra
2017-05-05 12:56         ` Richard Earnshaw (lists)
2017-05-05 16:32           ` Wilco Dijkstra
2017-06-13 13:56             ` Wilco Dijkstra
2017-06-27 15:41               ` Wilco Dijkstra
2018-11-09 14:13 ` Wilco Dijkstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).