public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] AArch64: Tune case-values-threshold
@ 2021-10-18 16:19 Wilco Dijkstra
  2021-10-18 16:31 ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2021-10-18 16:19 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Sandiford, Kyrylo Tkachov


Tune the case-values-threshold setting for modern cores.  A value of 11 improves
SPECINT2017 by 0.2% and reduces codesize by 0.04%.  With -Os use value 8 which
reduces codesize by 0.07%.

Passes regress, OK for commit?

ChangeLog:

2021-10-18  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/aarch64/aarch64.c (aarch64_case_values_threshold):
        Change to 8 with -Os, 11 otherwise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
    The expansion for a table switch is quite expensive due to the number
    of instructions, the table lookup and hard to predict indirect jump.
    When optimizing for speed, and -O3 enabled, use the per-core tuning if 
-   set, otherwise use tables for > 16 cases as a tradeoff between size and
-   performance.  When optimizing for size, use the default setting.  */
+   set, otherwise use tables for >= 11 cases as a tradeoff between size and
+   performance.  When optimizing for size, use 8 for smallest codesize.  */
 
 static unsigned int
 aarch64_case_values_threshold (void)
@@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void)
       && selected_cpu->tune->max_case_values != 0)
     return selected_cpu->tune->max_case_values;
   else
-    return optimize_size ? default_case_values_threshold () : 17;
+    return optimize_size ? 8 : 11;
 }
 
 /* Return true if register REGNO is a valid index register.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-18 16:19 [PATCH] AArch64: Tune case-values-threshold Wilco Dijkstra
@ 2021-10-18 16:31 ` Richard Sandiford
  2021-10-19 12:57   ` Wilco Dijkstra
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2021-10-18 16:31 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, Kyrylo Tkachov

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Tune the case-values-threshold setting for modern cores.  A value of 11 improves
> SPECINT2017 by 0.2% and reduces codesize by 0.04%.  With -Os use value 8 which
> reduces codesize by 0.07%.
>
> Passes regress, OK for commit?
>
> ChangeLog:
>
> 2021-10-18  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_case_values_threshold):
>         Change to 8 with -Os, 11 otherwise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>     The expansion for a table switch is quite expensive due to the number
>     of instructions, the table lookup and hard to predict indirect jump.
>     When optimizing for speed, and -O3 enabled, use the per-core tuning if
> -   set, otherwise use tables for > 16 cases as a tradeoff between size and
> -   performance.  When optimizing for size, use the default setting.  */
> +   set, otherwise use tables for >= 11 cases as a tradeoff between size and
> +   performance.  When optimizing for size, use 8 for smallest codesize.  */

I'm just concerned that here we're using the same explanation but with
different numbers.  Why are the new numbers more right than the old ones
(especially when it comes to code size, where the trade-off hasn't
really changed)?

It would be good to have more discussion of why certain numbers are
too small or too high, and why 8 is the right pivot point for -Os.

Thanks,
Richard

>
>  static unsigned int
>  aarch64_case_values_threshold (void)
> @@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void)
>        && selected_cpu->tune->max_case_values != 0)
>      return selected_cpu->tune->max_case_values;
>    else
> -    return optimize_size ? default_case_values_threshold () : 17;
> +    return optimize_size ? 8 : 11;
>  }
>
>  /* Return true if register REGNO is a valid index register.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-18 16:31 ` Richard Sandiford
@ 2021-10-19 12:57   ` Wilco Dijkstra
  2021-10-19 13:23     ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2021-10-19 12:57 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches, Kyrylo Tkachov

Hi Richard,

> I'm just concerned that here we're using the same explanation but with
> different numbers.  Why are the new numbers more right than the old ones
> (especially when it comes to code size, where the trade-off hasn't
> really changed)?

Like all tuning/costing parameters, these values are never fixed but change
over time due to optimizations, micro architectures and workloads.
The previous values were out of date so that's why I retuned them by
benchmarking different values and choosing the best combinations.

> It would be good to have more discussion of why certain numbers are
> too small or too high, and why 8 is the right pivot point for -Os.

You mean add more discussion in the comment? That comment is already overly
large and too specific - it would be better to reduce it. The -Os value was never
tuned, and 8 turns out to be faster and smaller than GCC's default.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-19 12:57   ` Wilco Dijkstra
@ 2021-10-19 13:23     ` Richard Sandiford
  2021-10-19 14:37       ` Bernhard Reutner-Fischer
  2021-10-19 14:55       ` Wilco Dijkstra
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Sandiford @ 2021-10-19 13:23 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, Kyrylo Tkachov

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> I'm just concerned that here we're using the same explanation but with
>> different numbers.  Why are the new numbers more right than the old ones
>> (especially when it comes to code size, where the trade-off hasn't
>> really changed)?
>
> Like all tuning/costing parameters, these values are never fixed but change
> over time due to optimizations, micro architectures and workloads.
> The previous values were out of date so that's why I retuned them by
> benchmarking different values and choosing the best combinations.
>
>> It would be good to have more discussion of why certain numbers are
>> too small or too high, and why 8 is the right pivot point for -Os.
>
> You mean add more discussion in the comment? That comment is already overly
> large and too specific - it would be better to reduce it. The -Os value was never
> tuned, and 8 turns out to be faster and smaller than GCC's default.

The problem is that you're effectively asking for these values to be
taken on faith without providing any analysis and without describing
how you arrived at the new numbers.  Did you try other values too?
If so, how did they compare with the numbers that you finally chose?
At least that would give an indication of where the boundaries are.

For example, it's easier to believe that 8 is the right value for -Os if
you say that you tried 9 and 7 as well, and they were worse than 8 by X%
and Y%.  This would also help anyone who wants to tweak the numbers
again in future.

BTW, which CPU did you use to do the experiments?  Are the tuning
parameters for that CPU already consistent with the new generic values?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-19 13:23     ` Richard Sandiford
@ 2021-10-19 14:37       ` Bernhard Reutner-Fischer
  2021-10-19 14:55       ` Wilco Dijkstra
  1 sibling, 0 replies; 7+ messages in thread
From: Bernhard Reutner-Fischer @ 2021-10-19 14:37 UTC (permalink / raw)
  To: Richard Sandiford, Richard Sandiford via Gcc-patches, Wilco Dijkstra
  Cc: GCC Patches

On 19 October 2021 15:23:58 CEST, Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>> Hi Richard,
>>
>>> I'm just concerned that here we're using the same explanation but with
>>> different numbers.  Why are the new numbers more right than the old ones
>>> (especially when it comes to code size, where the trade-off hasn't
>>> really changed)?
>>
>> Like all tuning/costing parameters, these values are never fixed but change
>> over time due to optimizations, micro architectures and workloads.
>> The previous values were out of date so that's why I retuned them by
>> benchmarking different values and choosing the best combinations.
>>
>>> It would be good to have more discussion of why certain numbers are
>>> too small or too high, and why 8 is the right pivot point for -Os.
>>
>> You mean add more discussion in the comment? That comment is already overly
>> large and too specific - it would be better to reduce it. The -Os value was never
>> tuned, and 8 turns out to be faster and smaller than GCC's default.
>
>The problem is that you're effectively asking for these values to be
>taken on faith without providing any analysis and without describing
>how you arrived at the new numbers.  Did you try other values too?
>If so, how did they compare with the numbers that you finally chose?
>At least that would give an indication of where the boundaries are.

Maybe you can show csibe benchmark numbers to show the effects:
http://szeged.github.io/csibe/

thanks,
>
>For example, it's easier to believe that 8 is the right value for -Os if
>you say that you tried 9 and 7 as well, and they were worse than 8 by X%
>and Y%.  This would also help anyone who wants to tweak the numbers
>again in future.
>
>BTW, which CPU did you use to do the experiments?  Are the tuning
>parameters for that CPU already consistent with the new generic values?
>
>Thanks,
>Richard


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-19 13:23     ` Richard Sandiford
  2021-10-19 14:37       ` Bernhard Reutner-Fischer
@ 2021-10-19 14:55       ` Wilco Dijkstra
  2021-10-20 11:30         ` Richard Sandiford
  1 sibling, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2021-10-19 14:55 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches, Kyrylo Tkachov

Hi Richard,

> The problem is that you're effectively asking for these values to be
> taken on faith without providing any analysis and without describing
> how you arrived at the new numbers.  Did you try other values too?
> If so, how did they compare with the numbers that you finally chose?
> At least that would give an indication of where the boundaries are.

Yes, I obviously tried other values, pretty much all in range 1-20. There is
generally a range of 4-5 values that are very similar in size, and then you
choose one in the middle which also looks good for performance.

> For example, it's easier to believe that 8 is the right value for -Os if
> you say that you tried 9 and 7 as well, and they were worse than 8 by X%
> and Y%.  This would also help anyone who wants to tweak the numbers
> again in future.

For -Os, the size range for values 6-10 is within 0.01% so they are virtually
identical and I picked the median. Whether this will remain best in the future
is unclear since it depends on so many things, so at some point it needs
to be looked at again, just like most other tunings.

> BTW, which CPU did you use to do the experiments?  Are the tuning
> parameters for that CPU already consistent with the new generic values?

This was done on Neoverse N1. Almost no CPUs use per-CPU tuning for this.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] AArch64: Tune case-values-threshold
  2021-10-19 14:55       ` Wilco Dijkstra
@ 2021-10-20 11:30         ` Richard Sandiford
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Sandiford @ 2021-10-20 11:30 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, Kyrylo Tkachov

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> The problem is that you're effectively asking for these values to be
>> taken on faith without providing any analysis and without describing
>> how you arrived at the new numbers.  Did you try other values too?
>> If so, how did they compare with the numbers that you finally chose?
>> At least that would give an indication of where the boundaries are.
>
> Yes, I obviously tried other values, pretty much all in range 1-20. There is
> generally a range of 4-5 values that are very similar in size, and then you
> choose one in the middle which also looks good for performance.
>
>> For example, it's easier to believe that 8 is the right value for -Os if
>> you say that you tried 9 and 7 as well, and they were worse than 8 by X%
>> and Y%.  This would also help anyone who wants to tweak the numbers
>> again in future.
>
> For -Os, the size range for values 6-10 is within 0.01% so they are virtually
> identical and I picked the median. Whether this will remain best in the future
> is unclear since it depends on so many things, so at some point it needs
> to be looked at again, just like most other tunings.

Thanks.  These details are useful.  For example, if someone finds
a compelling reason to bump the new values by +/-2 (to help with a
particular test case) then it sounds we should accept that, since it
wouldn't conflict with your work.

So the patch is OK, thanks.

(FWIW, I tried building a linux kernel I had lying around at -Os,
which also showed an improvement of ~0.07%.)

Richard

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-20 11:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-18 16:19 [PATCH] AArch64: Tune case-values-threshold Wilco Dijkstra
2021-10-18 16:31 ` Richard Sandiford
2021-10-19 12:57   ` Wilco Dijkstra
2021-10-19 13:23     ` Richard Sandiford
2021-10-19 14:37       ` Bernhard Reutner-Fischer
2021-10-19 14:55       ` Wilco Dijkstra
2021-10-20 11:30         ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).