[PATCH][ARM] Switch to default sched pressure algorithm

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][ARM] Switch to default sched pressure algorithm
@ 2019-07-29 17:20 Wilco Dijkstra
  2019-07-30  9:29 ` Christophe Lyon
  2019-08-19 15:53 ` Wilco Dijkstra
  0 siblings, 2 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2019-07-29 17:20 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov

Currently the Arm backend selects the alternative sched pressure algorithm.
The issue is that this doesn't take register pressure into account, and so
it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 shows
~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.

Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/arm/arm.c (arm_option_override): Don't override sched
	pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
   if (use_neon_for_64bits == 1)
      prefer_neon_for_64bits = true;
 
-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-			 global_options.x_param_values,
-			 global_options_set.x_param_values);
-
   /* Look through ready list and all of queue for instructions
      relevant for L2 auto-prefetcher.  */
   int param_sched_autopref_queue_depth;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra
@ 2019-07-30  9:29 ` Christophe Lyon
  2019-07-30  9:34   ` Ramana Radhakrishnan
  2019-08-19 15:53 ` Wilco Dijkstra
  1 sibling, 1 reply; 18+ messages in thread
From: Christophe Lyon @ 2019-07-30  9:29 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, nd, Richard Earnshaw, Kyrylo Tkachov

On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Currently the Arm backend selects the alternative sched pressure algorithm.
> The issue is that this doesn't take register pressure into account, and so
> it causes significant additional spilling on Arm where there are only 14
> allocatable registers.  SPEC2006 shows significant codesize reduction
> with the default pressure algorithm, so switch back to that.  PR77308 shows
> ~800 fewer instructions.
>
> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
> patches. Overall SPEC codesize is 1.1% smaller.
>

Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html

Thanks,

Christophe

> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
>
> ChangeLog:
> 2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/arm/arm.c (arm_option_override): Don't override sched
>         pressure algorithm.
>
> --
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3575,11 +3575,6 @@ arm_option_override (void)
>    if (use_neon_for_64bits == 1)
>       prefer_neon_for_64bits = true;
>
> -  /* Use the alternative scheduling-pressure algorithm by default.  */
> -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
> -                        global_options.x_param_values,
> -                        global_options_set.x_param_values);
> -
>    /* Look through ready list and all of queue for instructions
>       relevant for L2 auto-prefetcher.  */
>    int param_sched_autopref_queue_depth;
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-30  9:29 ` Christophe Lyon
@ 2019-07-30  9:34   ` Ramana Radhakrishnan
  2019-07-30 12:53     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2019-07-30  9:34 UTC (permalink / raw)
  To: Christophe Lyon, Wilco Dijkstra
  Cc: GCC Patches, nd, Richard Earnshaw, Kyrylo Tkachov

On 30/07/2019 10:08, Christophe Lyon wrote:
> On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>
>> Currently the Arm backend selects the alternative sched pressure algorithm.
>> The issue is that this doesn't take register pressure into account, and so
>> it causes significant additional spilling on Arm where there are only 14
>> allocatable registers.  SPEC2006 shows significant codesize reduction
>> with the default pressure algorithm, so switch back to that.  PR77308 shows
>> ~800 fewer instructions.
>>
>> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
>> patches. Overall SPEC codesize is 1.1% smaller.
>>
> 
> Hi Wilco,
> 
> Do you know which benchmarks were used when this was checked-in?
> It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html

It was from my time in Linaro and thus would have been a famous embedded 
benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
Cortex-A15. In addition to this I would like to see what the impact of 
this is on something like Cortex-A53 as the issue rates are likely to be 
different on the schedulers causing different behaviour.


I don't have all the notes today for that - maybe you can look into the 
linaro wiki.

I am concerned about taking this patch in without some more data across 
a variety of cores.

Thanks,
Ramana


> 
> Thanks,
> 
> Christophe
> 
>> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
>>
>> ChangeLog:
>> 2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>
>>
>>          * config/arm/arm.c (arm_option_override): Don't override sched
>>          pressure algorithm.
>>
>> --
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -3575,11 +3575,6 @@ arm_option_override (void)
>>     if (use_neon_for_64bits == 1)
>>        prefer_neon_for_64bits = true;
>>
>> -  /* Use the alternative scheduling-pressure algorithm by default.  */
>> -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
>> -                        global_options.x_param_values,
>> -                        global_options_set.x_param_values);
>> -
>>     /* Look through ready list and all of queue for instructions
>>        relevant for L2 auto-prefetcher.  */
>>     int param_sched_autopref_queue_depth;
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-30  9:34   ` Ramana Radhakrishnan
@ 2019-07-30 12:53     ` Richard Earnshaw (lists)
  2019-07-30 15:16       ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw (lists) @ 2019-07-30 12:53 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Christophe Lyon, Wilco Dijkstra
  Cc: GCC Patches, nd, Kyrylo Tkachov



On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
> On 30/07/2019 10:08, Christophe Lyon wrote:
>> On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> 
>> wrote:
>>>
>>> Currently the Arm backend selects the alternative sched pressure 
>>> algorithm.
>>> The issue is that this doesn't take register pressure into account, 
>>> and so
>>> it causes significant additional spilling on Arm where there are only 14
>>> allocatable registers.Â  SPEC2006 shows significant codesize reduction
>>> with the default pressure algorithm, so switch back to that.Â  PR77308 
>>> shows
>>> ~800 fewer instructions.
>>>
>>> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
>>> patches. Overall SPEC codesize is 1.1% smaller.
>>>
>>
>> Hi Wilco,
>>
>> Do you know which benchmarks were used when this was checked-in?
>> It isn't clear from 
>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
> 
> It was from my time in Linaro and thus would have been a famous embedded 
> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
> Cortex-A15. In addition to this I would like to see what the impact of 
> this is on something like Cortex-A53 as the issue rates are likely to be 
> different on the schedulers causing different behaviour.
> 
> 
> I don't have all the notes today for that - maybe you can look into the 
> linaro wiki.
> 
> I am concerned about taking this patch in without some more data across 
> a variety of cores.
> 

My concern is the original patch 
(https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
any real detail as to the reasons for the choice of the second algorithm 
over the first.

- It's not clear what the win was
- It's not clear what outliers there were and whether they were significant.

And finally, it's not clear if, 7 years later, this is still the best 
choice.

If the second algorithm really is better, why is no other target using 
it by default?

I think we need a bit more information (both ways).  In particular I'm 
concerned not just by the overall benchmark average, but also the amount 
of variance across the benchmarks.  I think the default needs to avoid 
significant outliers if at all possible, even if it is marginally less 
good on the average.

R.

> Thanks,
> Ramana
> 
> 
>>
>> Thanks,
>>
>> Christophe
>>
>>> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
>>>
>>> ChangeLog:
>>> 2019-07-29Â  Wilco DijkstraÂ  <wdijkstr@arm.com>
>>>
>>> Â Â Â Â Â Â Â Â  * config/arm/arm.c (arm_option_override): Don't override sched
>>> Â Â Â Â Â Â Â Â  pressure algorithm.
>>>
>>> -- 
>>>
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index 
>>> 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 
>>> 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -3575,11 +3575,6 @@ arm_option_override (void)
>>> Â Â Â  if (use_neon_for_64bits == 1)
>>> Â Â Â Â Â Â  prefer_neon_for_64bits = true;
>>>
>>> -Â  /* Use the alternative scheduling-pressure algorithm by default.Â  */
>>> -Â  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
>>> SCHED_PRESSURE_MODEL,
>>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  global_options.x_param_values,
>>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  global_options_set.x_param_values);
>>> -
>>> Â Â Â  /* Look through ready list and all of queue for instructions
>>> Â Â Â Â Â Â  relevant for L2 auto-prefetcher.Â  */
>>> Â Â Â  int param_sched_autopref_queue_depth;
>>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-30 12:53     ` Richard Earnshaw (lists)
@ 2019-07-30 15:16       ` Wilco Dijkstra
  2019-10-10 21:38         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-07-30 15:16 UTC (permalink / raw)
  To: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon
  Cc: GCC Patches, nd, Kyrylo Tkachov

Hi all,

 >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
 >> On 30/07/2019 10:08, Christophe Lyon wrote:

 >>> Hi Wilco,
 >>>
 >>> Do you know which benchmarks were used when this was checked-in?
 >>> It isn't clear from 
 >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
 >> 
 >> It was from my time in Linaro and thus would have been a famous embedded 
 >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and 
 >> Cortex-A15. In addition to this I would like to see what the impact of 
 >> this is on something like Cortex-A53 as the issue rates are likely to be 
 >> different on the schedulers causing different behaviour.

Obviously there are differences between various schedulers, but the general
issue is that register pressure is increased many times beyond the spilling limit
(a few cases I looked at had a pressure well over 120 when there are only 14
integer registers - this causes panic spilling in the register allocator).

In fact the spilling overhead between the 2 algorithms is almost identical on
Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
model used. It seems more related to the scheduler being too aggressive
and not caring about register pressure at all (for example lifting a load 100
instructions before its use so it must be spilled).

 >> I don't have all the notes today for that - maybe you can look into the 
 >> linaro wiki.
 >> 
 >> I am concerned about taking this patch in without some more data across 
 >> a variety of cores.
 >> 
 >
 > My concern is the original patch 
 > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in 
 > any real detail as to the reasons for the choice of the second algorithm 
 > over the first.
 > 
 > - It's not clear what the win was
 > - It's not clear what outliers there were and whether they were significant.
 >
 > And finally, it's not clear if, 7 years later, this is still the best 
 > choice.
 > 
 > If the second algorithm really is better, why is no other target using 
 > it by default?
 >
 > I think we need a bit more information (both ways).  In particular I'm 
 > concerned not just by the overall benchmark average, but also the amount 
 > of variance across the benchmarks.  I think the default needs to avoid 
 > significant outliers if at all possible, even if it is marginally less 
 > good on the average.

The results clearly show that algorithm 1 works best on Arm today - I haven't
seen a single benchmark where algorithm 2 results in less spilling. We could
tune algorithm 2 so it switches back to algorithm 1 when register pressure is
high or a basic block is large. However until it is fixed, the evidence is that
algorithm 1 is the best choice for current cores.

Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-30 15:16       ` Wilco Dijkstra
@ 2019-10-10 21:38         ` Ramana Radhakrishnan
  2019-10-11 17:33           ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2019-10-10 21:38 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon,
	GCC Patches, nd, Kyrylo Tkachov

On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi all,
>
>  >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>  >> On 30/07/2019 10:08, Christophe Lyon wrote:
>
>  >>> Hi Wilco,
>  >>>
>  >>> Do you know which benchmarks were used when this was checked-in?
>  >>> It isn't clear from
>  >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>  >>
>  >> It was from my time in Linaro and thus would have been a famous embedded
>  >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and
>  >> Cortex-A15. In addition to this I would like to see what the impact of
>  >> this is on something like Cortex-A53 as the issue rates are likely to be
>  >> different on the schedulers causing different behaviour.
>
> Obviously there are differences between various schedulers, but the general
> issue is that register pressure is increased many times beyond the spilling limit
> (a few cases I looked at had a pressure well over 120 when there are only 14
> integer registers - this causes panic spilling in the register allocator).
>
> In fact the spilling overhead between the 2 algorithms is almost identical on
> Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
> model used. It seems more related to the scheduler being too aggressive
> and not caring about register pressure at all (for example lifting a load 100
> instructions before its use so it must be spilled).

In those days it would have been the Cortex-A8, Cortex-A9 schedulers
and the Cortex-A15
schedulers and IIRC the benchmarking would have been mostly on a
Cortex-A9 board or on some
Cortex-A15 boards we had (long gone now) inside Arm.

Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
spread the range
across some v7-a CPUs as well ? While they aren't that popular today I
would suggest
you look at them because the defaults for v7-a are still to use the
Cortex-A8 scheduler and
the Cortex-A9 scheduler might well also get used in places given the
availability of hardware.


>
>  >> I don't have all the notes today for that - maybe you can look into the
>  >> linaro wiki.
>  >>
>  >> I am concerned about taking this patch in without some more data across
>  >> a variety of cores.
>  >>
>  >
>  > My concern is the original patch
>  > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
>  > any real detail as to the reasons for the choice of the second algorithm
>  > over the first.
>  >
>  > - It's not clear what the win was
>  > - It's not clear what outliers there were and whether they were significant.
>  >
>  > And finally, it's not clear if, 7 years later, this is still the best
>  > choice.
>  >
>  > If the second algorithm really is better, why is no other target using
>  > it by default?
>  >
>  > I think we need a bit more information (both ways).  In particular I'm
>  > concerned not just by the overall benchmark average, but also the amount
>  > of variance across the benchmarks.  I think the default needs to avoid
>  > significant outliers if at all possible, even if it is marginally less
>  > good on the average.
>
> The results clearly show that algorithm 1 works best on Arm today - I haven't
> seen a single benchmark where algorithm 2 results in less spilling. We could
> tune algorithm 2 so it switches back to algorithm 1 when register pressure is
> high or a basic block is large. However until it is fixed, the evidence is that
> algorithm 1 is the best choice for current cores.

I'd be happy to move this forward if you could show if there is no
*increase* in spills
for the same range of benchmarks that you are doing for the Cortex-A8
and Cortex-A9
schedulers.

Sorry about the time it has taken. I've been a bit otherwise occupied recently.

regards
Ramana

>
> Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-10 21:38         ` Ramana Radhakrishnan
@ 2019-10-11 17:33           ` Wilco Dijkstra
  2019-10-11 21:44             ` Wilco Dijkstra
  2019-10-12  0:58             ` Ramana Radhakrishnan
  0 siblings, 2 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2019-10-11 17:33 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon,
	GCC Patches, nd, Kyrylo Tkachov

Hi Ramana, 

> Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> spread the range across some v7-a CPUs as well ? While they aren't that popular today I
> would suggest you look at them because the defaults for v7-a are still to use the
> Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given
> the availability of hardware.

The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win
across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more
spilling).

> I'd be happy to move this forward if you could show if there is no *increase* in spills
> for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9
> schedulers.

There certainly isn't. I don't think results like these could be any more one-sided, it's a
significant win for every single benchmark, both for codesize and performance!

What isn't clear is whether something has gone horribly wrong in the scheduler which
could be fixed/reverted, but as it is right now I can't see it being useful at all. This means
we should also reevaluate whether pressure scheduling now hurts AArch64 too.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-11 17:33           ` Wilco Dijkstra
@ 2019-10-11 21:44             ` Wilco Dijkstra
  2019-10-12  1:07               ` Ramana Radhakrishnan
  2019-10-12  0:58             ` Ramana Radhakrishnan
  1 sibling, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-10-11 21:44 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon,
	GCC Patches, nd, Kyrylo Tkachov

Hi,

 > the defaults for v7-a are still to use the
 > Cortex-A8 scheduler

I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so
way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler
that performs surprisingly well on both in-order and out-of-order implementations.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-11 21:44             ` Wilco Dijkstra
@ 2019-10-12  1:07               ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2019-10-12  1:07 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon,
	GCC Patches, nd, Kyrylo Tkachov

On Fri, Oct 11, 2019 at 10:42 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
>  > the defaults for v7-a are still to use the
>  > Cortex-A8 scheduler
>
> I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so
> way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler
> that performs surprisingly well on both in-order and out-of-order implementations.

On armv8-a we do use cortex-a53 as the default scheduler in the AArch32 backend.

regards
Ramana

>
> Cheers,
> Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-11 17:33           ` Wilco Dijkstra
  2019-10-11 21:44             ` Wilco Dijkstra
@ 2019-10-12  0:58             ` Ramana Radhakrishnan
  2019-10-15 18:05               ` Christophe Lyon
  1 sibling, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2019-10-12  0:58 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon,
	GCC Patches, nd, Kyrylo Tkachov

On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi Ramana,
>
> > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > spread the range across some v7-a CPUs as well ? While they aren't that popular today I
> > would suggest you look at them because the defaults for v7-a are still to use the
> > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given
> > the availability of hardware.
>
> The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win
> across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more
> spilling).
>
> > I'd be happy to move this forward if you could show if there is no *increase* in spills
> > for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9
> > schedulers.
>
> There certainly isn't. I don't think results like these could be any more one-sided, it's a
> significant win for every single benchmark, both for codesize and performance!
>

Ok go ahead - please be sensitive to testsuite regressions.

Ramana


> What isn't clear is whether something has gone horribly wrong in the scheduler which
> could be fixed/reverted, but as it is right now I can't see it being useful at all. This means
> we should also reevaluate whether pressure scheduling now hurts AArch64 too.
>
> Cheers,
> Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-12  0:58             ` Ramana Radhakrishnan
@ 2019-10-15 18:05               ` Christophe Lyon
  2019-10-16 12:51                 ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Christophe Lyon @ 2019-10-15 18:05 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Wilco Dijkstra, Richard Earnshaw, Ramana Radhakrishnan,
	GCC Patches, nd, Kyrylo Tkachov

On Sat, 12 Oct 2019 at 02:52, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
>
> On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> >
> > Hi Ramana,
> >
> > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > > spread the range across some v7-a CPUs as well ? While they aren't that popular today I
> > > would suggest you look at them because the defaults for v7-a are still to use the
> > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given
> > > the availability of hardware.
> >
> > The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win
> > across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more
> > spilling).
> >
> > > I'd be happy to move this forward if you could show if there is no *increase* in spills
> > > for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9
> > > schedulers.
> >
> > There certainly isn't. I don't think results like these could be any more one-sided, it's a
> > significant win for every single benchmark, both for codesize and performance!
> >
>
> Ok go ahead - please be sensitive to testsuite regressions.
>

Hi Wilco,

I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1

Christophe


when the compiler is configured --with-mode thumb (or forcing -mthumb
when running the tests)
> Ramana
>
>
> > What isn't clear is whether something has gone horribly wrong in the scheduler which
> > could be fixed/reverted, but as it is right now I can't see it being useful at all. This means
> > we should also reevaluate whether pressure scheduling now hurts AArch64 too.
> >
> > Cheers,
> > Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-15 18:05               ` Christophe Lyon
@ 2019-10-16 12:51                 ` Wilco Dijkstra
  2019-10-16 15:43                   ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-10-16 12:51 UTC (permalink / raw)
  To: Christophe Lyon, Ramana Radhakrishnan
  Cc: Richard Earnshaw, Ramana Radhakrishnan, GCC Patches, nd, Kyrylo Tkachov

Hi Christophe,

> I've noticed that your patch caused a regression:
> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
> "internal loop alignment added" 1

That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
        adds    r0, r0, #1
        b       .L27
.L6:
        ldr     r4, [r2, #12]
        adds    r0, r0, #4
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
.L27:
        ldr     r4, [r2, #12]
        cmp     ip, r0
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        bne     .L6
        pop     {r4, pc}

It seems minor changes in scheduling allows blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on blocks
with different profile stats - particularly on loops where it inhibits scheduling of
the loop itself.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-16 12:51                 ` Wilco Dijkstra
@ 2019-10-16 15:43                   ` Richard Earnshaw (lists)
  2019-12-19 13:26                     ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw (lists) @ 2019-10-16 15:43 UTC (permalink / raw)
  To: Wilco Dijkstra, Christophe Lyon, Ramana Radhakrishnan
  Cc: Ramana Radhakrishnan, GCC Patches, nd, Kyrylo Tkachov

On 16/10/2019 13:13, Wilco Dijkstra wrote:
> Hi Christophe,
> 
>> I've noticed that your patch caused a regression:
>> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
>> "internal loop alignment added" 1
> 
> That's just a testism - it only tests for loop alignment and doesn't
> consider the possibility of the loop being jumped into like this:
> 
> .L17:
>  Â  Â  Â  Â  adds Â  Â r0, r0, #1
>  Â  Â  Â  Â  b Â  Â  Â  .L27
> .L6:
>  Â  Â  Â  Â  ldr Â  Â  r4, [r2, #12]
>  Â  Â  Â  Â  adds Â  Â r0, r0, #4
>  Â  Â  Â  Â  ldr Â  Â  lr, [r1]
>  Â  Â  Â  Â  str Â  Â  lr, [r3, r4, lsl #2]
>  Â  Â  Â  Â  ldr Â  Â  r4, [r2, #12]
>  Â  Â  Â  Â  ldr Â  Â  lr, [r1]
>  Â  Â  Â  Â  str Â  Â  lr, [r3, r4, lsl #2]
>  Â  Â  Â  Â  ldr Â  Â  r4, [r2, #12]
>  Â  Â  Â  Â  ldr Â  Â  lr, [r1]
>  Â  Â  Â  Â  str Â  Â  lr, [r3, r4, lsl #2]
> .L27:
>  Â  Â  Â  Â  ldr Â  Â  r4, [r2, #12]
>  Â  Â  Â  Â  cmp Â  Â  ip, r0
>  Â  Â  Â  Â  ldr Â  Â  lr, [r1]
>  Â  Â  Â  Â  str Â  Â  lr, [r3, r4, lsl #2]
>  Â  Â  Â  Â  bne Â  Â  .L6
>  Â  Â  Â  Â  pop Â  Â  {r4, pc}
> 
> It seems minor changes in scheduling allows blocks to be commoned or not.
> The underlying issue is that commoning like this should not be allowed on blocks
> with different profile stats - particularly on loops where it inhibits scheduling of
> the loop itself.
> 
> Cheers,
> Wilco
> 

So what's your proposed solution?  Leaving the test failing is not an 
option.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-10-16 15:43                   ` Richard Earnshaw (lists)
@ 2019-12-19 13:26                     ` Wilco Dijkstra
  0 siblings, 0 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2019-12-19 13:26 UTC (permalink / raw)
  To: Richard Earnshaw, Christophe Lyon; +Cc: GCC Patches

Hi,

>> I've noticed that your patch caused a regression:
>> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
>> "internal loop alignment added" 1

I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra
  2019-07-30  9:29 ` Christophe Lyon
@ 2019-08-19 15:53 ` Wilco Dijkstra
  2019-09-02 12:11   ` Wilco Dijkstra
  1 sibling, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-08-19 15:53 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov


ping


  
Currently the Arm backend selects the alternative sched pressure algorithm.
 The issue is that this doesn't take register pressure into account, and so
 it causes significant additional spilling on Arm where there are only 14
 allocatable registers.  SPEC2006 shows significant codesize reduction
 with the default pressure algorithm, so switch back to that.  PR77308 shows
 ~800 fewer instructions.
 
 SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
 patches. Overall SPEC codesize is 1.1% smaller.
 
 Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
 
 ChangeLog:
 2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>
 
         * config/arm/arm.c (arm_option_override): Don't override sched
         pressure algorithm.
 
 --
 
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -3575,11 +3575,6 @@ arm_option_override (void)
    if (use_neon_for_64bits == 1)
       prefer_neon_for_64bits = true;
  
 -  /* Use the alternative scheduling-pressure algorithm by default.  */
 -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
 -                        global_options.x_param_values,
 -                        global_options_set.x_param_values);
 -
    /* Look through ready list and all of queue for instructions
       relevant for L2 auto-prefetcher.  */
    int param_sched_autopref_queue_depth;
 
     

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-08-19 15:53 ` Wilco Dijkstra
@ 2019-09-02 12:11   ` Wilco Dijkstra
  2019-09-09 17:05     ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-09-02 12:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov

ping

 Currently the Arm backend selects the alternative sched pressure algorithm.
  The issue is that this doesn't take register pressure into account, and so
  it causes significant additional spilling on Arm where there are only 14
  allocatable registers.  SPEC2006 shows significant codesize reduction
  with the default pressure algorithm, so switch back to that.  PR77308 shows
  ~800 fewer instructions.

  SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
  patches. Overall SPEC codesize is 1.1% smaller.

  Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

  ChangeLog:
  2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>

          * config/arm/arm.c (arm_option_override): Don't override sched
          pressure algorithm.

  --

  diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
  index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
  --- a/gcc/config/arm/arm.c
  +++ b/gcc/config/arm/arm.c
  @@ -3575,11 +3575,6 @@ arm_option_override (void)
     if (use_neon_for_64bits == 1)
        prefer_neon_for_64bits = true;

  -  /* Use the alternative scheduling-pressure algorithm by default.  */
  -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
  -                        global_options.x_param_values,
  -                        global_options_set.x_param_values);
  -
     /* Look through ready list and all of queue for instructions
        relevant for L2 auto-prefetcher.  */
     int param_sched_autopref_queue_depth;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-09-02 12:11   ` Wilco Dijkstra
@ 2019-09-09 17:05     ` Wilco Dijkstra
  2019-10-10 17:25       ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2019-09-09 17:05 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov

ping

  Currently the Arm backend selects the alternative sched pressure algorithm.
   The issue is that this doesn't take register pressure into account, and so
   it causes significant additional spilling on Arm where there are only 14
   allocatable registers.  SPEC2006 shows significant codesize reduction
   with the default pressure algorithm, so switch back to that.  PR77308 shows
   ~800 fewer instructions.

   SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
   patches. Overall SPEC codesize is 1.1% smaller.

   Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

   ChangeLog:
   2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>

           * config/arm/arm.c (arm_option_override): Don't override sched
           pressure algorithm.

   --

   diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
   index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
   --- a/gcc/config/arm/arm.c
   +++ b/gcc/config/arm/arm.c
   @@ -3575,11 +3575,6 @@ arm_option_override (void)
      if (use_neon_for_64bits == 1)
         prefer_neon_for_64bits = true;

   -  /* Use the alternative scheduling-pressure algorithm by default.  */
   -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
   -                        global_options.x_param_values,
   -                        global_options_set.x_param_values);
   -
      /* Look through ready list and all of queue for instructions
         relevant for L2 auto-prefetcher.  */
      int param_sched_autopref_queue_depth;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH][ARM] Switch to default sched pressure algorithm
  2019-09-09 17:05     ` Wilco Dijkstra
@ 2019-10-10 17:25       ` Wilco Dijkstra
  0 siblings, 0 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2019-10-10 17:25 UTC (permalink / raw)
  To: GCC Patches, Richard Earnshaw, Kyrylo Tkachov, Richard Sandiford; +Cc: nd

ping

   Currently the Arm backend selects the alternative sched pressure algorithm.
   The issue is that this doesn't take register pressure into account, and so
   it causes significant additional spilling on Arm where there are only 14
   allocatable registers.  SPEC2006 shows significant codesize reduction
   with the default pressure algorithm, so switch back to that.  PR77308 shows
   ~800 fewer instructions.
   
   SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
   patches. Overall SPEC codesize is 1.1% smaller.
   
   Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
   
   ChangeLog:
   2019-07-29  Wilco Dijkstra  <wdijkstr@arm.com>
   
           * config/arm/arm.c (arm_option_override): Don't override sched
           pressure algorithm.
   
   --
   
   diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
   index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644
   --- a/gcc/config/arm/arm.c
   +++ b/gcc/config/arm/arm.c
   @@ -3575,11 +3575,6 @@ arm_option_override (void)
      if (use_neon_for_64bits == 1)
         prefer_neon_for_64bits = true;
    
   -  /* Use the alternative scheduling-pressure algorithm by default.  */
   -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
   -                        global_options.x_param_values,
   -                        global_options_set.x_param_values);
   -
      /* Look through ready list and all of queue for instructions
         relevant for L2 auto-prefetcher.  */
      int param_sched_autopref_queue_depth;

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-12-19 13:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra
2019-07-30  9:29 ` Christophe Lyon
2019-07-30  9:34   ` Ramana Radhakrishnan
2019-07-30 12:53     ` Richard Earnshaw (lists)
2019-07-30 15:16       ` Wilco Dijkstra
2019-10-10 21:38         ` Ramana Radhakrishnan
2019-10-11 17:33           ` Wilco Dijkstra
2019-10-11 21:44             ` Wilco Dijkstra
2019-10-12  1:07               ` Ramana Radhakrishnan
2019-10-12  0:58             ` Ramana Radhakrishnan
2019-10-15 18:05               ` Christophe Lyon
2019-10-16 12:51                 ` Wilco Dijkstra
2019-10-16 15:43                   ` Richard Earnshaw (lists)
2019-12-19 13:26                     ` Wilco Dijkstra
2019-08-19 15:53 ` Wilco Dijkstra
2019-09-02 12:11   ` Wilco Dijkstra
2019-09-09 17:05     ` Wilco Dijkstra
2019-10-10 17:25       ` Wilco Dijkstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).