* [PATCH][ARM] Switch to default sched pressure algorithm @ 2019-07-29 17:20 Wilco Dijkstra 2019-07-30 9:29 ` Christophe Lyon 2019-08-19 15:53 ` Wilco Dijkstra 0 siblings, 2 replies; 18+ messages in thread From: Wilco Dijkstra @ 2019-07-29 17:20 UTC (permalink / raw) To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra @ 2019-07-30 9:29 ` Christophe Lyon 2019-07-30 9:34 ` Ramana Radhakrishnan 2019-08-19 15:53 ` Wilco Dijkstra 1 sibling, 1 reply; 18+ messages in thread From: Christophe Lyon @ 2019-07-30 9:29 UTC (permalink / raw) To: Wilco Dijkstra; +Cc: GCC Patches, nd, Richard Earnshaw, Kyrylo Tkachov On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > Currently the Arm backend selects the alternative sched pressure algorithm. > The issue is that this doesn't take register pressure into account, and so > it causes significant additional spilling on Arm where there are only 14 > allocatable registers. SPEC2006 shows significant codesize reduction > with the default pressure algorithm, so switch back to that. PR77308 shows > ~800 fewer instructions. > > SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode > patches. Overall SPEC codesize is 1.1% smaller. > Hi Wilco, Do you know which benchmarks were used when this was checked-in? It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html Thanks, Christophe > Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 > > ChangeLog: > 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> > > * config/arm/arm.c (arm_option_override): Don't override sched > pressure algorithm. > > -- > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -3575,11 +3575,6 @@ arm_option_override (void) > if (use_neon_for_64bits == 1) > prefer_neon_for_64bits = true; > > - /* Use the alternative scheduling-pressure algorithm by default. */ > - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > /* Look through ready list and all of queue for instructions > relevant for L2 auto-prefetcher. */ > int param_sched_autopref_queue_depth; > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-30 9:29 ` Christophe Lyon @ 2019-07-30 9:34 ` Ramana Radhakrishnan 2019-07-30 12:53 ` Richard Earnshaw (lists) 0 siblings, 1 reply; 18+ messages in thread From: Ramana Radhakrishnan @ 2019-07-30 9:34 UTC (permalink / raw) To: Christophe Lyon, Wilco Dijkstra Cc: GCC Patches, nd, Richard Earnshaw, Kyrylo Tkachov On 30/07/2019 10:08, Christophe Lyon wrote: > On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >> >> Currently the Arm backend selects the alternative sched pressure algorithm. >> The issue is that this doesn't take register pressure into account, and so >> it causes significant additional spilling on Arm where there are only 14 >> allocatable registers. SPEC2006 shows significant codesize reduction >> with the default pressure algorithm, so switch back to that. PR77308 shows >> ~800 fewer instructions. >> >> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode >> patches. Overall SPEC codesize is 1.1% smaller. >> > > Hi Wilco, > > Do you know which benchmarks were used when this was checked-in? > It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html It was from my time in Linaro and thus would have been a famous embedded benchmark, coremark , spec2000 - all tested probably on cortex-a9 and Cortex-A15. In addition to this I would like to see what the impact of this is on something like Cortex-A53 as the issue rates are likely to be different on the schedulers causing different behaviour. I don't have all the notes today for that - maybe you can look into the linaro wiki. I am concerned about taking this patch in without some more data across a variety of cores. Thanks, Ramana > > Thanks, > > Christophe > >> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 >> >> ChangeLog: >> 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> >> >> * config/arm/arm.c (arm_option_override): Don't override sched >> pressure algorithm. >> >> -- >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -3575,11 +3575,6 @@ arm_option_override (void) >> if (use_neon_for_64bits == 1) >> prefer_neon_for_64bits = true; >> >> - /* Use the alternative scheduling-pressure algorithm by default. */ >> - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, >> - global_options.x_param_values, >> - global_options_set.x_param_values); >> - >> /* Look through ready list and all of queue for instructions >> relevant for L2 auto-prefetcher. */ >> int param_sched_autopref_queue_depth; >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-30 9:34 ` Ramana Radhakrishnan @ 2019-07-30 12:53 ` Richard Earnshaw (lists) 2019-07-30 15:16 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Richard Earnshaw (lists) @ 2019-07-30 12:53 UTC (permalink / raw) To: Ramana Radhakrishnan, Christophe Lyon, Wilco Dijkstra Cc: GCC Patches, nd, Kyrylo Tkachov On 30/07/2019 10:31, Ramana Radhakrishnan wrote: > On 30/07/2019 10:08, Christophe Lyon wrote: >> On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> >> wrote: >>> >>> Currently the Arm backend selects the alternative sched pressure >>> algorithm. >>> The issue is that this doesn't take register pressure into account, >>> and so >>> it causes significant additional spilling on Arm where there are only 14 >>> allocatable registers. SPEC2006 shows significant codesize reduction >>> with the default pressure algorithm, so switch back to that. PR77308 >>> shows >>> ~800 fewer instructions. >>> >>> SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode >>> patches. Overall SPEC codesize is 1.1% smaller. >>> >> >> Hi Wilco, >> >> Do you know which benchmarks were used when this was checked-in? >> It isn't clear from >> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html > > It was from my time in Linaro and thus would have been a famous embedded > benchmark, coremark , spec2000 - all tested probably on cortex-a9 and > Cortex-A15. In addition to this I would like to see what the impact of > this is on something like Cortex-A53 as the issue rates are likely to be > different on the schedulers causing different behaviour. > > > I don't have all the notes today for that - maybe you can look into the > linaro wiki. > > I am concerned about taking this patch in without some more data across > a variety of cores. > My concern is the original patch (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in any real detail as to the reasons for the choice of the second algorithm over the first. - It's not clear what the win was - It's not clear what outliers there were and whether they were significant. And finally, it's not clear if, 7 years later, this is still the best choice. If the second algorithm really is better, why is no other target using it by default? I think we need a bit more information (both ways). In particular I'm concerned not just by the overall benchmark average, but also the amount of variance across the benchmarks. I think the default needs to avoid significant outliers if at all possible, even if it is marginally less good on the average. R. > Thanks, > Ramana > > >> >> Thanks, >> >> Christophe >> >>> Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 >>> >>> ChangeLog: >>> 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> >>> >>>         * config/arm/arm.c (arm_option_override): Don't override sched >>>         pressure algorithm. >>> >>> -- >>> >>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>> index >>> 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 >>> 100644 >>> --- a/gcc/config/arm/arm.c >>> +++ b/gcc/config/arm/arm.c >>> @@ -3575,11 +3575,6 @@ arm_option_override (void) >>>    if (use_neon_for_64bits == 1) >>>       prefer_neon_for_64bits = true; >>> >>> - /* Use the alternative scheduling-pressure algorithm by default. */ >>> - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, >>> SCHED_PRESSURE_MODEL, >>> -                       global_options.x_param_values, >>> -                       global_options_set.x_param_values); >>> - >>>    /* Look through ready list and all of queue for instructions >>>       relevant for L2 auto-prefetcher. */ >>>    int param_sched_autopref_queue_depth; >>> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-30 12:53 ` Richard Earnshaw (lists) @ 2019-07-30 15:16 ` Wilco Dijkstra 2019-10-10 21:38 ` Ramana Radhakrishnan 0 siblings, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-07-30 15:16 UTC (permalink / raw) To: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon Cc: GCC Patches, nd, Kyrylo Tkachov Hi all, >On 30/07/2019 10:31, Ramana Radhakrishnan wrote: >> On 30/07/2019 10:08, Christophe Lyon wrote: >>> Hi Wilco, >>> >>> Do you know which benchmarks were used when this was checked-in? >>> It isn't clear from >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html >> >> It was from my time in Linaro and thus would have been a famous embedded >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and >> Cortex-A15. In addition to this I would like to see what the impact of >> this is on something like Cortex-A53 as the issue rates are likely to be >> different on the schedulers causing different behaviour. Obviously there are differences between various schedulers, but the general issue is that register pressure is increased many times beyond the spilling limit (a few cases I looked at had a pressure well over 120 when there are only 14 integer registers - this causes panic spilling in the register allocator). In fact the spilling overhead between the 2 algorithms is almost identical on Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline model used. It seems more related to the scheduler being too aggressive and not caring about register pressure at all (for example lifting a load 100 instructions before its use so it must be spilled). >> I don't have all the notes today for that - maybe you can look into the >> linaro wiki. >> >> I am concerned about taking this patch in without some more data across >> a variety of cores. >> > > My concern is the original patch > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in > any real detail as to the reasons for the choice of the second algorithm > over the first. > > - It's not clear what the win was > - It's not clear what outliers there were and whether they were significant. > > And finally, it's not clear if, 7 years later, this is still the best > choice. > > If the second algorithm really is better, why is no other target using > it by default? > > I think we need a bit more information (both ways). In particular I'm > concerned not just by the overall benchmark average, but also the amount > of variance across the benchmarks. I think the default needs to avoid > significant outliers if at all possible, even if it is marginally less > good on the average. The results clearly show that algorithm 1 works best on Arm today - I haven't seen a single benchmark where algorithm 2 results in less spilling. We could tune algorithm 2 so it switches back to algorithm 1 when register pressure is high or a basic block is large. However until it is fixed, the evidence is that algorithm 1 is the best choice for current cores. Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-30 15:16 ` Wilco Dijkstra @ 2019-10-10 21:38 ` Ramana Radhakrishnan 2019-10-11 17:33 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Ramana Radhakrishnan @ 2019-10-10 21:38 UTC (permalink / raw) To: Wilco Dijkstra Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon, GCC Patches, nd, Kyrylo Tkachov On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > Hi all, > > >On 30/07/2019 10:31, Ramana Radhakrishnan wrote: > >> On 30/07/2019 10:08, Christophe Lyon wrote: > > >>> Hi Wilco, > >>> > >>> Do you know which benchmarks were used when this was checked-in? > >>> It isn't clear from > >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html > >> > >> It was from my time in Linaro and thus would have been a famous embedded > >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and > >> Cortex-A15. In addition to this I would like to see what the impact of > >> this is on something like Cortex-A53 as the issue rates are likely to be > >> different on the schedulers causing different behaviour. > > Obviously there are differences between various schedulers, but the general > issue is that register pressure is increased many times beyond the spilling limit > (a few cases I looked at had a pressure well over 120 when there are only 14 > integer registers - this causes panic spilling in the register allocator). > > In fact the spilling overhead between the 2 algorithms is almost identical on > Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline > model used. It seems more related to the scheduler being too aggressive > and not caring about register pressure at all (for example lifting a load 100 > instructions before its use so it must be spilled). In those days it would have been the Cortex-A8, Cortex-A9 schedulers and the Cortex-A15 schedulers and IIRC the benchmarking would have been mostly on a Cortex-A9 board or on some Cortex-A15 boards we had (long gone now) inside Arm. Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to spread the range across some v7-a CPUs as well ? While they aren't that popular today I would suggest you look at them because the defaults for v7-a are still to use the Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given the availability of hardware. > > >> I don't have all the notes today for that - maybe you can look into the > >> linaro wiki. > >> > >> I am concerned about taking this patch in without some more data across > >> a variety of cores. > >> > > > > My concern is the original patch > > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in > > any real detail as to the reasons for the choice of the second algorithm > > over the first. > > > > - It's not clear what the win was > > - It's not clear what outliers there were and whether they were significant. > > > > And finally, it's not clear if, 7 years later, this is still the best > > choice. > > > > If the second algorithm really is better, why is no other target using > > it by default? > > > > I think we need a bit more information (both ways). In particular I'm > > concerned not just by the overall benchmark average, but also the amount > > of variance across the benchmarks. I think the default needs to avoid > > significant outliers if at all possible, even if it is marginally less > > good on the average. > > The results clearly show that algorithm 1 works best on Arm today - I haven't > seen a single benchmark where algorithm 2 results in less spilling. We could > tune algorithm 2 so it switches back to algorithm 1 when register pressure is > high or a basic block is large. However until it is fixed, the evidence is that > algorithm 1 is the best choice for current cores. I'd be happy to move this forward if you could show if there is no *increase* in spills for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9 schedulers. Sorry about the time it has taken. I've been a bit otherwise occupied recently. regards Ramana > > Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-10 21:38 ` Ramana Radhakrishnan @ 2019-10-11 17:33 ` Wilco Dijkstra 2019-10-11 21:44 ` Wilco Dijkstra 2019-10-12 0:58 ` Ramana Radhakrishnan 0 siblings, 2 replies; 18+ messages in thread From: Wilco Dijkstra @ 2019-10-11 17:33 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon, GCC Patches, nd, Kyrylo Tkachov Hi Ramana, > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > spread the range across some v7-a CPUs as well ? While they aren't that popular today I > would suggest you look at them because the defaults for v7-a are still to use the > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given > the availability of hardware. The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more spilling). > I'd be happy to move this forward if you could show if there is no *increase* in spills > for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9 > schedulers. There certainly isn't. I don't think results like these could be any more one-sided, it's a significant win for every single benchmark, both for codesize and performance! What isn't clear is whether something has gone horribly wrong in the scheduler which could be fixed/reverted, but as it is right now I can't see it being useful at all. This means we should also reevaluate whether pressure scheduling now hurts AArch64 too. Cheers, Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-11 17:33 ` Wilco Dijkstra @ 2019-10-11 21:44 ` Wilco Dijkstra 2019-10-12 1:07 ` Ramana Radhakrishnan 2019-10-12 0:58 ` Ramana Radhakrishnan 1 sibling, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-10-11 21:44 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon, GCC Patches, nd, Kyrylo Tkachov Hi, > the defaults for v7-a are still to use the > Cortex-A8 scheduler I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler that performs surprisingly well on both in-order and out-of-order implementations. Cheers, Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-11 21:44 ` Wilco Dijkstra @ 2019-10-12 1:07 ` Ramana Radhakrishnan 0 siblings, 0 replies; 18+ messages in thread From: Ramana Radhakrishnan @ 2019-10-12 1:07 UTC (permalink / raw) To: Wilco Dijkstra Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon, GCC Patches, nd, Kyrylo Tkachov On Fri, Oct 11, 2019 at 10:42 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > Hi, > > > the defaults for v7-a are still to use the > > Cortex-A8 scheduler > > I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so > way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler > that performs surprisingly well on both in-order and out-of-order implementations. On armv8-a we do use cortex-a53 as the default scheduler in the AArch32 backend. regards Ramana > > Cheers, > Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-11 17:33 ` Wilco Dijkstra 2019-10-11 21:44 ` Wilco Dijkstra @ 2019-10-12 0:58 ` Ramana Radhakrishnan 2019-10-15 18:05 ` Christophe Lyon 1 sibling, 1 reply; 18+ messages in thread From: Ramana Radhakrishnan @ 2019-10-12 0:58 UTC (permalink / raw) To: Wilco Dijkstra Cc: Richard Earnshaw, Ramana Radhakrishnan, Christophe Lyon, GCC Patches, nd, Kyrylo Tkachov On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > Hi Ramana, > > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > > spread the range across some v7-a CPUs as well ? While they aren't that popular today I > > would suggest you look at them because the defaults for v7-a are still to use the > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given > > the availability of hardware. > > The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win > across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more > spilling). > > > I'd be happy to move this forward if you could show if there is no *increase* in spills > > for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9 > > schedulers. > > There certainly isn't. I don't think results like these could be any more one-sided, it's a > significant win for every single benchmark, both for codesize and performance! > Ok go ahead - please be sensitive to testsuite regressions. Ramana > What isn't clear is whether something has gone horribly wrong in the scheduler which > could be fixed/reverted, but as it is right now I can't see it being useful at all. This means > we should also reevaluate whether pressure scheduling now hurts AArch64 too. > > Cheers, > Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-12 0:58 ` Ramana Radhakrishnan @ 2019-10-15 18:05 ` Christophe Lyon 2019-10-16 12:51 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Christophe Lyon @ 2019-10-15 18:05 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: Wilco Dijkstra, Richard Earnshaw, Ramana Radhakrishnan, GCC Patches, nd, Kyrylo Tkachov On Sat, 12 Oct 2019 at 02:52, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > > On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > > > Hi Ramana, > > > > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > > > spread the range across some v7-a CPUs as well ? While they aren't that popular today I > > > would suggest you look at them because the defaults for v7-a are still to use the > > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in places given > > > the availability of hardware. > > > > The results are practically identical to Cortex-A53 and A57 - there is a huge codesize win > > across the board on SPEC2006, there isn't a single benchmark that is larger (ie. more > > spilling). > > > > > I'd be happy to move this forward if you could show if there is no *increase* in spills > > > for the same range of benchmarks that you are doing for the Cortex-A8 and Cortex-A9 > > > schedulers. > > > > There certainly isn't. I don't think results like these could be any more one-sided, it's a > > significant win for every single benchmark, both for codesize and performance! > > > > Ok go ahead - please be sensitive to testsuite regressions. > Hi Wilco, I've noticed that your patch caused a regression: FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments "internal loop alignment added" 1 Christophe when the compiler is configured --with-mode thumb (or forcing -mthumb when running the tests) > Ramana > > > > What isn't clear is whether something has gone horribly wrong in the scheduler which > > could be fixed/reverted, but as it is right now I can't see it being useful at all. This means > > we should also reevaluate whether pressure scheduling now hurts AArch64 too. > > > > Cheers, > > Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-15 18:05 ` Christophe Lyon @ 2019-10-16 12:51 ` Wilco Dijkstra 2019-10-16 15:43 ` Richard Earnshaw (lists) 0 siblings, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-10-16 12:51 UTC (permalink / raw) To: Christophe Lyon, Ramana Radhakrishnan Cc: Richard Earnshaw, Ramana Radhakrishnan, GCC Patches, nd, Kyrylo Tkachov Hi Christophe, > I've noticed that your patch caused a regression: > FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments > "internal loop alignment added" 1 That's just a testism - it only tests for loop alignment and doesn't consider the possibility of the loop being jumped into like this: .L17: adds r0, r0, #1 b .L27 .L6: ldr r4, [r2, #12] adds r0, r0, #4 ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] ldr r4, [r2, #12] ldr lr, [r1] str lr, [r3, r4, lsl #2] .L27: ldr r4, [r2, #12] cmp ip, r0 ldr lr, [r1] str lr, [r3, r4, lsl #2] bne .L6 pop {r4, pc} It seems minor changes in scheduling allows blocks to be commoned or not. The underlying issue is that commoning like this should not be allowed on blocks with different profile stats - particularly on loops where it inhibits scheduling of the loop itself. Cheers, Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-16 12:51 ` Wilco Dijkstra @ 2019-10-16 15:43 ` Richard Earnshaw (lists) 2019-12-19 13:26 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Richard Earnshaw (lists) @ 2019-10-16 15:43 UTC (permalink / raw) To: Wilco Dijkstra, Christophe Lyon, Ramana Radhakrishnan Cc: Ramana Radhakrishnan, GCC Patches, nd, Kyrylo Tkachov On 16/10/2019 13:13, Wilco Dijkstra wrote: > Hi Christophe, > >> I've noticed that your patch caused a regression: >> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments >> "internal loop alignment added" 1 > > That's just a testism - it only tests for loop alignment and doesn't > consider the possibility of the loop being jumped into like this: > > .L17: >     adds   r0, r0, #1 >     b    .L27 > .L6: >     ldr   r4, [r2, #12] >     adds   r0, r0, #4 >     ldr   lr, [r1] >     str   lr, [r3, r4, lsl #2] >     ldr   r4, [r2, #12] >     ldr   lr, [r1] >     str   lr, [r3, r4, lsl #2] >     ldr   r4, [r2, #12] >     ldr   lr, [r1] >     str   lr, [r3, r4, lsl #2] > .L27: >     ldr   r4, [r2, #12] >     cmp   ip, r0 >     ldr   lr, [r1] >     str   lr, [r3, r4, lsl #2] >     bne   .L6 >     pop   {r4, pc} > > It seems minor changes in scheduling allows blocks to be commoned or not. > The underlying issue is that commoning like this should not be allowed on blocks > with different profile stats - particularly on loops where it inhibits scheduling of > the loop itself. > > Cheers, > Wilco > So what's your proposed solution? Leaving the test failing is not an option. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-10-16 15:43 ` Richard Earnshaw (lists) @ 2019-12-19 13:26 ` Wilco Dijkstra 0 siblings, 0 replies; 18+ messages in thread From: Wilco Dijkstra @ 2019-12-19 13:26 UTC (permalink / raw) To: Richard Earnshaw, Christophe Lyon; +Cc: GCC Patches Hi, >> I've noticed that your patch caused a regression: >> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments >> "internal loop alignment added" 1 I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007 Cheers, Wilco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra 2019-07-30 9:29 ` Christophe Lyon @ 2019-08-19 15:53 ` Wilco Dijkstra 2019-09-02 12:11 ` Wilco Dijkstra 1 sibling, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-08-19 15:53 UTC (permalink / raw) To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-08-19 15:53 ` Wilco Dijkstra @ 2019-09-02 12:11 ` Wilco Dijkstra 2019-09-09 17:05 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-09-02 12:11 UTC (permalink / raw) To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-09-02 12:11 ` Wilco Dijkstra @ 2019-09-09 17:05 ` Wilco Dijkstra 2019-10-10 17:25 ` Wilco Dijkstra 0 siblings, 1 reply; 18+ messages in thread From: Wilco Dijkstra @ 2019-09-09 17:05 UTC (permalink / raw) To: GCC Patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH][ARM] Switch to default sched pressure algorithm 2019-09-09 17:05 ` Wilco Dijkstra @ 2019-10-10 17:25 ` Wilco Dijkstra 0 siblings, 0 replies; 18+ messages in thread From: Wilco Dijkstra @ 2019-10-10 17:25 UTC (permalink / raw) To: GCC Patches, Richard Earnshaw, Kyrylo Tkachov, Richard Sandiford; +Cc: nd ping Currently the Arm backend selects the alternative sched pressure algorithm. The issue is that this doesn't take register pressure into account, and so it causes significant additional spilling on Arm where there are only 14 allocatable registers. SPEC2006 shows significant codesize reduction with the default pressure algorithm, so switch back to that. PR77308 shows ~800 fewer instructions. SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode patches. Overall SPEC codesize is 1.1% smaller. Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra <wdijkstr@arm.com> * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3575,11 +3575,6 @@ arm_option_override (void) if (use_neon_for_64bits == 1) prefer_neon_for_64bits = true; - /* Use the alternative scheduling-pressure algorithm by default. */ - maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL, - global_options.x_param_values, - global_options_set.x_param_values); - /* Look through ready list and all of queue for instructions relevant for L2 auto-prefetcher. */ int param_sched_autopref_queue_depth; ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-12-19 13:22 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-29 17:20 [PATCH][ARM] Switch to default sched pressure algorithm Wilco Dijkstra 2019-07-30 9:29 ` Christophe Lyon 2019-07-30 9:34 ` Ramana Radhakrishnan 2019-07-30 12:53 ` Richard Earnshaw (lists) 2019-07-30 15:16 ` Wilco Dijkstra 2019-10-10 21:38 ` Ramana Radhakrishnan 2019-10-11 17:33 ` Wilco Dijkstra 2019-10-11 21:44 ` Wilco Dijkstra 2019-10-12 1:07 ` Ramana Radhakrishnan 2019-10-12 0:58 ` Ramana Radhakrishnan 2019-10-15 18:05 ` Christophe Lyon 2019-10-16 12:51 ` Wilco Dijkstra 2019-10-16 15:43 ` Richard Earnshaw (lists) 2019-12-19 13:26 ` Wilco Dijkstra 2019-08-19 15:53 ` Wilco Dijkstra 2019-09-02 12:11 ` Wilco Dijkstra 2019-09-09 17:05 ` Wilco Dijkstra 2019-10-10 17:25 ` Wilco Dijkstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).