* Re: [RFC] Selective scheduling pass
@ 2008-06-05 6:44 Steven Bosscher
2008-06-05 18:29 ` Andrey Belevantsev
0 siblings, 1 reply; 9+ messages in thread
From: Steven Bosscher @ 2008-06-05 6:44 UTC
To: Andrey Belevantsev, gcc-patches; +Cc: Jim Wilson, Vladimir Makarov
(cf. http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00114.html)
Hi Andrey,
Thanks for this very nice work. I was wondering if you could say a
little more about the performance impact of the selective scheduler on
the generated code...
You posted SPEC scores and there are as many ups as there are downs in
there. Where do the regressions come from? The 3-5% on e.g. galgel
and vortex are quite substantial slowdowns, but do you know why they
happen?
Also, you post a new scheduler and a set of target tunings in one set
of patches. I would like to know what the performance impact is of
just the target changes alone. That is, what happens to e.g. SPEC
scores for ia64 with just the tweaks and tunings patch
(http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)? I assume
those target changes alone (the ones not related to sel-sched) also
have a positive performance impact. Since you've globbed everything
into one patch set, it's impossible to tell how much of the
performance changes can be attributed to sel-sched, and how much is
just target tweaks...
Gr.
Steven
* Re: [RFC] Selective scheduling pass
2008-06-05 6:44 [RFC] Selective scheduling pass Steven Bosscher
@ 2008-06-05 18:29 ` Andrey Belevantsev
2008-06-05 23:38 ` Steven Bosscher
0 siblings, 1 reply; 9+ messages in thread
From: Andrey Belevantsev @ 2008-06-05 18:29 UTC
To: Steven Bosscher; +Cc: gcc-patches, Jim Wilson, Vladimir Makarov
Steven Bosscher wrote:
> (cf. http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00114.html)
>
> Hi Andrey,
>
> Thanks for this very nice work. I was wondering if you could say a
> little more about the performance impact of the selective scheduler on
> the generated code...
Thank you Steven!
> You posted SPEC scores and there are as many ups as there are downs in
> there. Where do the regressions come from? The 3-5% on e.g. galgel
> and vortex are quite substantial slowdowns, but do you know why they
> happen?
We have tried to fix all regressions for which we have found a culprit.
Galgel slows down for two reasons. First, we cannot use cselib
as the ebb scheduler does, because it works only on extended basic
blocks. We tried to support cselib for multiple fences, analogously to
what we did with target contexts, but we didn't manage to make it work
well. I think that we can turn cselib on at least for those regions
that are ebbs.
The second problem with galgel is that the heuristic that restricts
pipelining on small loops doesn't work well in this case. However, we
have turned it on because overall it provided a speedup. I would note
that on SPEC FP, which received the most tuning, there are far more ups
than downs :)
> Also, you post a new scheduler and a set of target tunings in one set
> of patches. I would like to know what the performance impact is of
> just the target changes alone. That is, what happens to e.g. SPEC
> scores for ia64 with just the tweaks and tunings patch
> (http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)? I assume
> those target changes alone (the ones not related to sel-sched) also
> have a positive performance impact. Since you've globbed everything
> into one patch set, it's impossible to tell how much of the
> performance changes can be attributed to sel-sched, and how much is
> just target tweaks...
AFAIR, the target tunings gave around 1% when we tested them at -O2.
That was a couple of months ago. We will retest tonight to get fresh
numbers for this.
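To make the requested separation concrete: if SPEC is run on three builds (trunk, trunk plus only the target tunings, and the full sel-sched branch), the total speedup factors into the product of the two steps. The following is only a sketch of that arithmetic; the function name and the scores in the usage lines are hypothetical placeholders, not measurements from this thread.

```python
def attribute_speedup(baseline, tunings_only, full):
    """Split a total speedup into two multiplicative factors.

    All three arguments are benchmark scores (higher is better), e.g.
    SPEC geomeans for three hypothetical builds:
      baseline      -- trunk
      tunings_only  -- trunk plus the target tunings, no sel-sched
      full          -- target tunings plus the selective scheduler
    """
    from_tunings = tunings_only / baseline
    from_scheduler = full / tunings_only
    total = full / baseline
    return from_tunings, from_scheduler, total


# Hypothetical placeholder scores, NOT results reported in this thread.
tun, sched, total = attribute_speedup(baseline=100.0, tunings_only=101.0, full=104.0)
print(f"target tunings:      {100 * (tun - 1):+.2f}%")
print(f"selective scheduler: {100 * (sched - 1):+.2f}%")
print(f"total:               {100 * (total - 1):+.2f}%")
```

The factors multiply rather than add, so the scheduler's share is the ratio of the full build to the tunings-only build, not the difference of the two percentages.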
Andrey
* Re: [RFC] Selective scheduling pass
2008-06-05 18:29 ` Andrey Belevantsev
@ 2008-06-05 23:38 ` Steven Bosscher
0 siblings, 0 replies; 9+ messages in thread
From: Steven Bosscher @ 2008-06-05 23:38 UTC
To: Andrey Belevantsev; +Cc: gcc-patches, Jim Wilson, Vladimir Makarov
On Thu, Jun 5, 2008 at 8:29 PM, Andrey Belevantsev <abel@ispras.ru> wrote:
>> Also, you post a new scheduler and a set of target tunings in one set
>> of patches. I would like to know what the performance impact is of
>> just the target changes alone. That is, what happens to e.g. SPEC
>> scores for ia64 with just the tweaks and tunings patch
>> (http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)? I assume
>> those target changes alone (the ones not related to sel-sched) also
>> have a positive performance impact. Since you've globbed everything
>> into one patch set, it's impossible to tell how much of the
>> performance changes can be attributed to sel-sched, and how much is
>> just target tweaks...
>
> AFAIR, the target tunings gave around 1% when we have tested it on -O2.
> That was a couple of months ago. We will retest tonight to get fresh
> numbers for this.
Great.
Another thought/suggestion/comment:
Does the selective scheduler also make the register renaming passes
obsolete (pass_regrename and pass_cprop_hardreg)? I would expect it
does, since the scheduler handles the register renaming itself, as far
as I understand.
These two passes are quite expensive (or at least they were when I
last looked at them). You could perhaps buy yourself some compile
time back if you can demonstrate you don't need to run these passes if
you do post-regalloc selective scheduling.
Gr.
Steven
* [RFC] Selective scheduling pass
@ 2008-06-03 14:24 Andrey Belevantsev
2008-06-03 22:03 ` Vladimir Makarov
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Andrey Belevantsev @ 2008-06-03 14:24 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov
Hello,
The patches in this thread introduce a selective scheduler into GCC,
implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
Monakov, and Maxim Kuvyrkov while he was at ISP RAS. The selective
scheduler is aimed at scheduling-eager targets such as ia64, power6, and
cell. The implementation contains both the scheduler and the software
pipeliner, which can be used on loops with control flow not handled by
SMS. The scheduler can work either before or after register allocation,
but it is currently tuned to work after.
The scheduler was bootstrapped and tested on ia64, with all default
languages, both as a first and as a second scheduler. It was also
bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk
and the sel-sched branch show a 3.8% speedup on average. SPEC INT shows
both small speedups and regressions, staying around neutral on average:
SPEC2k FP       trunk  sel-sched   change
168.wupwise       513        552    7,60%
171.swim          757        772    1,98%
172.mgrid         570        643   12,81%
173.applu         503        524    4,17%
177.mesa          796        795   -0,13%
178.galgel        814        787   -3,32%
179.art          1990       2098    5,43%
183.equake        513        569   10,92%
187.facerec       958        991    3,44%
188.ammp          765        775    1,31%
189.lucas         860        869    1,05%
191.fma3d         549        536   -2,37%
200.sixtrack      300        323    7,67%
301.apsi          522        546    4,60%
Geomean        673,97     699,87    3,84%

SPEC2k INT      trunk  sel-sched   change
164.gzip          683        682   -0,15%
175.vpr           814        802   -1,47%
176.gcc          1080       1069   -1,02%
181.mcf           701        708    1,00%
186.crafty        872        855   -1,95%
197.parser        729        728   -0,14%
252.eon           793        785   -1,01%
253.perlbmk       824        839    1,82%
254.gap           558        569    1,97%
255.vortex       1012        966   -4,55%
256.bzip2         758        762    0,53%
300.twolf        1005       1015    1,00%
Geomean        806,04     803,25   -0,35%
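For readers who want to check the summary rows, the per-benchmark percentages and the geometric means can be recomputed from the raw scores. The sketch below does this for the SPEC2k FP figures, assuming the first score column is trunk and the second is the sel-sched branch; the helper names `geomean` and `pct_change` are chosen here for illustration.

```python
import math

# (benchmark, trunk score, sel-sched score), copied from the SPEC2k FP table.
SPEC_FP = [
    ("168.wupwise", 513, 552), ("171.swim", 757, 772),
    ("172.mgrid", 570, 643), ("173.applu", 503, 524),
    ("177.mesa", 796, 795), ("178.galgel", 814, 787),
    ("179.art", 1990, 2098), ("183.equake", 513, 569),
    ("187.facerec", 958, 991), ("188.ammp", 765, 775),
    ("189.lucas", 860, 869), ("191.fma3d", 549, 536),
    ("200.sixtrack", 300, 323), ("301.apsi", 522, 546),
]


def geomean(values):
    """Geometric mean, computed via logarithms to avoid huge products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


trunk = geomean(t for _, t, _ in SPEC_FP)
sel = geomean(s for _, _, s in SPEC_FP)

for name, t, s in SPEC_FP:
    print(f"{name:<13}{t:>8}{s:>11}{pct_change(t, s):>8.2f}%")
print(f"{'Geomean':<13}{trunk:>8.2f}{sel:>11.2f}{pct_change(trunk, sel):>8.2f}%")
```

The last line should agree with the Geomean row of the FP table up to rounding. The change in the geomean equals the geomean of the per-benchmark ratios, which is why a few large gains (mgrid, equake) outweigh the galgel and fma3d regressions.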
On power6, Revital Eres saw speedups on several tests; additional tuning
is required to get good results there, which is complicated because we
don't have power6 hardware. On cell, there was some third-party testing
in 2007, showing 4-6% speedups, but I don't have more detailed
information.
Compile time slowdown measured with --enable-checking=assert is quite
significant -- about 12% on spec int and about 18% on spec fp and the
cc1-i-files collection. For this reason, we have enabled the selective
scheduler by default at -O3 on ia64 and disabled it by default on other
targets.
Our current plan is to work on further compile time improvements and
performance tuning for ppc and cell, hopefully with the help of the IBM
Haifa folks. If we complete this work before the end of stage2,
then we can enable selective scheduling at -O3 also for ppc in 4.4. In
the mid-term, we will work on removing the ebb scheduler, as it is now
used on ia64 only and will be superseded by the selective scheduler once
we further improve compile time.
Andrey
* Re: [RFC] Selective scheduling pass
2008-06-03 14:24 Andrey Belevantsev
@ 2008-06-03 22:03 ` Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2 siblings, 0 replies; 9+ messages in thread
From: Vladimir Makarov @ 2008-06-03 22:03 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson
Andrey Belevantsev wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective
> scheduler is aimed at scheduling eager targets such as ia64, power6,
> and cell. The implementation contains both the scheduler and the
> software pipeliner, which can be used on loops with control flow not
> handled by SMS. The scheduler can work either before or after
> register allocation, but it is currently tuned to work after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk
> and sel-sched branch show 3.8% speedup on average, SPEC INT shows both
> small speedups and regressions, staying around neutral in average:
>
Congratulations! I have followed the project for a long time. Finally a
useful milestone has been achieved and you have got a pretty big
improvement. The scheduling algorithm is superior to what we had because
it can improve insn schedules on all execution paths through insn
cloning and other transformations.
> On power6, Revital Eres saw speedups on several tests; additional
> tuning is required to get good results there, which is complicated
> because we don't have power6. On cell, there was some third-party
> testing in 2007, showing 4-6% speedups, but I don't have more detailed
> information.
>
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
>
Itanium is a pretty specific target. It would be interesting to know
how big the slowdown is for ppc.
> Our current plan is to work on further compile time improvements and
> performance tuning for ppc and cell, hopefully with the help of IBM
> Haifa folks. If we will complete this work before the end of stage2,
> then we can enable selective scheduling at -O3 also for ppc in 4.4.
> In the mid-term, we will work on removing the ebb scheduler, as it is
> now used on ia64 only and will be superseded by selective scheduler
> when we'll further improve compile time.
>
I think we should finally get rid of the EBB scheduler. You could try to
mitigate the compile-time problem by disabling some transformations in
the new scheduler in -O2 mode.
If you solve the compile-time problem, I think we should work on removing
the haifa scheduler too, so that we have just one insn scheduler. But as
I understand it, that will not happen soon.
* Re: [RFC] Selective scheduling pass
2008-06-03 14:24 Andrey Belevantsev
2008-06-03 22:03 ` Vladimir Makarov
@ 2008-06-04 16:55 ` Mark Mitchell
2008-06-04 20:50 ` Andrey Belevantsev
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2 siblings, 1 reply; 9+ messages in thread
From: Mark Mitchell @ 2008-06-04 16:55 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov
Andrey Belevantsev wrote:
> The patches in this thread introduce selective scheduler in GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective
> scheduler is aimed at scheduling eager targets such as ia64, power6, and
> cell. The implementation contains both the scheduler and the software
> pipeliner, which can be used on loops with control flow not handled by
> SMS. The scheduler can work either before or after register allocation,
> but it is currently tuned to work after.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk
> and sel-sched branch show 3.8% speedup on average, SPEC INT shows both
> small speedups and regressions, staying around neutral in average:
That's a very good result. Congratulations!
I know that this scheduler is aimed at CPUs like the ones you mention
above. However, would it function correctly on other CPUs with more
"traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
would it be reasonably possible to tune it for those CPUs as well?
As with the IRA allocator, I'd like to avoid having multiple schedulers
in GCC. (I know we've done that for a while, but I still think it's
undesirable.) So, I'd like to see if we can get this to work well
across all of the Primary and Secondary CPUs, and then just make it "the
GCC scheduler" rather than an optional thing enabled at some
optimization levels on some CPUs.
Do you think that's feasible? Or do you think that there are inherent
aspects of the algorithm that mean that we need to have this new
scheduler for one class of CPUs and the old scheduler for the other
class? Is there any way to make the new scheduler do a reasonable job
with the existing descriptions in GCC, so that port maintainers can tune
later, or is a level of effort like that for Itanium required?
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
Do you understand what's causing the compile-time slowdown?
Thanks,
--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713
* Re: [RFC] Selective scheduling pass
2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-04 20:50 ` Andrey Belevantsev
0 siblings, 0 replies; 9+ messages in thread
From: Andrey Belevantsev @ 2008-06-04 20:50 UTC
To: Mark Mitchell; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov
Mark Mitchell wrote:
> That's a very good result. Congratulations!
Thank you!
> I know that this scheduler is aimed at CPUs like the ones you mention
> above. However, would it function correctly on other CPUs with more
> "traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in scheduler hooks, everything
should just work (modulo bugs, of course; we've tried only ppc64 and
x86-64). In case a target saves some information describing the
scheduler's state, simple hooks manipulating this data should be
implemented, like we did for the rs6000 port.
> As with the IRA allocator, I'd like to avoid having multiple schedulers
> in GCC. (I know we've done that for a while, but I still think it's
> undesirable.) So, I'd like to see if we can get this to work well
> across all of the Primary and Secondary CPUs, and then just make it "the
> GCC scheduler" rather than an optional thing enabled at some
> optimization levels on some CPUs.
This is our goal as well, and I think it can be done incrementally. We
are now working on the ppc performance. Then we need to tune the
scheduler so that for traditional targets it is no worse in performance
and the compile-time slowdown is reasonable, e.g. by disabling
pipelining and decreasing the scheduling window. The last thing to do
is to speed up the implementation so that for scheduling-eager targets
with pipelining enabled the slowdown will be acceptable for -O2.
Note that the selective scheduler does not subsume SMS, but complements
it, because SMS does a better job for countable loops, but cannot handle
loops with control flow or with an unknown number of iterations. So in
any case there will be two schedulers.
> Do you think that's feasible? Or do you think that there are inherent
> aspects of the algorithm that mean that we need to have this new
> scheduler for one class of CPUs and the old scheduler for the other
> class? Is there any way to make the new scheduler do a reasonable job
> with the existing descriptions in GCC, so that port maintainers can tune
> later, or is a level of effort like that for Itanium required?
The ia64 backend is very complex, and we put a lot of effort into tuning
the backend itself -- you can see it in my other mail about target
changes. So I think that tuning for other targets will be simpler. The
cell results I mentioned in the mail were received from a guy who did
the tuning internally at Samsung, and AFAIR he didn't mention any
target-independent changes he had to make; basically he just made it
work.
>> Compile time slowdown measured with --enable-checking=assert is quite
>> significant -- about 12% on spec int and about 18% on spec fp and
>> cc1-i-files collection. For this reason, we have enabled selective
>> scheduler by default at -O3 on ia64 and disabled by default on other
>> targets.
>
> Do you understand what's causing the compile-time slowdown?
The part that takes the most time is the update of availability sets, as
this is the central part of the algorithm. Renaming is quite expensive
too, but we have tackled this by limiting it to the few insns with the
highest priority. To make the updates faster, one needs to build the
data dependence graph and keep it up to date while scheduling.
Unfortunately, we didn't manage to do this during this project. The
first step towards this goal will be to make the dependence graph
classify the dependencies it builds as control/data, lhs/rhs,
register/memory etc. Then we can devise a mechanism for updating the
graph, which would not be trivial -- e.g. when an insn gets renamed, we
introduce a register-register copy which can generate completely
new register dependencies that cannot be derived from existing ones.
Such a project is likely to make it to trunk in the next release cycle,
and that would correspond to the last step of the incremental approach
outlined above.
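The renaming problem described above can be illustrated with a toy model. The sketch below is not GCC's dependence analysis or the sel-sched implementation; the `Insn` type, the register names, and the four-instruction sequence are all made up for illustration. It lists register dependencies before and after an insn is renamed and hoisted, and shows that the copy brings in a dependence on a register that did not occur in the old graph at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    defs: frozenset  # registers written
    uses: frozenset  # registers read


def insn(name, defs, uses):
    return Insn(name, frozenset(defs), frozenset(uses))


def register_deps(seq):
    """Conservatively list register dependencies in a straight-line sequence.

    Every earlier/later pair that touches the same register is reported,
    even if another write sits in between; good enough for a toy model.
    """
    deps = set()
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            deps |= {(a.name, b.name, "true", r) for r in a.defs & b.uses}
            deps |= {(a.name, b.name, "anti", r) for r in a.uses & b.defs}
            deps |= {(a.name, b.name, "output", r) for r in a.defs & b.defs}
    return deps


# Before: i3 cannot move above i1/i2 because it overwrites r1.
before = [
    insn("i1", {"r1"}, {"r5"}),        # r1 = load [r5]
    insn("i2", {"r6"}, {"r1"}),        # r6 = r1 + 1
    insn("i3", {"r1"}, {"r2", "r3"}),  # r1 = r2 * r3
    insn("i4", {"r7"}, {"r1", "r6"}),  # r7 = r1 + r6
]

# After renaming: the computation writes r4 (assumed free) and is hoisted
# to the top; a copy r1 = r4 stays where i3 used to be.
after = [
    insn("i3'", {"r4"}, {"r2", "r3"}),  # r4 = r2 * r3
    insn("i1", {"r1"}, {"r5"}),
    insn("i2", {"r6"}, {"r1"}),
    insn("copy", {"r1"}, {"r4"}),       # r1 = r4
    insn("i4", {"r7"}, {"r1", "r6"}),
]

old, new = register_deps(before), register_deps(after)
on_r4 = {d for d in new if d[3] == "r4"}
print("dependencies on r4 after renaming:", sorted(on_r4))
```

No edge in the old graph mentions r4, so an incremental update cannot obtain the new edge by rewriting existing ones; it has to analyze the renamed insn and the copy afresh.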
Yours, Andrey
* Re: [RFC] Selective scheduling pass
2008-06-03 14:24 Andrey Belevantsev
2008-06-03 22:03 ` Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2008-06-05 13:49 ` Andrey Belevantsev
2 siblings, 1 reply; 9+ messages in thread
From: Seongbae Park (박성배, 朴成培) @ 2008-06-05 3:45 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov
On Tue, Jun 3, 2008 at 7:16 AM, Andrey Belevantsev <abel@ispras.ru> wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC, implemented
> by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov, and Maxim
> Kuvyrkov while he was at ISP RAS. Selective scheduler is aimed at
> scheduling eager targets such as ia64, power6, and cell. The implementation
> contains both the scheduler and the software pipeliner, which can be used on
> loops with control flow not handled by SMS. The scheduler can work either
> before or after register allocation, but it is currently tuned to work
> after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk and
> sel-sched branch show 3.8% speedup on average, SPEC INT shows both small
> speedups and regressions, staying around neutral in average:
>
> 168.wupwise 513 552 7,60%
> 171.swim 757 772 1,98%
> 172.mgrid 570 643 12,81%
> 173.applu 503 524 4,17%
> 177.mesa 796 795 -0,13%
> 178.galgel 814 787 -3,32%
> 179.art 1990 2098 5,43%
> 183.equake 513 569 10,92%
> 187.facerec 958 991 3,44%
> 188.ammp 765 775 1,31%
> 189.lucas 860 869 1,05%
> 191.fma3d 549 536 -2,37%
> 200.sixtrack 300 323 7,67%
> 301.apsi 522 546 4,60%
> Geomean 673,97 699,87 3,84%
>
> 164.gzip 683 682 -0,15%
> 175.vpr 814 802 -1,47%
> 176.gcc 1080 1069 -1,02%
> 181.mcf 701 708 1,00%
> 186.crafty 872 855 -1,95%
> 197.parser 729 728 -0,14%
> 252.eon 793 785 -1,01%
> 253.perlbmk 824 839 1,82%
> 254.gap 558 569 1,97%
> 255.vortex 1012 966 -4,55%
> 256.bzip2 758 762 0,53%
> 300.twolf 1005 1015 1,00%
> Geomean 806,04 803,25 -0,35%
Presumably this is with any profile feedback ?
If so, numbers look ok.
Have you tried it with profile feedback ?
Selective scheduling (and most other aggressive global scheduling algorithms)
can benefit quite a bit from profile feedback,
and tuning can be quite different with and without profile feedback.
Seongbae
* Re: [RFC] Selective scheduling pass
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
@ 2008-06-05 13:49 ` Andrey Belevantsev
0 siblings, 0 replies; 9+ messages in thread
From: Andrey Belevantsev @ 2008-06-05 13:49 UTC
To: Seongbae Park (박성배, 朴成培); +Cc: GCC Patches, Jim Wilson, Vladimir Makarov
Seongbae Park (박성배, 朴成培) wrote:
> Presumably this is with any profile feedback ?
> If so, numbers look ok.
You probably mean that the numbers are without profile feedback. This
is true.
> Have you tried it with profile feedback ?
> Selective scheduling (and most other aggressive global scheduling algorithms)
> can benefit quite a bit from profile feedback,
> and tuning can be quite different for with and without profile feedback.
No, we haven't tried that. I've got the impression that profile
optimizations are not of big importance to GCC developers, so we focused
on tuning without profile feedback. Nevertheless, we'll try SPEC with
profiling feedback tonight. I will be happy to discuss how the
scheduler can be tuned to use the profiling information -- will you
attend the summit btw?
Andrey
Thread overview: 9+ messages
2008-06-05 6:44 [RFC] Selective scheduling pass Steven Bosscher
2008-06-05 18:29 ` Andrey Belevantsev
2008-06-05 23:38 ` Steven Bosscher
-- strict thread matches above, loose matches on Subject: below --
2008-06-03 14:24 Andrey Belevantsev
2008-06-03 22:03 ` Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
2008-06-04 20:50 ` Andrey Belevantsev
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2008-06-05 13:49 ` Andrey Belevantsev