Re: [RFC] Selective scheduling pass

public inbox for gcc-patches@gcc.gnu.org

* Re: [RFC] Selective scheduling pass
@ 2008-06-05  6:44 Steven Bosscher
  2008-06-05 18:29 ` Andrey Belevantsev
From: Steven Bosscher @ 2008-06-05  6:44 UTC
  To: Andrey Belevantsev, gcc-patches; +Cc: Jim Wilson, Vladimir Makarov

(cf. http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00114.html)

Hi Andrey,

Thanks for this very nice work.  I was wondering if you could say a
little more about the performance impact of the selective scheduler on
the generated code...

You posted SPEC scores and there are as many ups as there are downs in
there.  Where do the regressions come from?  The 3-5% slowdowns on e.g.
galgel and vortex are quite substantial; do you know why they happen?

Also, you posted a new scheduler and a set of target tunings in one set
of patches.  I would like to know the performance impact of the target
changes alone.  That is, what happens to e.g. SPEC scores for ia64 with
just the tweaks and tunings patch
(http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)?  I assume
those target changes alone (the ones not related to sel-sched) also
have a positive performance impact.  Since you've globbed everything
into one patch set, it's impossible to tell how much of the
performance change can be attributed to sel-sched and how much is
just target tweaks...

Gr.
Steven


* Re: [RFC] Selective scheduling pass
  2008-06-05  6:44 [RFC] Selective scheduling pass Steven Bosscher
@ 2008-06-05 18:29 ` Andrey Belevantsev
  2008-06-05 23:38   ` Steven Bosscher
From: Andrey Belevantsev @ 2008-06-05 18:29 UTC
  To: Steven Bosscher; +Cc: gcc-patches, Jim Wilson, Vladimir Makarov

Steven Bosscher wrote:
> (cf. http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00114.html)
> 
> Hi Andrey,
> 
> Thanks for this very nice work.  I was wondering if you could say a
> little more about the performance impact of the selective scheduler on
> the generated code...
Thank you, Steven!


> You posted SPEC scores and there are as many ups as there are downs in
> there.  Where do the regressions come from?  The 3-5% slowdowns on e.g.
> galgel and vortex are quite substantial; do you know why they happen?
We have tried to fix every regression for which we found a culprit.
Galgel slows down for two reasons.  First, we cannot use cselib the way
the ebb scheduler does, because cselib works only on extended basic
blocks.  We tried to support cselib for multiple fences, analogously to
what we did with target contexts, but we didn't manage to make it work
well.  I think that we can turn cselib on at least for those regions
that are ebbs.

The second problem with galgel is that the heuristic restricting
pipelining of small loops doesn't work well in this case.  However, we
have turned it on because it gave a speedup overall.  I would note
that on SPEC FP, where most of the tuning went, there are far more ups
than downs :)


> Also, you posted a new scheduler and a set of target tunings in one set
> of patches.  I would like to know the performance impact of the target
> changes alone.  That is, what happens to e.g. SPEC scores for ia64 with
> just the tweaks and tunings patch
> (http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)?  I assume
> those target changes alone (the ones not related to sel-sched) also
> have a positive performance impact.  Since you've globbed everything
> into one patch set, it's impossible to tell how much of the
> performance change can be attributed to sel-sched and how much is
> just target tweaks...
AFAIR, the target tunings gave around 1% when we tested them at -O2.
That was a couple of months ago.  We will retest tonight to get fresh
numbers.

Andrey


* Re: [RFC] Selective scheduling pass
  2008-06-05 18:29 ` Andrey Belevantsev
@ 2008-06-05 23:38   ` Steven Bosscher
From: Steven Bosscher @ 2008-06-05 23:38 UTC
  To: Andrey Belevantsev; +Cc: gcc-patches, Jim Wilson, Vladimir Makarov

On Thu, Jun 5, 2008 at 8:29 PM, Andrey Belevantsev <abel@ispras.ru> wrote:
>> Also, you posted a new scheduler and a set of target tunings in one set
>> of patches.  I would like to know the performance impact of the target
>> changes alone.  That is, what happens to e.g. SPEC scores for ia64 with
>> just the tweaks and tunings patch
>> (http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00117.html)?  I assume
>> those target changes alone (the ones not related to sel-sched) also
>> have a positive performance impact.  Since you've globbed everything
>> into one patch set, it's impossible to tell how much of the
>> performance change can be attributed to sel-sched and how much is
>> just target tweaks...
>
> AFAIR, the target tunings gave around 1% when we tested them at -O2.
> That was a couple of months ago.  We will retest tonight to get fresh
> numbers.

Great.

Another thought/suggestion/comment:

Does the selective scheduler also make the register renaming pass
(pass_regrename and pass_cprop_hardreg) obsolete?  I would expect it
does since, as far as I understand, the scheduler handles register
renaming itself.

These two passes are quite expensive (or at least they were when I
last looked at them).  You could perhaps buy back some compile time
if you can demonstrate that these passes are not needed when you do
post-regalloc selective scheduling.

Gr.
Steven


* [RFC] Selective scheduling pass
@ 2008-06-03 14:24 Andrey Belevantsev
  2008-06-03 22:03 ` Vladimir Makarov
                   ` (2 more replies)
From: Andrey Belevantsev @ 2008-06-03 14:24 UTC
  To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov

Hello,

The patches in this thread introduce the selective scheduler into GCC,
implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
Monakov, and Maxim Kuvyrkov while he was at ISP RAS.  The selective
scheduler is aimed at scheduling-eager targets such as ia64, power6, and
cell.  The implementation contains both the scheduler and the software
pipeliner, which can be used on loops with control flow not handled by
SMS.  The scheduler can work either before or after register allocation,
but it is currently tuned to work after.

The scheduler was bootstrapped and tested on ia64 with all default
languages, both as the first and as the second scheduler.  It was also
bootstrapped with C, C++, and Fortran enabled on ppc64 and x86-64.

On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the
sel-sched branch show a 3.8% speedup on average; SPEC INT shows both
small speedups and regressions, staying roughly neutral on average:

Benchmark       trunk  sel-sched    change
168.wupwise       513        552     7.60%
171.swim          757        772     1.98%
172.mgrid         570        643    12.81%
173.applu         503        524     4.17%
177.mesa          796        795    -0.13%
178.galgel        814        787    -3.32%
179.art          1990       2098     5.43%
183.equake        513        569    10.92%
187.facerec       958        991     3.44%
188.ammp          765        775     1.31%
189.lucas         860        869     1.05%
191.fma3d         549        536    -2.37%
200.sixtrack      300        323     7.67%
301.apsi          522        546     4.60%
Geomean        673.97     699.87     3.84%

Benchmark       trunk  sel-sched    change
164.gzip          683        682    -0.15%
175.vpr           814        802    -1.47%
176.gcc          1080       1069    -1.02%
181.mcf           701        708     1.00%
186.crafty        872        855    -1.95%
197.parser        729        728    -0.14%
252.eon           793        785    -1.01%
253.perlbmk       824        839     1.82%
254.gap           558        569     1.97%
255.vortex       1012        966    -4.55%
256.bzip2         758        762     0.53%
300.twolf        1005       1015     1.00%
Geomean        806.04     803.25    -0.35%

On power6, Revital Eres saw speedups on several tests; additional tuning
is required to get good results there, which is complicated because we
don't have access to power6 hardware.  On cell, some third-party testing
in 2007 showed 4-6% speedups, but I don't have more detailed information.

The compile-time slowdown, measured with --enable-checking=assert, is
quite significant: about 12% on SPEC INT and about 18% on SPEC FP and the
cc1-i-files collection.  For this reason, we have enabled the selective
scheduler by default at -O3 on ia64 and disabled it by default on other
targets.

Our current plan is to work on further compile-time improvements and
performance tuning for ppc and cell, hopefully with the help of the IBM
Haifa folks.  If we complete this work before the end of stage 2, we
can also enable selective scheduling at -O3 for ppc in 4.4.  In the
mid-term, we will work on removing the ebb scheduler, as it is now used
only on ia64 and will be superseded by the selective scheduler once we
further improve compile time.

Andrey


* Re: [RFC] Selective scheduling pass
  2008-06-03 14:24 Andrey Belevantsev
@ 2008-06-03 22:03 ` Vladimir Makarov
  2008-06-04 16:55 ` Mark Mitchell
  2008-06-05  3:45 ` Seongbae Park (박성배, 朴成培)
From: Vladimir Makarov @ 2008-06-03 22:03 UTC
  To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson

Andrey Belevantsev wrote:
> Hello,
>
> The patches in this thread introduce the selective scheduler into GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS.  The selective
> scheduler is aimed at scheduling-eager targets such as ia64, power6,
> and cell.  The implementation contains both the scheduler and the
> software pipeliner, which can be used on loops with control flow not
> handled by SMS.  The scheduler can work either before or after
> register allocation, but it is currently tuned to work after.
>
> The scheduler was bootstrapped and tested on ia64 with all default
> languages, both as the first and as the second scheduler.  It was also
> bootstrapped with C, C++, and Fortran enabled on ppc64 and x86-64.
>
> On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the
> sel-sched branch show a 3.8% speedup on average; SPEC INT shows both
> small speedups and regressions, staying roughly neutral on average:
>
Congratulations!  I have followed the project for a long time.  Finally
a useful milestone has been achieved, and you have got a pretty big
improvement.  The scheduling algorithm is superior to what we had because
it can improve insn schedules on all execution paths through insn cloning
and other transformations.
> On power6, Revital Eres saw speedups on several tests; additional
> tuning is required to get good results there, which is complicated
> because we don't have access to power6 hardware.  On cell, some
> third-party testing in 2007 showed 4-6% speedups, but I don't have
> more detailed information.
>
> The compile-time slowdown, measured with --enable-checking=assert, is
> quite significant: about 12% on SPEC INT and about 18% on SPEC FP and
> the cc1-i-files collection.  For this reason, we have enabled the
> selective scheduler by default at -O3 on ia64 and disabled it by
> default on other targets.
>
Itanium is a pretty specific target.  It would be interesting to know how
big the slowdown is for ppc.
> Our current plan is to work on further compile-time improvements and
> performance tuning for ppc and cell, hopefully with the help of the IBM
> Haifa folks.  If we complete this work before the end of stage 2, we
> can also enable selective scheduling at -O3 for ppc in 4.4.  In the
> mid-term, we will work on removing the ebb scheduler, as it is now used
> only on ia64 and will be superseded by the selective scheduler once we
> further improve compile time.
>

I think we should finally get rid of the EBB scheduler.  You could try to
address the compile-time problem by disabling some transformations of the
new scheduler in -O2 mode.

If you solve the compile-time problem, I think we should work on removing
the haifa-scheduler too, so that we have just one insn scheduler.  But, as
I understand it, that will not happen soon.


* Re: [RFC] Selective scheduling pass
  2008-06-03 14:24 Andrey Belevantsev
  2008-06-03 22:03 ` Vladimir Makarov
@ 2008-06-04 16:55 ` Mark Mitchell
  2008-06-04 20:50   ` Andrey Belevantsev
  2008-06-05  3:45 ` Seongbae Park (박성배, 朴成培)
From: Mark Mitchell @ 2008-06-04 16:55 UTC
  To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev wrote:

> The patches in this thread introduce the selective scheduler into GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS.  The selective
> scheduler is aimed at scheduling-eager targets such as ia64, power6, and
> cell.  The implementation contains both the scheduler and the software
> pipeliner, which can be used on loops with control flow not handled by
> SMS.  The scheduler can work either before or after register allocation,
> but it is currently tuned to work after.
>
> On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the
> sel-sched branch show a 3.8% speedup on average; SPEC INT shows both
> small speedups and regressions, staying roughly neutral on average:

That's a very good result.  Congratulations!

I know that this scheduler is aimed at CPUs like the ones you mention 
above.  However, would it function correctly on other CPUs with more 
"traditional" characteristics, like older ARM, MIPS, or x86 cores?  And, 
would it be reasonably possible to tune it for those CPUs as well?

As with the IRA allocator, I'd like to avoid having multiple schedulers 
in GCC.  (I know we've done that for a while, but I still think it's 
undesirable.)  So, I'd like to see if we can get this to work well 
across all of the Primary and Secondary CPUs, and then just make it "the 
GCC scheduler" rather than an optional thing enabled at some 
optimization levels on some CPUs.

Do you think that's feasible?  Or do you think that there are inherent 
aspects of the algorithm that mean that we need to have this new 
scheduler for one class of CPUs and the old scheduler for the other 
class?  Is there any way to make the new scheduler do a reasonable job 
with the existing descriptions in GCC, so that port maintainers can tune 
later, or is a level of effort like that for Itanium required?

> The compile-time slowdown, measured with --enable-checking=assert, is
> quite significant: about 12% on SPEC INT and about 18% on SPEC FP and
> the cc1-i-files collection.  For this reason, we have enabled the
> selective scheduler by default at -O3 on ia64 and disabled it by
> default on other targets.

Do you understand what's causing the compile-time slowdown?

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: [RFC] Selective scheduling pass
  2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-04 20:50   ` Andrey Belevantsev
From: Andrey Belevantsev @ 2008-06-04 20:50 UTC
  To: Mark Mitchell; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Mark Mitchell wrote:
> That's a very good result.  Congratulations! 
Thank you!

> I know that this scheduler is aimed at CPUs like the ones you mention 
> above.  However, would it function correctly on other CPUs with more 
> "traditional" characteristics, like older ARM, MIPS, or x86 cores?  And, 
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in the scheduler hooks,
everything should just work (modulo bugs, of course; we've tried only
ppc64 and x86-64).  If a target saves some information describing the
scheduler's state, simple hooks manipulating this data need to be
implemented, as we did for the rs6000 port.
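
A hypothetical, standalone sketch of what such hooks amount to: the selective scheduler works on several fences at once, so a port that keeps scheduling state in globals must be able to snapshot that state per fence and switch between snapshots.  The struct, fields, and function names below are invented for illustration and are not the rs6000 code; they are only loosely modelled on the alloc/init/set/clear/free shape of GCC's TARGET_SCHED_*_SCHED_CONTEXT hooks, whose exact signatures should be checked in the GCC internals manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical target-private scheduler state, of the kind a port might
   keep in globals between scheduler hook calls.  */
struct target_sched_state
{
  int last_issued_uid;   /* uid of the last insn issued, or -1 */
  int group_slots_used;  /* slots already filled in the current issue group */
};

/* The "live" state that the port's other hooks read and update.  */
static struct target_sched_state cur_state = { -1, 0 };

/* Allocate storage for one saved context.  */
static void *target_alloc_sched_context(void)
{
  void *tc = malloc(sizeof(struct target_sched_state));
  assert(tc != NULL);
  return tc;
}

/* Initialize TC either to a clean state or as a copy of the live state.  */
static void target_init_sched_context(void *tc, bool clean_p)
{
  struct target_sched_state *s = tc;
  if (clean_p)
    {
      s->last_issued_uid = -1;
      s->group_slots_used = 0;
    }
  else
    *s = cur_state;
}

/* Make TC the live state, e.g. when switching to another fence.  */
static void target_set_sched_context(void *tc)
{
  cur_state = *(const struct target_sched_state *) tc;
}

/* Release anything TC owns; this simple state owns nothing.  */
static void target_clear_sched_context(void *tc)
{
  (void) tc;
}

/* Free storage obtained from target_alloc_sched_context.  */
static void target_free_sched_context(void *tc)
{
  free(tc);
}
```

In this model a driver calls target_init_sched_context(tc, false) to capture the state at one fence and target_set_sched_context(tc) to resume it later; passing true for clean_p gives a fresh context for a new fence.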

> As with the IRA allocator, I'd like to avoid having multiple schedulers 
> in GCC.  (I know we've done that for a while, but I still think it's 
> undesirable.)  So, I'd like to see if we can get this to work well 
> across all of the Primary and Secondary CPUs, and then just make it "the 
> GCC scheduler" rather than an optional thing enabled at some 
> optimization levels on some CPUs.
This is our goal as well, and I think it can be done incrementally.  We
are now working on ppc performance.  Next, we need to tune the scheduler
so that on traditional targets it performs no worse and the compile-time
slowdown is reasonable, e.g. by disabling pipelining and decreasing the
scheduling window.  The last step is to speed up the implementation so
that, for scheduling-eager targets with pipelining enabled, the slowdown
becomes acceptable at -O2.

Note that the selective scheduler does not subsume SMS but complements
it: SMS does a better job on countable loops, but cannot handle loops
with control flow or with an unknown number of iterations.  So in any
case there will be two schedulers.

> Do you think that's feasible?  Or do you think that there are inherent 
> aspects of the algorithm that mean that we need to have this new 
> scheduler for one class of CPUs and the old scheduler for the other 
> class?  Is there any way to make the new scheduler do a reasonable job 
> with the existing descriptions in GCC, so that port maintainers can tune 
> later, or is a level of effort like that for Itanium required?
The ia64 backend is very complex, and we put a lot of effort into tuning
the backend itself; you can see this in my other mail about the target
changes.  So I think that tuning for other targets will be simpler.  The
cell results I mentioned came from an engineer who did the tuning
internally at Samsung, and AFAIR he didn't mention any target-independent
changes he had to make; basically, he just made it work.

>> The compile-time slowdown, measured with --enable-checking=assert, is
>> quite significant: about 12% on SPEC INT and about 18% on SPEC FP and
>> the cc1-i-files collection.  For this reason, we have enabled the
>> selective scheduler by default at -O3 on ia64 and disabled it by
>> default on other targets.
> 
> Do you understand what's causing the compile-time slowdown?
The part that takes the most time is updating the availability sets,
which is the central part of the algorithm.  Renaming is quite expensive
too, but we have tackled this by limiting it to only a few insns with the
highest priority.  To make the updates faster, one needs to build the
data dependence graph and keep it up to date while scheduling.
Unfortunately, we didn't manage to do this during this project.  The
first step towards this goal will be to make the dependence graph
classify the dependencies it builds as control/data, lhs/rhs,
register/memory, etc.  Then we can devise a mechanism for updating the
graph, which would not be trivial: e.g. when an insn gets renamed, we
introduce a register-register copy, which can create completely new
register dependencies that cannot be derived from the existing ones.
Such a project is likely to make it to trunk in the next release cycle,
and it would correspond to the last step of the incremental approach
outlined above.
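
To give some intuition for what an availability set is: roughly, the av set at a program point holds the expressions that could legally be moved up to that point.  The sketch below is a deliberately simplified toy model, not the sel-sched implementation: it assumes a single straight-line block, register-only dependences tracked as bitmasks, and no renaming or substitution (where the real scheduler might rename a target register or substitute through a copy, this sketch just drops the expression).  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified insn: register sets as bitmasks (bit k = reg k).  */
struct insn
{
  uint32_t defs;  /* registers written */
  uint32_t uses;  /* registers read */
};

/* Can the expression of insn E move up past insn I?  In this toy model it
   cannot if I writes a register E reads (true dependence), or if E writes
   a register that I reads or writes (anti/output dependence).  */
static int can_move_up(const struct insn *e, const struct insn *i)
{
  if (i->defs & e->uses)
    return 0;
  if (e->defs & (i->uses | i->defs))
    return 0;
  return 1;
}

/* Compute the av set at the top of a straight-line block of N insns,
   returned as a bitmask over insn indices.  Walks the block bottom-up:
   av_before(k) = { e in av_after(k) : e can move past k } + { k }.  */
static uint32_t av_set_at_top(const struct insn *block, size_t n)
{
  assert(n <= 32);
  uint32_t av = 0;
  for (size_t k = n; k-- > 0;)
    {
      uint32_t moved = 0;
      for (size_t e = k + 1; e < n; e++)
        if (((av >> e) & 1u) && can_move_up(&block[e], &block[k]))
          moved |= 1u << e;
      av = moved | (1u << k);
    }
  return av;
}
```

For the block "r1 = r2 + r3; r4 = r1 * 2; r5 = r6 + 1", the av set at the top is {0, 2}: the second insn is blocked by its true dependence on r1.  Even in this toy model, any change to the block means re-walking it; in a real region the sets exist at many points, which gives some idea of why keeping them up to date dominates compile time.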

Yours, Andrey


* Re: [RFC] Selective scheduling pass
  2008-06-03 14:24 Andrey Belevantsev
  2008-06-03 22:03 ` Vladimir Makarov
  2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-05  3:45 ` Seongbae Park (박성배, 朴成培)
  2008-06-05 13:49   ` Andrey Belevantsev
From: Seongbae Park (박성배, 朴成培) @ 2008-06-05  3:45 UTC
  To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

On Tue, Jun 3, 2008 at 7:16 AM, Andrey Belevantsev <abel@ispras.ru> wrote:
> Hello,
>
> The patches in this thread introduce the selective scheduler into GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov,
> and Maxim Kuvyrkov while he was at ISP RAS.  The selective scheduler is
> aimed at scheduling-eager targets such as ia64, power6, and cell.  The
> implementation contains both the scheduler and the software pipeliner,
> which can be used on loops with control flow not handled by SMS.  The
> scheduler can work either before or after register allocation, but it is
> currently tuned to work after.
>
> The scheduler was bootstrapped and tested on ia64 with all default
> languages, both as the first and as the second scheduler.  It was also
> bootstrapped with C, C++, and Fortran enabled on ppc64 and x86-64.
>
> On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the
> sel-sched branch show a 3.8% speedup on average; SPEC INT shows both
> small speedups and regressions, staying roughly neutral on average:
>
> Benchmark       trunk  sel-sched    change
> 168.wupwise       513        552     7.60%
> 171.swim          757        772     1.98%
> 172.mgrid         570        643    12.81%
> 173.applu         503        524     4.17%
> 177.mesa          796        795    -0.13%
> 178.galgel        814        787    -3.32%
> 179.art          1990       2098     5.43%
> 183.equake        513        569    10.92%
> 187.facerec       958        991     3.44%
> 188.ammp          765        775     1.31%
> 189.lucas         860        869     1.05%
> 191.fma3d         549        536    -2.37%
> 200.sixtrack      300        323     7.67%
> 301.apsi          522        546     4.60%
> Geomean        673.97     699.87     3.84%
>
> Benchmark       trunk  sel-sched    change
> 164.gzip          683        682    -0.15%
> 175.vpr           814        802    -1.47%
> 176.gcc          1080       1069    -1.02%
> 181.mcf           701        708     1.00%
> 186.crafty        872        855    -1.95%
> 197.parser        729        728    -0.14%
> 252.eon           793        785    -1.01%
> 253.perlbmk       824        839     1.82%
> 254.gap           558        569     1.97%
> 255.vortex       1012        966    -4.55%
> 256.bzip2         758        762     0.53%
> 300.twolf        1005       1015     1.00%
> Geomean        806.04     803.25    -0.35%

Presumably this is with any profile feedback?
If so, the numbers look OK.
Have you tried it with profile feedback?
Selective scheduling (and most other aggressive global scheduling algorithms)
can benefit quite a bit from profile feedback,
and the tuning can be quite different with and without profile feedback.

Seongbae


* Re: [RFC] Selective scheduling pass
  2008-06-05  3:45 ` Seongbae Park (박성배, 朴成培)
@ 2008-06-05 13:49   ` Andrey Belevantsev
From: Andrey Belevantsev @ 2008-06-05 13:49 UTC
  To: "Seongbae Park (박성배, 朴成培)"
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Seongbae Park (박성배, 朴成培) wrote:
> Presumably this is with any profile feedback?
> If so, the numbers look OK.
You probably mean that the numbers are without profile feedback.  This 
is true.

> Have you tried it with profile feedback?
> Selective scheduling (and most other aggressive global scheduling algorithms)
> can benefit quite a bit from profile feedback,
> and the tuning can be quite different with and without profile feedback.
No, we haven't tried that.  I had the impression that profile-based
optimizations are not of great importance to GCC developers, so we focused
on tuning without profile feedback.  Nevertheless, we'll try SPEC with
profile feedback tonight.  I would be happy to discuss how the scheduler
can be tuned to use profile information.  Will you attend the summit, by
the way?

Andrey



Thread overview: 9+ messages
2008-06-05  6:44 [RFC] Selective scheduling pass Steven Bosscher
2008-06-05 18:29 ` Andrey Belevantsev
2008-06-05 23:38   ` Steven Bosscher
  -- strict thread matches above, loose matches on Subject: below --
2008-06-03 14:24 Andrey Belevantsev
2008-06-03 22:03 ` Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
2008-06-04 20:50   ` Andrey Belevantsev
2008-06-05  3:45 ` Seongbae Park (박성배, 朴成培)
2008-06-05 13:49   ` Andrey Belevantsev
