Superblock Instruction Scheduling in GCC

public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed

* Superblock Instruction Scheduling in GCC
@ 2003-12-03  6:14 Ghassan Shobaki
  2003-12-03 14:03 ` Vladimir N. Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2003-12-03  6:14 UTC (permalink / raw)
  To: gcc-help; +Cc: gcc

I know how to get gcc to form superblocks (by using the -ftracer
command-line switch), but is there a way to get it to use these
superblocks as scheduling regions in the instruction scheduling pass?
Currently, the instruction scheduling module forms regions that are totally
different from the superblocks that are formed in the tracer module
even though each superblock is a valid scheduling region.
Any idea how I can achieve this? Or are there any plans to do superblock
instruction scheduling in the near future?

Thanks

-Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-03  6:14 Superblock Instruction Scheduling in GCC Ghassan Shobaki
@ 2003-12-03 14:03 ` Vladimir N. Makarov
  2003-12-04  0:06   ` Jan Hubicka
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir N. Makarov @ 2003-12-03 14:03 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: gcc-help, gcc

Ghassan Shobaki wrote:

> I know how to get gcc to form superblocks (by using the -ftracer
> command-line switch), but is there a way to get it to use these
> superblocks as scheduling regions in the instruction scheduling pass?
> Currently, the instruction scheduling module forms regions that are totally
> different from the superblocks that are formed in the tracer module
> even though each superblock is a valid scheduling region.
> Any idea how I can achieve this? Or are there any plans to do superblock
> instruction scheduling in the near future?

There was Jan Hubicka's patch for this.  Please look at it

http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html

This patch should work for all platforms except for IA64 whose the second
scheduling is made on EBB.

I tried trace scheduling for IA64 (but I did not post the patch for ia64).
Here the results are

http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html

The problem with trace scheduling is that the generated code is bigger, the
compiler is slower and the code improvement is insignificant.

  If you manage to achieve an improvement for a platform on a credible
benchmark (SPEC95, SPEC2000), we could consider to add the patch to gcc at
least for given platform for -O3.   Because the compiler changed since the
patch was posted, there is a probability that you could achieve this.

Vlad

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-03 14:03 ` Vladimir N. Makarov
@ 2003-12-04  0:06   ` Jan Hubicka
  2003-12-04  1:26     ` Ghassan Shobaki
                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jan Hubicka @ 2003-12-04  0:06 UTC (permalink / raw)
  To: Vladimir N. Makarov; +Cc: Ghassan Shobaki, gcc-help, gcc

> Ghassan Shobaki wrote:
> 
> > I know how to get gcc to form superblocks (by using the -ftracer
> > command-line switch), but is there a way to get it to use these
> > superblocks as scheduling regions in the instruction scheduling pass?
> > Currently, the instruction scheduling module forms regions that are totally
> > different from the superblocks that are formed in the tracer module
> > even though each superblock is a valid scheduling region.
> > Any idea how I can achieve this? Or are there any plans to do superblock
> > instruction scheduling in the near future?
> 
> There was Jan Hubicka's patch for this.  Please look at it
> 
> http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> 
> This patch should work for all platforms except for IA64 whose the second
> scheduling is made on EBB.

This patch is currently in the mainline tree, so you can simply use
-fsched2-use-traces / -fsched2-use-superblocks
> 
> I tried trace scheduling for IA64 (but I did not post the patch for ia64).
> Here the results are
> 
> http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> 
> The problem with trace scheduling is that the generated code is bigger, the
> compiler is slower and the code improvement is insignificant.
> 
>   If you manage to achieve an improvement for a platform on a credible
> benchmark (SPEC95, SPEC2000), we could consider to add the patch to gcc at
> least for given platform for -O3.   Because the compiler changed since the
> patch was posted, there is a probability that you could achieve this.

Yes, we need experimenting here.
I was quite surprised that the benefits wasn't too noticeable on
in-order architecture and I would like to hear about any results
(positive or negative).
-fsched2-use-superblocks should bring most of benefits at no code size
costs, while -fsched2-use-traces is more experimental and probably needs
profile feedback to do somethign usefull.  (I managed to get some
speedups using this on Athlon but the benefits wasn't considerable
enought to discuss inclusion in -O3 -fbranch-probabilities combination)

Honza
> 
> 
> Vlad
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-04  0:06   ` Jan Hubicka
@ 2003-12-04  1:26     ` Ghassan Shobaki
  2003-12-04 10:25       ` Jan Hubicka
  2003-12-09 21:11     ` Ghassan Shobaki
  2004-01-28  1:31     ` Superblock Instruction Scheduling in GCC 3.4 Ghassan Shobaki
  2 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2003-12-04  1:26 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Vladimir N. Makarov, gcc-help, gcc

Jan and Vladimir,

Thank you guys for the immediate and very helpful responses. I have just
tried the -fsched2-use-superblocks on vresion 3.3.1 that I currently have,
but it did not work. Is this version supposed to have it or I have to
download version 3.3.2 in order to get this feature?

Also, what's exactly the difference between -fsched2-use-superblocks and
-fsched2-use-traces? As far as I know, a superblock is a single-entry
multiple-exit region that is formed from a trace using tail duplication.
This means that superblocks are more likely (but not necessarily) to
increase code size due to tail duplication. This also implies that
superblock scheduling is simpler than trace scheduling. Is this consistent
with what the above two gcc comman-line options mean?

As far as experimentation is concerned, let me give some background about
what I am doing and what kind of input I might be able to provide:

I am doing research on optimal superblock scheduling and I need to import
superblocks (more precisely, superblock data dependence graphs) from gcc to
run them through my optimal solver (my research group currently has a way
to import basic blocks and I am trying to extend that to superblocks).
Even though this optimal solver is currently too slow to be included in a
production compiler like gcc, it will be useful for studying the quality
of gcc's schedules by comparing them against optimal. It will probably
take me two or three months to get to that point for the very simplistic
machine models that we are working with, but I'll be more than happy to
provide you with any interesting results that I might come up with.

Regards

-Ghassan



On Thu, 4 Dec 2003, Jan Hubicka wrote:

> > Ghassan Shobaki wrote:
> >
> > > I know how to get gcc to form superblocks (by using the -ftracer
> > > command-line switch), but is there a way to get it to use these
> > > superblocks as scheduling regions in the instruction scheduling pass?
> > > Currently, the instruction scheduling module forms regions that are totally
> > > different from the superblocks that are formed in the tracer module
> > > even though each superblock is a valid scheduling region.
> > > Any idea how I can achieve this? Or are there any plans to do superblock
> > > instruction scheduling in the near future?
> >
> > There was Jan Hubicka's patch for this.  Please look at it
> >
> > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> >
> > This patch should work for all platforms except for IA64 whose the second
> > scheduling is made on EBB.
>
> This patch is currently in the mainline tree, so you can simply use
> -fsched2-use-traces / -fsched2-use-superblocks
> >
> > I tried trace scheduling for IA64 (but I did not post the patch for ia64).
> > Here the results are
> >
> > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> >
> > The problem with trace scheduling is that the generated code is bigger, the
> > compiler is slower and the code improvement is insignificant.
> >
> >   If you manage to achieve an improvement for a platform on a credible
> > benchmark (SPEC95, SPEC2000), we could consider to add the patch to gcc at
> > least for given platform for -O3.   Because the compiler changed since the
> > patch was posted, there is a probability that you could achieve this.
>
> Yes, we need experimenting here.
> I was quite surprised that the benefits wasn't too noticeable on
> in-order architecture and I would like to hear about any results
> (positive or negative).
> -fsched2-use-superblocks should bring most of benefits at no code size
> costs, while -fsched2-use-traces is more experimental and probably needs
> profile feedback to do somethign usefull.  (I managed to get some
> speedups using this on Athlon but the benefits wasn't considerable
> enought to discuss inclusion in -O3 -fbranch-probabilities combination)
>
> Honza
> >
> >
> > Vlad
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-04  1:26     ` Ghassan Shobaki
@ 2003-12-04 10:25       ` Jan Hubicka
  2003-12-04 15:54         ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Hubicka @ 2003-12-04 10:25 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Jan Hubicka, Vladimir N. Makarov, gcc-help, gcc

> Jan and Vladimir,
> 
> Thank you guys for the immediate and very helpful responses. I have just
> tried the -fsched2-use-superblocks on vresion 3.3.1 that I currently have,
> but it did not work. Is this version supposed to have it or I have to
> download version 3.3.2 in order to get this feature?
The -fsched2-use-superblocks is only in mainline (development version of
3.4).  So you will need to either download snapshot of it or backport
the changes, if you want to experiment.
I would guess that using snapshot of 3.4 would be easier for you :)
> 
> Also, what's exactly the difference between -fsched2-use-superblocks and
> -fsched2-use-traces? As far as I know, a superblock is a single-entry
> multiple-exit region that is formed from a trace using tail duplication.
> This means that superblocks are more likely (but not necessarily) to
> increase code size due to tail duplication. This also implies that
> superblock scheduling is simpler than trace scheduling. Is this consistent
> with what the above two gcc comman-line options mean?
At the moment, superblocks scheduling and trace scheduling is same, jus
ttrace sheduling does tail duplication just before scheduling so
superblocks are really traces.
The algorithm is very primitive, in particular it does not do any
scheduling requiring compoensation code to be produced. This may be
reason why it does not bring very much.
> 
> As far as experimentation is concerned, let me give some background about
> what I am doing and what kind of input I might be able to provide:
> 
> I am doing research on optimal superblock scheduling and I need to import
> superblocks (more precisely, superblock data dependence graphs) from gcc to
> run them through my optimal solver (my research group currently has a way
> to import basic blocks and I am trying to extend that to superblocks).
> Even though this optimal solver is currently too slow to be included in a
> production compiler like gcc, it will be useful for studying the quality
> of gcc's schedules by comparing them against optimal. It will probably
> take me two or three months to get to that point for the very simplistic
> machine models that we are working with, but I'll be more than happy to
> provide you with any interesting results that I might come up with.

Yes, this looks interesting.  I would be curious how much gap there is
in between current scheduling and optimal one deifnitly.
On GCC mainline, you may also want to experiment with multipass
scheduling that (to my understanding) given large enought limits should
produce optimal schedules, but this is Vladimir's domain.

Honza
> 
> Regards
> 
> -Ghassan
> 
> 
> 
> On Thu, 4 Dec 2003, Jan Hubicka wrote:
> 
> > > Ghassan Shobaki wrote:
> > >
> > > > I know how to get gcc to form superblocks (by using the -ftracer
> > > > command-line switch), but is there a way to get it to use these
> > > > superblocks as scheduling regions in the instruction scheduling pass?
> > > > Currently, the instruction scheduling module forms regions that are totally
> > > > different from the superblocks that are formed in the tracer module
> > > > even though each superblock is a valid scheduling region.
> > > > Any idea how I can achieve this? Or are there any plans to do superblock
> > > > instruction scheduling in the near future?
> > >
> > > There was Jan Hubicka's patch for this.  Please look at it
> > >
> > > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> > >
> > > This patch should work for all platforms except for IA64 whose the second
> > > scheduling is made on EBB.
> >
> > This patch is currently in the mainline tree, so you can simply use
> > -fsched2-use-traces / -fsched2-use-superblocks
> > >
> > > I tried trace scheduling for IA64 (but I did not post the patch for ia64).
> > > Here the results are
> > >
> > > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> > >
> > > The problem with trace scheduling is that the generated code is bigger, the
> > > compiler is slower and the code improvement is insignificant.
> > >
> > >   If you manage to achieve an improvement for a platform on a credible
> > > benchmark (SPEC95, SPEC2000), we could consider to add the patch to gcc at
> > > least for given platform for -O3.   Because the compiler changed since the
> > > patch was posted, there is a probability that you could achieve this.
> >
> > Yes, we need experimenting here.
> > I was quite surprised that the benefits wasn't too noticeable on
> > in-order architecture and I would like to hear about any results
> > (positive or negative).
> > -fsched2-use-superblocks should bring most of benefits at no code size
> > costs, while -fsched2-use-traces is more experimental and probably needs
> > profile feedback to do somethign usefull.  (I managed to get some
> > speedups using this on Athlon but the benefits wasn't considerable
> > enought to discuss inclusion in -O3 -fbranch-probabilities combination)
> >
> > Honza
> > >
> > >
> > > Vlad
> > >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-04 10:25       ` Jan Hubicka
@ 2003-12-04 15:54         ` Vladimir Makarov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Makarov @ 2003-12-04 15:54 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Ghassan Shobaki, gcc-help, gcc

Jan Hubicka wrote:
> 
> >
> > As far as experimentation is concerned, let me give some background about
> > what I am doing and what kind of input I might be able to provide:
> >
> > I am doing research on optimal superblock scheduling and I need to import
> > superblocks (more precisely, superblock data dependence graphs) from gcc to
> > run them through my optimal solver (my research group currently has a way
> > to import basic blocks and I am trying to extend that to superblocks).
> > Even though this optimal solver is currently too slow to be included in a
> > production compiler like gcc, it will be useful for studying the quality
> > of gcc's schedules by comparing them against optimal. It will probably
> > take me two or three months to get to that point for the very simplistic
> > machine models that we are working with, but I'll be more than happy to
> > provide you with any interesting results that I might come up with.
> 
> Yes, this looks interesting.  I would be curious how much gap there is
> in between current scheduling and optimal one deifnitly.
> On GCC mainline, you may also want to experiment with multipass
> scheduling that (to my understanding) given large enought limits should
> produce optimal schedules, but this is Vladimir's domain.
> 

  Actually the first cycle multipass instruction scheduling tries only
to maximize number of insns issued on the current cycle.  An optimal
insn scheduling should minimize execution time of scheduled code which
means it might require to issue not maximal number of insns on a cycle. 
Also the first cycle multipass insn scheduling solves the task in a
primitive way (just trying part of permutations of ready insns without
reusing already evaluated information from other tried permutation). 
This algorithm was just a demonstration of fast work DFA hazard
recognizer.  It improves code for Itanium (as I remember about 2% for
SPECINT2000).

  As for optimal insn scheduling, I wrote a prototype for BB 2-3 years
ago.  It was based on a dynamic programing approach (reusing already
evaluated information about scheduled insn subsequences).  It worked
fast for sequences up to 15-20 insns for ultrasparc.  I see some
improvements (say 7 cycles instead of 9 cycles) for about 30% of bbs on
small benchmarks.  I see the following problem with this approach for
gcc as an industrial compiler:

   1. Gcc has a lot of scheduler hooks which are oriented to list insn
scheduling algorithm.  Many ports uses them.  It is practically
impossible to embed these hooks into an optimal insn scheduling
algorithm.

   2. Long sequences of insns should be divided to keep the algorithm
speed acceptable.  Simple approach to this might work even worse than
list insn scheduling.  Some information (like when data from previous
subsequence will be ready) should be transferred from one subsequence to
another sequence.

  3.  Modern insn scheduling algorithms are now not only about insn
rearranging (in linear code) but can include many other transformations
like code cloning, predication, data/control flow speculation.  I don't
see how these transformations could be a part of an optimal insn
scheduling).  And I think there is more code improving potential using
these transformations than an optimal scheduling.  Although there is
definitely some areas where an optimal scheduling could be useful.

In any case, it would be very interesting to see the results of your
research.  We should encourage people to make researches on gcc.

Vlad

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC
  2003-12-04  0:06   ` Jan Hubicka
  2003-12-04  1:26     ` Ghassan Shobaki
@ 2003-12-09 21:11     ` Ghassan Shobaki
  2004-01-28  1:31     ` Superblock Instruction Scheduling in GCC 3.4 Ghassan Shobaki
  2 siblings, 0 replies; 33+ messages in thread
From: Ghassan Shobaki @ 2003-12-09 21:11 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Vladimir N. Makarov, gcc-help, gcc



I could not check out the integrated patch since the cvs server is
currently down. On the other hand, when I tried to add the patch (as
published on the web) manually on top of release 3.3.1 the resulting
binary failed. I am not sure if I did something wrong or just needed
to do that on top of a more recent version of the gcc code.
So, I am kind of stuck at this point. Do you have any source files that I
can add on top of release 3.3.1 or 3.3.2 to enable superblock scheduling?

Thanks

-Ghassan


On Thu, 4 Dec 2003, Jan Hubicka wrote:

> > Ghassan Shobaki wrote:
> >
> > > I know how to get gcc to form superblocks (by using the -ftracer
> > > command-line switch), but is there a way to get it to use these
> > > superblocks as scheduling regions in the instruction scheduling pass?
> > > Currently, the instruction scheduling module forms regions that are totally
> > > different from the superblocks that are formed in the tracer module
> > > even though each superblock is a valid scheduling region.
> > > Any idea how I can achieve this? Or are there any plans to do superblock
> > > instruction scheduling in the near future?
> >
> > There was Jan Hubicka's patch for this.  Please look at it
> >
> > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> >
> > This patch should work for all platforms except for IA64 whose the second
> > scheduling is made on EBB.
>
> This patch is currently in the mainline tree, so you can simply use
> -fsched2-use-traces / -fsched2-use-superblocks
> >
> > I tried trace scheduling for IA64 (but I did not post the patch for ia64).
> > Here the results are
> >
> > http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00499.html
> >
> > The problem with trace scheduling is that the generated code is bigger, the
> > compiler is slower and the code improvement is insignificant.
> >
> >   If you manage to achieve an improvement for a platform on a credible
> > benchmark (SPEC95, SPEC2000), we could consider to add the patch to gcc at
> > least for given platform for -O3.   Because the compiler changed since the
> > patch was posted, there is a probability that you could achieve this.
>
> Yes, we need experimenting here.
> I was quite surprised that the benefits wasn't too noticeable on
> in-order architecture and I would like to hear about any results
> (positive or negative).
> -fsched2-use-superblocks should bring most of benefits at no code size
> costs, while -fsched2-use-traces is more experimental and probably needs
> profile feedback to do somethign usefull.  (I managed to get some
> speedups using this on Athlon but the benefits wasn't considerable
> enought to discuss inclusion in -O3 -fbranch-probabilities combination)
>
> Honza
> >
> >
> > Vlad
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Superblock Instruction Scheduling in GCC 3.4
  2003-12-04  0:06   ` Jan Hubicka
  2003-12-04  1:26     ` Ghassan Shobaki
  2003-12-09 21:11     ` Ghassan Shobaki
@ 2004-01-28  1:31     ` Ghassan Shobaki
  2004-01-28 12:23       ` Jan Hubicka
  2 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2004-01-28  1:31 UTC (permalink / raw)
  To: gcc-help; +Cc: Vladimir N. Makarov, gcc, hubicka

Hi,

I am currently experimenting with superblock scheduling in a prerelease
version (20040114) of gcc 3.4 on the spec2000 benchmarks. I have
a few questions:

The version that I have does not seem to accept the
-fsched2-use-traces command line switch. So I went a head and set the
following global variables in toplev.c in order to enforce superblock
formation and scheduling:

int flag_optimize = 1;
int flag_schedule_insns = 1;
int flag_schedule_insns_after_reload = 1;
int flag_sched2_use_traces = 1;

Q1. Is there a known bug in version 3.4 related to command-line options?
    Or I am doing something wrong?

Q2. Should I set the following flag as well?
int flag_branch_probabilities = ?;
I tried it both ways and the superblocks generated when this flag is
RESET are on average larger and have more branches in them (they are much
harder to schedule), which does not make sense to me. What is superblock
formation based on when branch probablities are not computed? It seems to
me that setting this flag should be necessary!

Q3. Are there any other flags that I need to set in order to ensure
proper superblock formation?

Q3. I have looked into the trace formation code in tracer.c and was
confused about three different terms related to about the same
concept : probability, count and frequency. Is there a documention that
precisely defines each of these three terms and how they are used in
trace formation?

Thanks

-Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-01-28  1:31     ` Superblock Instruction Scheduling in GCC 3.4 Ghassan Shobaki
@ 2004-01-28 12:23       ` Jan Hubicka
  2004-01-29  7:07         ` Ghassan Shobaki
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Hubicka @ 2004-01-28 12:23 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: gcc-help, Vladimir N. Makarov, gcc, hubicka

> Hi,
> 
> I am currently experimenting with superblock scheduling in a prerelease
> version (20040114) of gcc 3.4 on the spec2000 benchmarks. I have
> a few questions:
> 
> The version that I have does not seem to accept the
> -fsched2-use-traces command line switch. So I went a head and set the
> following global variables in toplev.c in order to enforce superblock
> formation and scheduling:
> 
> int flag_optimize = 1;
> int flag_schedule_insns = 1;
> int flag_schedule_insns_after_reload = 1;
> int flag_sched2_use_traces = 1;
> 
> Q1. Is there a known bug in version 3.4 related to command-line options?
>     Or I am doing something wrong?

I jus ttested and -fsched2-use-traces seems to just work for me.  What
kind of problems do you get?
> 
> Q2. Should I set the following flag as well?
> int flag_branch_probabilities = ?;
> I tried it both ways and the superblocks generated when this flag is
> RESET are on average larger and have more branches in them (they are much
> harder to schedule), which does not make sense to me. What is superblock
> formation based on when branch probablities are not computed? It seems to
> me that setting this flag should be necessary!

Yes, however it should be implied by -O2 flag.
You also may want the profile feedback.  Without -fbranch-probabilities
you won't get any superblocks formed.
> 
> Q3. Are there any other flags that I need to set in order to ensure
> proper superblock formation?

I think -O2 -fsched2-use-traces shall do what you want.
> 
> Q3. I have looked into the trace formation code in tracer.c and was
> confused about three different terms related to about the same
> concept : probability, count and frequency. Is there a documention that
> precisely defines each of these three terms and how they are used in
> trace formation?

You may take a look at cfg.texi documentation I sent to mailing list
year or two ago, but it never went into official sources.
In general the count represent number of execution of given edge/block
when profile is read from feedback.  It is 64bit number.
Each basic block has frequence that represent how commonly it is
executed relative to REG_FREQ_BASE.  It is 32bit number and the most
frequent basic block is known to have REG_FREQ_BASE frequency at the
time profile was built, so it is easier to manipulate with when you
don't care about exactness and it is also available without profile
feedback based on guesses made by compiler.
Probablity is set for edges and represent probability that control flow
will leave basic block via that edge, again relative to REG_FREQ_BASE.

Honza
> 
> Thanks
> 
> -Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-01-28 12:23       ` Jan Hubicka
@ 2004-01-29  7:07         ` Ghassan Shobaki
  2004-01-29 10:13           ` Jan Hubicka
  0 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2004-01-29  7:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-help, Vladimir N. Makarov, gcc, hubicka


Ok, I have just verified that gcc DOES accept the -fsched2-use-tracer and
invoke the ebb scheduler as expected. However, it does not set the
flag_branch_probabilities automatically. It only sets it when I
explicitly use the -fbranch-probabilities command-line switch. Here are
the two cases that I have tried:

g++ -O3 -fsched2-use-traces
Generates ~151K superblocks on my benchmark suite with lots of large
superblocks that include 10 basic blocks or more

g++ -O3 -fsched2-use-traces -fbranch-probabilities
Generates only ~123K superblocks on my benchmark suite with the vast
majority of superblocks consisting of less than 10 basic blocks

So, the question is: Why did the compiler generate more superblocks
when branch probabilities were not computed? Do the superblocks generated
in that case make any sense?
And the bottom line question for me is: which setting should I use in my
research on superblocks?

Thanks

-Ghassan


On Wed, 28 Jan 2004, Jan Hubicka wrote:

> > Hi,
> >
> > I am currently experimenting with superblock scheduling in a prerelease
> > version (20040114) of gcc 3.4 on the spec2000 benchmarks. I have
> > a few questions:
> >
> > The version that I have does not seem to accept the
> > -fsched2-use-traces command line switch. So I went a head and set the
> > following global variables in toplev.c in order to enforce superblock
> > formation and scheduling:
> >
> > int flag_optimize = 1;
> > int flag_schedule_insns = 1;
> > int flag_schedule_insns_after_reload = 1;
> > int flag_sched2_use_traces = 1;
> >
> > Q1. Is there a known bug in version 3.4 related to command-line options?
> >     Or I am doing something wrong?
>
> I jus ttested and -fsched2-use-traces seems to just work for me.  What
> kind of problems do you get?
> >
> > Q2. Should I set the following flag as well?
> > int flag_branch_probabilities = ?;
> > I tried it both ways and the superblocks generated when this flag is
> > RESET are on average larger and have more branches in them (they are much
> > harder to schedule), which does not make sense to me. What is superblock
> > formation based on when branch probablities are not computed? It seems to
> > me that setting this flag should be necessary!
>
> Yes, however it should be implied by -O2 flag.
> You also may want the profile feedback.  Without -fbranch-probabilities
> you won't get any superblocks formed.
> >
> > Q3. Are there any other flags that I need to set in order to ensure
> > proper superblock formation?
>
> I think -O2 -fsched2-use-traces shall do what you want.
> >
> > Q3. I have looked into the trace formation code in tracer.c and was
> > confused about three different terms related to about the same
> > concept : probability, count and frequency. Is there a documention that
> > precisely defines each of these three terms and how they are used in
> > trace formation?
>
> You may take a look at cfg.texi documentation I sent to mailing list
> year or two ago, but it never went into official sources.
> In general the count represent number of execution of given edge/block
> when profile is read from feedback.  It is 64bit number.
> Each basic block has frequence that represent how commonly it is
> executed relative to REG_FREQ_BASE.  It is 32bit number and the most
> frequent basic block is known to have REG_FREQ_BASE frequency at the
> time profile was built, so it is easier to manipulate with when you
> don't care about exactness and it is also available without profile
> feedback based on guesses made by compiler.
> Probablity is set for edges and represent probability that control flow
> will leave basic block via that edge, again relative to REG_FREQ_BASE.
>
> Honza
> >
> > Thanks
> >
> > -Ghassan
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-01-29  7:07         ` Ghassan Shobaki
@ 2004-01-29 10:13           ` Jan Hubicka
  2004-02-01 18:17             ` Ghassan Shobaki
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Hubicka @ 2004-01-29 10:13 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Jan Hubicka, gcc-help, Vladimir N. Makarov, gcc, hubicka

> 
> Ok, I have just verified that gcc DOES accept the -fsched2-use-tracer and
> invoke the ebb scheduler as expected. However, it does not set the
> flag_branch_probabilities automatically. It only sets it when I
> explicitly use the -fbranch-probabilities command-line switch. Here are
> the two cases that I have tried:
> 
> g++ -O3 -fsched2-use-traces
> Generates ~151K superblocks on my benchmark suite with lots of large
> superblocks that include 10 basic blocks or more
> 
> g++ -O3 -fsched2-use-traces -fbranch-probabilities
> Generates only ~123K superblocks on my benchmark suite with the vast
> majority of superblocks consisting of less than 10 basic blocks

-fbranch-probabilities can be accpeted only when program has been
earlier profiled.  GCC does have logic for statically guessing the
branch outcomes when these are not available
(-fguess-branch-probability) so the superblocks can be built, just they
are inferrior to those built with feedback available.
> 
> So, the question is: Why did the compiler generate more superblocks
> when branch probabilities were not computed? Do the superblocks generated
> in that case make any sense?
> And the bottom line question for me is: which setting should I use in my
> research on superblocks?

It is always better to use the profile, so I would recommend you
-fbranch-probabilities unless you are interested in experiments with
static prediction algorithms.

Honza

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-01-29 10:13           ` Jan Hubicka
@ 2004-02-01 18:17             ` Ghassan Shobaki
  2004-02-01 21:55               ` Jan Hubicka
  0 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2004-02-01 18:17 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc-help, Vladimir N. Makarov, gcc


Eventually, I will be doing profile-based experiments. However, at this
point I am interested in static probabilities because it is an easier option
that will allow me to get some initial results more quickly.
Now, my question is: when I used the f-branch-probablities switch without
doing profiling first, gcc still accepted it and generated some
superblocks. Were these invalid superblocks or what?

Thanks

-Ghassan


On Thu, 29 Jan 2004, Jan Hubicka wrote:

> >
> > Ok, I have just verified that gcc DOES accept the -fsched2-use-tracer and
> > invoke the ebb scheduler as expected. However, it does not set the
> > flag_branch_probabilities automatically. It only sets it when I
> > explicitly use the -fbranch-probabilities command-line switch. Here are
> > the two cases that I have tried:
> >
> > g++ -O3 -fsched2-use-traces
> > Generates ~151K superblocks on my benchmark suite with lots of large
> > superblocks that include 10 basic blocks or more
> >
> > g++ -O3 -fsched2-use-traces -fbranch-probabilities
> > Generates only ~123K superblocks on my benchmark suite with the vast
> > majority of superblocks consisting of less than 10 basic blocks
>
> -fbranch-probabilities can be accpeted only when program has been
> earlier profiled.  GCC does have logic for statically guessing the
> branch outcomes when these are not available
> (-fguess-branch-probability) so the superblocks can be built, just they
> are inferrior to those built with feedback available.
> >
> > So, the question is: Why did the compiler generate more superblocks
> > when branch probabilities were not computed? Do the superblocks generated
> > in that case make any sense?
> > And the bottom line question for me is: which setting should I use in my
> > research on superblocks?
>
> It is always better to use the profile, so I would recommend you
> -fbranch-probabilities unless you are interested in experiments with
> static prediction algorithms.
>
> Honza
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-02-01 18:17             ` Ghassan Shobaki
@ 2004-02-01 21:55               ` Jan Hubicka
  2004-02-02  5:52                 ` Ghassan Shobaki
  2004-02-10 18:57                 ` Superblock Scheduling Alg in GCC Ghassan Shobaki
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Hubicka @ 2004-02-01 21:55 UTC (permalink / raw)
  To: Ghassan Shobaki
  Cc: Jan Hubicka, Jan Hubicka, gcc-help, Vladimir N. Makarov, gcc

> 
> Eventually, I will be doing profile-based experiments. However, at this
> point I am interested in static probabilities because it is an easier option
> that will allow me to get some initial results more quickly.
> Now, my question is: when I used the f-branch-probablities switch without
> doing profiling first, gcc still accepted it and generated some
> superblocks. Were these invalid superblocks or what?

With -fbranch-probabilities and no profile, GCC will assume that all the
blocks were never executed and conclude that they are cold.  This limits
several algorithms to more or less optimize for size rather than speed.
So if you want traces based on static predictions, you should be using
-fguess-branch-probability (default by -O2) and not
-fbranch-probabilities.

Honza
> 
> Thanks
> 
> -Ghassan
> 
> 
> On Thu, 29 Jan 2004, Jan Hubicka wrote:
> 
> > >
> > > Ok, I have just verified that gcc DOES accept the -fsched2-use-tracer and
> > > invoke the ebb scheduler as expected. However, it does not set the
> > > flag_branch_probabilities automatically. It only sets it when I
> > > explicitly use the -fbranch-probabilities command-line switch. Here are
> > > the two cases that I have tried:
> > >
> > > g++ -O3 -fsched2-use-traces
> > > Generates ~151K superblocks on my benchmark suite with lots of large
> > > superblocks that include 10 basic blocks or more
> > >
> > > g++ -O3 -fsched2-use-traces -fbranch-probabilities
> > > Generates only ~123K superblocks on my benchmark suite with the vast
> > > majority of superblocks consisting of less than 10 basic blocks
> >
> > -fbranch-probabilities can be accpeted only when program has been
> > earlier profiled.  GCC does have logic for statically guessing the
> > branch outcomes when these are not available
> > (-fguess-branch-probability) so the superblocks can be built, just they
> > are inferrior to those built with feedback available.
> > >
> > > So, the question is: Why did the compiler generate more superblocks
> > > when branch probabilities were not computed? Do the superblocks generated
> > > in that case make any sense?
> > > And the bottom line question for me is: which setting should I use in my
> > > research on superblocks?
> >
> > It is always better to use the profile, so I would recommend you
> > -fbranch-probabilities unless you are interested in experiments with
> > static prediction algorithms.
> >
> > Honza
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Instruction Scheduling in GCC 3.4
  2004-02-01 21:55               ` Jan Hubicka
@ 2004-02-02  5:52                 ` Ghassan Shobaki
  2004-02-10 18:57                 ` Superblock Scheduling Alg in GCC Ghassan Shobaki
  1 sibling, 0 replies; 33+ messages in thread
From: Ghassan Shobaki @ 2004-02-02  5:52 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc-help, Vladimir N. Makarov, gcc


That explains the results the I am getting. Things make sense to me now.

Thank you very much!

-Ghassan

On Sun, 1 Feb 2004, Jan Hubicka wrote:

> >
> > Eventually, I will be doing profile-based experiments. However, at this
> > point I am interested in static probabilities because it is an easier option
> > that will allow me to get some initial results more quickly.
> > Now, my question is: when I used the f-branch-probablities switch without
> > doing profiling first, gcc still accepted it and generated some
> > superblocks. Were these invalid superblocks or what?
>
> With -fbranch-probabilities and no profile, GCC will assume that all the
> blocks were never executed and conclude that they are cold.  This limits
> several algorithms to more or less optimize for size rather than speed.
> So if you want traces based on static predictions, you should be using
> -fguess-branch-probability (default by -O2) and not
> -fbranch-probabilities.
>
> Honza
> >
> > Thanks
> >
> > -Ghassan
> >
> >
> > On Thu, 29 Jan 2004, Jan Hubicka wrote:
> >
> > > >
> > > > Ok, I have just verified that gcc DOES accept the -fsched2-use-tracer and
> > > > invoke the ebb scheduler as expected. However, it does not set the
> > > > flag_branch_probabilities automatically. It only sets it when I
> > > > explicitly use the -fbranch-probabilities command-line switch. Here are
> > > > the two cases that I have tried:
> > > >
> > > > g++ -O3 -fsched2-use-traces
> > > > Generates ~151K superblocks on my benchmark suite with lots of large
> > > > superblocks that include 10 basic blocks or more
> > > >
> > > > g++ -O3 -fsched2-use-traces -fbranch-probabilities
> > > > Generates only ~123K superblocks on my benchmark suite with the vast
> > > > majority of superblocks consisting of less than 10 basic blocks
> > >
> > > -fbranch-probabilities can be accpeted only when program has been
> > > earlier profiled.  GCC does have logic for statically guessing the
> > > branch outcomes when these are not available
> > > (-fguess-branch-probability) so the superblocks can be built, just they
> > > are inferrior to those built with feedback available.
> > > >
> > > > So, the question is: Why did the compiler generate more superblocks
> > > > when branch probabilities were not computed? Do the superblocks generated
> > > > in that case make any sense?
> > > > And the bottom line question for me is: which setting should I use in my
> > > > research on superblocks?
> > >
> > > It is always better to use the profile, so I would recommend you
> > > -fbranch-probabilities unless you are interested in experiments with
> > > static prediction algorithms.
> > >
> > > Honza
> > >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Superblock Scheduling Alg in GCC
  2004-02-01 21:55               ` Jan Hubicka
  2004-02-02  5:52                 ` Ghassan Shobaki
@ 2004-02-10 18:57                 ` Ghassan Shobaki
  2004-02-10 20:08                   ` Vladimir Makarov
  1 sibling, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2004-02-10 18:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc-help, gcc

Are there any documents describing the algorithm used in the superblock instruction 
scheduler? Does it use any (or a combination of) of published 
techniques such as critical path and speculative hedge and 
successive retirement ..etc? Or it just has its own algorithm?

Thanks

-Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-02-10 18:57                 ` Superblock Scheduling Alg in GCC Ghassan Shobaki
@ 2004-02-10 20:08                   ` Vladimir Makarov
  2004-02-10 20:17                     ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2004-02-10 20:08 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Jan Hubicka, Jan Hubicka, gcc-help, gcc

Ghassan Shobaki wrote:

>Are there any documents describing the algorithm used in the superblock instruction 
>scheduler?
>
I don't know one.

> Does it use any (or a combination of) of published 
>techniques such as critical path and speculative hedge and 
>successive retirement ..etc? Or it just has its own algorithm?
>
> 
>
It uses own algorithm which was grown from original haifa-scheduler.   
Earlier it was one file which was divided.  Superblock scheduler uses 
code of sched-deps.c and haifa-sched.c and directs them through a few hooks.

Generally speaking suberblock is believed to be a basic block to which 
list scheduling is applied.  The superblock scheduler just checks that 
the insn can be issued speculatively and prefer to issue more frequently 
executed insns when the priority is the same (and now when insn register 
weights are the same).  But calculation of insn priorities does not take 
basic block frequencies (or belonging to different basic blocks) into 
account.  So the algorithm is very simple.

No more advanced approaches like heuristics based on critical path to 
the last exit of superblock, dependence height and speculative yeild 
(taking block excution probability into account when the insn priority 
is calculated), sucessive retirment (preference of non-speculative insn 
movement first), or speculative hedge aiming to achieve minimal delay to 
all exits are used.

So there are a lot of things to improve the code.  But it will be not 
easy to add them because big part of code is used by the region based 
scheduler too.

Vlad.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-02-10 20:08                   ` Vladimir Makarov
@ 2004-02-10 20:17                     ` Vladimir Makarov
  2004-02-10 20:27                       ` Jan Hubicka
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2004-02-10 20:17 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Ghassan Shobaki, Jan Hubicka, Jan Hubicka, gcc-help, gcc

Vladimir Makarov wrote:

> Ghassan Shobaki wrote:
>
>> Are there any documents describing the algorithm used in the 
>> superblock instruction scheduler?
>>
> I don't know one.
>
>> Does it use any (or a combination of) of published techniques such as 
>> critical path and speculative hedge and successive retirement ..etc? 
>> Or it just has its own algorithm?
>>
>>
>>
> It uses own algorithm which was grown from original haifa-scheduler.   
> Earlier it was one file which was divided.  Superblock scheduler uses 
> code of sched-deps.c and haifa-sched.c and directs them through a few 
> hooks.
>
> Generally speaking suberblock is believed to be a basic block to which 
> list scheduling is applied.  The superblock scheduler just checks that 
> the insn can be issued speculatively and prefer to issue more 
> frequently executed insns when the priority is the same (and now when 
> insn register weights are the same).  But calculation of insn 
> priorities does not take basic block frequencies (or belonging to 
> different basic blocks) into account.  So the algorithm is very simple.
>
> No more advanced approaches like heuristics based on critical path to 
> the last exit of superblock, dependence height and speculative yeild 
> (taking block excution probability into account when the insn priority 
> is calculated), sucessive retirment (preference of non-speculative 
> insn movement first), or speculative hedge aiming to achieve minimal 
> delay to all exits are used.
>
> So there are a lot of things to improve the code.  But it will be not 
> easy to add them because big part of code is used by the region based 
> scheduler too.

Sorry, I forgot too add that what I wrote is about extended basic block 
scheduler (file sched-ebb.c) which is used for Itanium as a default 
after the register allocation.  If you are intersting in Jan Hubicka's 
trace scheduler, I think that Jan's answer will be more competent.

Vlad


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-02-10 20:17                     ` Vladimir Makarov
@ 2004-02-10 20:27                       ` Jan Hubicka
  2004-04-14 17:03                         ` Ghassan Shobaki
  2004-05-27  6:41                         ` Ghassan Shobaki
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Hubicka @ 2004-02-10 20:27 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Ghassan Shobaki, Jan Hubicka, Jan Hubicka, gcc-help, gcc

> Vladimir Makarov wrote:
> 
> >Ghassan Shobaki wrote:
> >
> >>Are there any documents describing the algorithm used in the 
> >>superblock instruction scheduler?
> >>
> >I don't know one.
> >
> >>Does it use any (or a combination of) of published techniques such as 
> >>critical path and speculative hedge and successive retirement ..etc? 
> >>Or it just has its own algorithm?
> >>
> >>
> >>
> >It uses own algorithm which was grown from original haifa-scheduler.   
> >Earlier it was one file which was divided.  Superblock scheduler uses 
> >code of sched-deps.c and haifa-sched.c and directs them through a few 
> >hooks.
> >
> >Generally speaking suberblock is believed to be a basic block to which 
> >list scheduling is applied.  The superblock scheduler just checks that 
> >the insn can be issued speculatively and prefer to issue more 
> >frequently executed insns when the priority is the same (and now when 
> >insn register weights are the same).  But calculation of insn 
> >priorities does not take basic block frequencies (or belonging to 
> >different basic blocks) into account.  So the algorithm is very simple.
> >
> >No more advanced approaches like heuristics based on critical path to 
> >the last exit of superblock, dependence height and speculative yeild 
> >(taking block excution probability into account when the insn priority 
> >is calculated), sucessive retirment (preference of non-speculative 
> >insn movement first), or speculative hedge aiming to achieve minimal 
> >delay to all exits are used.
> >
> >So there are a lot of things to improve the code.  But it will be not 
> >easy to add them because big part of code is used by the region based 
> >scheduler too.
> 
> Sorry, I forgot too add that what I wrote is about extended basic block 
> scheduler (file sched-ebb.c) which is used for Itanium as a default 
> after the register allocation.  If you are intersting in Jan Hubicka's 
> trace scheduler, I think that Jan's answer will be more competent.

I really didn't changed much in the algorithm.  All I was interested in
was to make it work with CFG and plugged in the tail duplication pass.
So your description is still valid.
The code has little logic to avoid moving instructions too much up and
adding some extra heuristics shall not be that dificult, but I didn't
read the papers on topic very curefully :)


Honza
> 
> Vlad
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-02-10 20:27                       ` Jan Hubicka
@ 2004-04-14 17:03                         ` Ghassan Shobaki
  2004-04-14 22:44                           ` Vladimir Makarov
  2004-05-27  6:41                         ` Ghassan Shobaki
  1 sibling, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2004-04-14 17:03 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Vladimir Makarov, Jan Hubicka, gcc-help, gcc

Guys,

I have done some experiments on two different heuristics for superblock 
scheduling by comparing them with optimal scheduling. The two heuristics 
are simple critical path CP (priority is the CP from the last branch only) 
and weighted critical path WCP (priority is a weighted sum of critical pahts 
from all branches below an instruction, with the branch weight being the 
probability of exiting the superblock at that branch). On the simple 
machine model that I am using, I got the following results: 

fp2000 benchmark: CP is optimal on 72% of the superblocks and WCP is 
optimal on 84% of the superblocks

int2000 benchmark: CP is optimal on 83% of the superblocks and WCP is 
optimal on 94% of the superblocks

These numbers do not necessarily reflect actual run-time performance 
improvements, since they were collected in a standalone setup were 
superblock data dependence graphs were extracted from gcc and scheduled
separately for a simple machine model (dual issue in which one int and one 
fp instruction can issue in each cycle). However, these results 
do suggest that the WCP heuristic is significantly better than the simple CP.

The reference for the WCP heuristic is:
R. Bringmann, Compiler-Controlled Speculation, PHD thesis, Dept. of CS, 
UIUC, IL 1995. (where the technique is called Dependence Height and 
Speculative Yield DHASY).

However, you don't really need to go there since the heuristic itself is 
so simple and intuitive that my description above is almost sufficient. 
Let me know if you are interested and I'll give you the details and 
explain one particular subtility about it.

Regards

-Ghassan

On Tue, 10 Feb 2004, Jan Hubicka wrote:

> > Vladimir Makarov wrote:
> > 
> > >Ghassan Shobaki wrote:
> > >
> > >>Are there any documents describing the algorithm used in the 
> > >>superblock instruction scheduler?
> > >>
> > >I don't know one.
> > >
> > >>Does it use any (or a combination of) of published techniques such as 
> > >>critical path and speculative hedge and successive retirement ..etc? 
> > >>Or it just has its own algorithm?
> > >>
> > >>
> > >>
> > >It uses own algorithm which was grown from original haifa-scheduler.   
> > >Earlier it was one file which was divided.  Superblock scheduler uses 
> > >code of sched-deps.c and haifa-sched.c and directs them through a few 
> > >hooks.
> > >
> > >Generally speaking suberblock is believed to be a basic block to which 
> > >list scheduling is applied.  The superblock scheduler just checks that 
> > >the insn can be issued speculatively and prefer to issue more 
> > >frequently executed insns when the priority is the same (and now when 
> > >insn register weights are the same).  But calculation of insn 
> > >priorities does not take basic block frequencies (or belonging to 
> > >different basic blocks) into account.  So the algorithm is very simple.
> > >
> > >No more advanced approaches like heuristics based on critical path to 
> > >the last exit of superblock, dependence height and speculative yeild 
> > >(taking block excution probability into account when the insn priority 
> > >is calculated), sucessive retirment (preference of non-speculative 
> > >insn movement first), or speculative hedge aiming to achieve minimal 
> > >delay to all exits are used.
> > >
> > >So there are a lot of things to improve the code.  But it will be not 
> > >easy to add them because big part of code is used by the region based 
> > >scheduler too.
> > 
> > Sorry, I forgot too add that what I wrote is about extended basic block 
> > scheduler (file sched-ebb.c) which is used for Itanium as a default 
> > after the register allocation.  If you are intersting in Jan Hubicka's 
> > trace scheduler, I think that Jan's answer will be more competent.
> 
> I really didn't changed much in the algorithm.  All I was interested in
> was to make it work with CFG and plugged in the tail duplication pass.
> So your description is still valid.
> The code has little logic to avoid moving instructions too much up and
> adding some extra heuristics shall not be that dificult, but I didn't
> read the papers on topic very curefully :)
> 
> 
> Honza
> > 
> > Vlad
> > 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-04-14 17:03                         ` Ghassan Shobaki
@ 2004-04-14 22:44                           ` Vladimir Makarov
       [not found]                             ` <Pine.LNX.4.58.0404142056120.4634@hawking.ece.ucdavis.edu>
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2004-04-14 22:44 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Jan Hubicka, Jan Hubicka, gcc-help, gcc

Ghassan Shobaki wrote:

>Guys,
>
>I have done some experiments on two different heuristics for superblock 
>scheduling by comparing them with optimal scheduling. The two heuristics 
>are simple critical path CP (priority is the CP from the last branch only) 
>and weighted critical path WCP (priority is a weighted sum of critical pahts 
>from all branches below an instruction, with the branch weight being the 
>probability of exiting the superblock at that branch). On the simple 
>machine model that I am using, I got the following results: 
>
>fp2000 benchmark: CP is optimal on 72% of the superblocks and WCP is 
>optimal on 84% of the superblocks
>
>int2000 benchmark: CP is optimal on 83% of the superblocks and WCP is 
>optimal on 94% of the superblocks
>
>These numbers do not necessarily reflect actual run-time performance 
>improvements, since they were collected in a standalone setup were 
>superblock data dependence graphs were extracted from gcc and scheduled
>separately for a simple machine model (dual issue in which one int and one 
>fp instruction can issue in each cycle). However, these results 
>do suggest that the WCP heuristic is significantly better than the simple CP.
>
>The reference for the WCP heuristic is:
>R. Bringmann, Compiler-Controlled Speculation, PHD thesis, Dept. of CS, 
>UIUC, IL 1995. (where the technique is called Dependence Height and 
>Speculative Yield DHASY).
>
>However, you don't really need to go there since the heuristic itself is 
>so simple and intuitive that my description above is almost sufficient. 
>Let me know if you are interested and I'll give you the details and 
>explain one particular subtility about it.
>
>  
>
Yes, I am interesting in this.  Could you give me the reference to WCP 
details (I can not find the document on the Internet).   I could give it 
a try.  May be this approach will work.

Vlad


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
       [not found]                             ` <Pine.LNX.4.58.0404142056120.4634@hawking.ece.ucdavis.edu>
@ 2004-04-17 19:29                               ` Ghassan Shobaki
  0 siblings, 0 replies; 33+ messages in thread
From: Ghassan Shobaki @ 2004-04-17 19:29 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Jan Hubicka, Jan Hubicka, gcc-help, gcc


I had sent this message on Thursday attaching the paper to it, but it 
bounced due to the size limit enforced by the gcc email server. Here is 
the paper citation:

A. Eichenberger and W. Meleis, Balance Scheduling: Weighing Branch 
Tradeoffs in Superblocks, 32nd Annual Intl. Conf. on Microarchitecture 
(IEEE/ACM), 1999, pp. 272. 
 
A soft copy can be found at:

http://www.ece.neu.edu/faculty/meleis.html

Regards

-Ghassan


On Thu, 15 Apr 2004, Ghassan Shobaki wrote:

> 
> The attached paper by Eichenberger and Meleis includes a 
> sufficiently clear description of the Dependence Height and Speculative 
> Yield DHASY Heuristic (the one that I prefer to call weighted 
> critical path WCP for simplicity). The objective of the paper is 
> presenting a complex heuristic, called Balance Scheduling, which 
> seems to be too slow for practical use, but in the Background section 
> they describe DHASY among other simpler heuristics. DHASY appears in the 
> second paragraph of page 274.
> 
> In my experiments I have used a modified version of it that seems to work 
> a bit better in practice. I don't yet have a good explanation why my 
> modified version works better, but I can provide you with some information 
> about it if you want. 
> 
> -Ghassan
>  
> 
> On Wed, 14 Apr 2004, Vladimir Makarov wrote:
> 
> > Ghassan Shobaki wrote:
> > 
> > >Guys,
> > >
> > >I have done some experiments on two different heuristics for superblock 
> > >scheduling by comparing them with optimal scheduling. The two heuristics 
> > >are simple critical path CP (priority is the CP from the last branch only) 
> > >and weighted critical path WCP (priority is a weighted sum of critical pahts 
> > >from all branches below an instruction, with the branch weight being the 
> > >probability of exiting the superblock at that branch). On the simple 
> > >machine model that I am using, I got the following results: 
> > >
> > >fp2000 benchmark: CP is optimal on 72% of the superblocks and WCP is 
> > >optimal on 84% of the superblocks
> > >
> > >int2000 benchmark: CP is optimal on 83% of the superblocks and WCP is 
> > >optimal on 94% of the superblocks
> > >
> > >These numbers do not necessarily reflect actual run-time performance 
> > >improvements, since they were collected in a standalone setup were 
> > >superblock data dependence graphs were extracted from gcc and scheduled
> > >separately for a simple machine model (dual issue in which one int and one 
> > >fp instruction can issue in each cycle). However, these results 
> > >do suggest that the WCP heuristic is significantly better than the simple CP.
> > >
> > >The reference for the WCP heuristic is:
> > >R. Bringmann, Compiler-Controlled Speculation, PHD thesis, Dept. of CS, 
> > >UIUC, IL 1995. (where the technique is called Dependence Height and 
> > >Speculative Yield DHASY).
> > >
> > >However, you don't really need to go there since the heuristic itself is 
> > >so simple and intuitive that my description above is almost sufficient. 
> > >Let me know if you are interested and I'll give you the details and 
> > >explain one particular subtility about it.
> > >
> > >  
> > >
> > Yes, I am interesting in this.  Could you give me the reference to WCP 
> > details (I can not find the document on the Internet).   I could give it 
> > a try.  May be this approach will work.
> > 
> > Vlad
> > 
> > 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Superblock Scheduling Alg in GCC
  2004-02-10 20:27                       ` Jan Hubicka
  2004-04-14 17:03                         ` Ghassan Shobaki
@ 2004-05-27  6:41                         ` Ghassan Shobaki
  2004-05-27 10:06                           ` Tree-SSA: Grammer Sumit Jain
  2005-02-11 16:55                           ` Trace Scheduling in GCC Ghassan Shobaki
  1 sibling, 2 replies; 33+ messages in thread
From: Ghassan Shobaki @ 2004-05-27  6:41 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Vladimir Makarov, Jan Hubicka, gcc-help, gcc


I am currently writing a paper on optimal superblock scheduling in 
which I compare optimal vs three heuristics (CP, DHASY and SR). In 
the experimental section, it would be nice if I can make a statement about
which heuristic is the closest to what gcc does. However, I am 
confused about your description below and have a few questions:  

- You are saying that it chooses the inst which has higher frequency of 
execution, but it does not take block execution frequency into account. 
This sounds contradictory to me! Is there a difference between instruction 
frequency and block frequency? Don't all insts within a basic block have 
the same frequency of execution?

- You are saying that the priority does not depend on the critical path 
from the last exit, but critical path is the primary and most 
effective priority scheme in list scheduling and without it I don't think 
the scheduler will behave well. Did I misunderstand this? Is it true that 
critical path is not used at all?

- You are saying that it does not favor local insts to insts from other 
basic blocks (speculative insts). However, I am under the impression that 
the Haifa scheduler does favor local instructions (useful code motion) to 
external instructions (speculative code motion). This impression comes 
from reaading the "Brenstein and Rodeh" paper on which the Haifa scheduler 
is most likely based. If the superblock scheduler was based on the Haifa 
scheduler then it will probably be giving lower priority to speculative 
motions. Is this true?

Thanks

-Ghassan
 
  

On Tue, 10 Feb 2004, Jan Hubicka wrote:

> > Vladimir Makarov wrote:
> > 
> > >Ghassan Shobaki wrote:
> > >
> > >>Are there any documents describing the algorithm used in the 
> > >>superblock instruction scheduler?
> > >>
> > >I don't know one.
> > >
> > >>Does it use any (or a combination of) of published techniques such as 
> > >>critical path and speculative hedge and successive retirement ..etc? 
> > >>Or it just has its own algorithm?
> > >>
> > >>
> > >>
> > >It uses own algorithm which was grown from original haifa-scheduler.   
> > >Earlier it was one file which was divided.  Superblock scheduler uses 
> > >code of sched-deps.c and haifa-sched.c and directs them through a few 
> > >hooks.
> > >
> > >Generally speaking suberblock is believed to be a basic block to which 
> > >list scheduling is applied.  The superblock scheduler just checks that 
> > >the insn can be issued speculatively and prefer to issue more 
> > >frequently executed insns when the priority is the same (and now when 
> > >insn register weights are the same).  But calculation of insn 
> > >priorities does not take basic block frequencies (or belonging to 
> > >different basic blocks) into account.  So the algorithm is very simple.
> > >
> > >No more advanced approaches like heuristics based on critical path to 
> > >the last exit of superblock, dependence height and speculative yeild 
> > >(taking block excution probability into account when the insn priority 
> > >is calculated), sucessive retirment (preference of non-speculative 
> > >insn movement first), or speculative hedge aiming to achieve minimal 
> > >delay to all exits are used.
> > >
> > >So there are a lot of things to improve the code.  But it will be not 
> > >easy to add them because big part of code is used by the region based 
> > >scheduler too.
> > 
> > Sorry, I forgot too add that what I wrote is about extended basic block 
> > scheduler (file sched-ebb.c) which is used for Itanium as a default 
> > after the register allocation.  If you are intersting in Jan Hubicka's 
> > trace scheduler, I think that Jan's answer will be more competent.
> 
> I really didn't changed much in the algorithm.  All I was interested in
> was to make it work with CFG and plugged in the tail duplication pass.
> So your description is still valid.
> The code has little logic to avoid moving instructions too much up and
> adding some extra heuristics shall not be that dificult, but I didn't
> read the papers on topic very curefully :)
> 
> 
> Honza
> > 
> > Vlad
> > 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Tree-SSA: Grammer
  2004-05-27  6:41                         ` Ghassan Shobaki
@ 2004-05-27 10:06                           ` Sumit Jain
  2004-05-27 12:56                             ` llewelly
  2005-02-11 16:55                           ` Trace Scheduling in GCC Ghassan Shobaki
  1 sibling, 1 reply; 33+ messages in thread
From: Sumit Jain @ 2004-05-27 10:06 UTC (permalink / raw)
  To: gcc-help

Hi,

I am planning to use tree-ssa branch of gcc for my project which involves
some program analysis. I have tried to read all the documents related to
tree-ssa. But I am not sure that the GENERIC grammer given in the
gccsummit-2003-proceedings.pdf is complete or not. I have this doubt
because when I see the implementation of few functions related to ssa, I
see some NIY statements.

Is anyone using the ssa branch for the some purpose ? If you know about
this grammer, could you please clarify my doubt.

-S

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree-SSA: Grammer
  2004-05-27 10:06                           ` Tree-SSA: Grammer Sumit Jain
@ 2004-05-27 12:56                             ` llewelly
  0 siblings, 0 replies; 33+ messages in thread
From: llewelly @ 2004-05-27 12:56 UTC (permalink / raw)
  To: Sumit Jain; +Cc: gcc-help

Sumit Jain <sumit@cs.sunysb.edu> writes:

> Hi,
> 
> I am planning to use tree-ssa branch of gcc for my project which involves
> some program analysis. I have tried to read all the documents related to
> tree-ssa. But I am not sure that the GENERIC grammer given in the
> gccsummit-2003-proceedings.pdf is complete or not. I have this doubt
> because when I see the implementation of few functions related to ssa, I
> see some NIY statements.
> 
> Is anyone using the ssa branch for the some purpose ? If you know about
> this grammer, could you please clarify my doubt.

You will get much better answers to this question on 
gcc at gcc.gnu.org .

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Trace Scheduling in GCC
  2004-05-27  6:41                         ` Ghassan Shobaki
  2004-05-27 10:06                           ` Tree-SSA: Grammer Sumit Jain
@ 2005-02-11 16:55                           ` Ghassan Shobaki
       [not found]                             ` <4210D6C0.3030005@redhat.com>
  1 sibling, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2005-02-11 16:55 UTC (permalink / raw)
  To: gcc-help; +Cc: Vladimir Makarov, Jan Hubicka

Hi,

I have previously studied superblock scheduling by importing superblock 
DAGs from gcc and feeding them into my scheduler (I have published 
my results at Micro in case you are interested: 
http://www.microarch.org/micro37/papers/25_Shobaki-Superblock.pdf) 

Now, I wonder if I can do the same thing to traces by importing the DAGs and 
the necessary data-flow information from gcc. Unlike the superblock, which has 
a single entry and multipl exits, a trace is a more general structure that can 
have multiple entries and multiple exits. So, the question is: Can I 
extract traces for my experiments from gcc?

My understanding is that since version 3.4 gcc offers these two command-line 
options: 

-fsched2-use-superblocks: This does extended basic block scheduling; no 
trace formation or tail duplication. It just schedules the extended basic 
blocks that appear naturally in the code.

-fsched2-use-traces: This does superblock scheduling, i.e. it forms 
traces then it does tail duplication to eliminate side entrances.

I am right about the functionalities of these options?
It seems to me that none of these options meets my objective of studying 
traces that have mltiple entrances and multiple exits. Please correct me 
if I am wrong. And if I am right, is there another way for me to get what 
I want, may be by doing some quick hack that bypasses tail duplication under 
the -fsched2-use-tarces option -does this make sense?

Thanks
-Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
       [not found]                             ` <4210D6C0.3030005@redhat.com>
@ 2005-02-15  6:30                               ` Vladimir Makarov
  2006-11-24 18:42                                 ` Phase Ordering of Scheduling and Allocation " Ghassan Shobaki
  2005-02-15  7:45                               ` Trace Scheduling " Jan Hubicka
  2005-02-16 10:40                               ` Vladimir Makarov
  2 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2005-02-15  6:30 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: gcc-help, Jan Hubicka

Ghassan Shobaki wrote:

> Hi,
>
> I have previously studied superblock scheduling by importing 
> superblock DAGs from gcc and feeding them into my scheduler (I have 
> published my results at Micro in case you are interested: 
> http://www.microarch.org/micro37/papers/25_Shobaki-Superblock.pdf)  
>
Thanks for the article, Ghassan.  But I don't think this is the first 
optimal algorithm as you wrote in your introdauction.  I implemented an 
analogous approach about seven years ago.  And I am sure I am not the first. 

The problem with the optimal algorithms is that they describe always 
simplified models.  For example, the real register allocation is not 
only assigning registers.  It is also coalescing, spilling, 
rematerialization, register elimination, sometimes code selection etc. 
The insn scheduling is not only rearranging insns but also partial 
register renaming and forward substitution, insn cloning, sometimes 
predication, insn mutation, code unification, supporting data/control 
speculation hardware, code unification etc.

It can not be decribed by a resonable model for optimal solution.  IMHO 
usage of metaheuristic (like simulated annealing, taboo search, genetic 
aproaches etc) to get a better solution of real problems is more 
productive way.  They work fine for real problem like chip design, 
tracing and placements in board designing.

In any case, I'll read your article with a great attention.  It will be 
a joy to me to compare your approach with mine.

If you could manage integrate your code and approach into gcc for its 
real usage, that would be great.  That would be the first industrial 
compiler using such approach.

> Now, I wonder if I can do the same thing to traces by importing the 
> DAGs and the necessary data-flow information from gcc. Unlike the 
> superblock, which has a single entry and multipl exits, a trace is a 
> more general structure that can have multiple entries and multiple 
> exits. So, the question is: Can I 
> extract traces for my experiments from gcc?
>
> My understanding is that since version 3.4 gcc offers these two 
> command-line options:
> -fsched2-use-superblocks: This does extended basic block scheduling; 
> no trace formation or tail duplication. It just schedules the extended 
> basic blocks that appear naturally in the code.
>
> -fsched2-use-traces: This does superblock scheduling, i.e. it forms 
> traces then it does tail duplication to eliminate side entrances.
>
> I am right about the functionalities of these options?
> It seems to me that none of these options meets my objective of 
> studying traces that have mltiple entrances and multiple exits. Please 
> correct me if I am wrong. And if I am right, is there another way for 
> me to get what I want, may be by doing some quick hack that bypasses 
> tail duplication under 
> the -fsched2-use-tarces option -does this make sense?
>
>  
>
Sorry, I don't know this code well.  I am in the same position as you. I 
should research the code to answer your questions.  I think Jan is 
the right person to do it.  Because he wrote this code.

Vlad

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
       [not found]                             ` <4210D6C0.3030005@redhat.com>
  2005-02-15  6:30                               ` Vladimir Makarov
@ 2005-02-15  7:45                               ` Jan Hubicka
  2005-02-15  9:18                                 ` Ghassan Shobaki
  2005-02-16 10:40                               ` Vladimir Makarov
  2 siblings, 1 reply; 33+ messages in thread
From: Jan Hubicka @ 2005-02-15  7:45 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Ghassan Shobaki, gcc-help, Jan Hubicka

> >
> >I am right about the functionalities of these options?
> >It seems to me that none of these options meets my objective of studying 
> >traces that have mltiple entrances and multiple exits. Please correct me 
> >if I am wrong. And if I am right, is there another way for me to get what 
> >I want, may be by doing some quick hack that bypasses tail duplication 
> >under the -fsched2-use-tarces option -does this make sense?
> >
> > 
> >
> Sorry, I don't know this code well.  I am in the same position as you. 
> I should research the code to answer your questions.  I think Jan is 
> the right person to do it.  Because he wrote this code.

The -fsched2-use-traces is simple superblock scheduler that is preceeded
by trace discovery and tail duplication pass.
So if you want traces directly, you might steal the code from tracer.c
but except for that I don't think it suits your needs very well,
unforutnately.

Honza
> 
> Vlad
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
  2005-02-15  7:45                               ` Trace Scheduling " Jan Hubicka
@ 2005-02-15  9:18                                 ` Ghassan Shobaki
  2005-02-17 16:19                                   ` Jan Hubicka
  0 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2005-02-15  9:18 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Vladimir Makarov, gcc-help, Jan Hubicka

Honza,

But what will happen if I try to bypass the tail duplication stage? In
theory that should generate traces. However, the question is will the
data dependence graph construction and scheduling stages still work? (for
my purposes, the scheduler does not have to produce correct schedules,
since all what I need is importing the data dependence graphs to
feed them into my standalone scheduler. So, it is all about the data
dependence graphs for my purpose).

Thanks
-Ghassan


On Mon, 14 Feb 2005, Jan Hubicka wrote:

> > >
> > >I am right about the functionalities of these options?
> > >It seems to me that none of these options meets my objective of studying
> > >traces that have mltiple entrances and multiple exits. Please correct me
> > >if I am wrong. And if I am right, is there another way for me to get what
> > >I want, may be by doing some quick hack that bypasses tail duplication
> > >under the -fsched2-use-tarces option -does this make sense?
> > >
> > >
> > >
> > Sorry, I don't know this code well.  I am in the same position as you.
> > I should research the code to answer your questions.  I think Jan is
> > the right person to do it.  Because he wrote this code.
>
> The -fsched2-use-traces is simple superblock scheduler that is preceeded
> by trace discovery and tail duplication pass.
> So if you want traces directly, you might steal the code from tracer.c
> but except for that I don't think it suits your needs very well,
> unforutnately.
>
> Honza
> >
> > Vlad
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
       [not found]                             ` <4210D6C0.3030005@redhat.com>
  2005-02-15  6:30                               ` Vladimir Makarov
  2005-02-15  7:45                               ` Trace Scheduling " Jan Hubicka
@ 2005-02-16 10:40                               ` Vladimir Makarov
       [not found]                                 ` <Pine.GHP.4.58.0502150840080.14429@dante.ece.ucdavis.edu>
  2 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2005-02-16 10:40 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: gcc-help, Jan Hubicka

Vladimir Makarov wrote:

> Ghassan Shobaki wrote:
>
>> Hi,
>>
>> I have previously studied superblock scheduling by importing 
>> superblock DAGs from gcc and feeding them into my scheduler (I have 
>> published my results at Micro in case you are interested: 
>> http://www.microarch.org/micro37/papers/25_Shobaki-Superblock.pdf)  
>>
> Thanks for the article, Ghassan.  But I don't think this is the first 
> optimal algorithm as you wrote in your introdauction.  I implemented 
> an analogous approach about seven years ago.  And I am sure I am not 
> the first.
>
Ghassan, my appologies.  I've read your article.  What I wrote was about 
optimal basic blocks scheduling.  I've not seen articles about optimal 
scheduling beyond basic blocks.  So most probably your article is really 
the first published work about optimal scheduling of superblocks.

My appologies again.

Vlad


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
       [not found]                                 ` <Pine.GHP.4.58.0502150840080.14429@dante.ece.ucdavis.edu>
@ 2005-02-16 11:19                                   ` Jan Hubicka
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Hubicka @ 2005-02-16 11:19 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Vladimir Makarov, gcc-help, Jan Hubicka

> Vladimir,
> 
> No problem at all! I am currently working on the much harder problem of
> trace scheduling (again, to the best of my knowledge, there has been no
> optimal solution for it). That's why I need to get actual traces from a
> production compiler to experimentally evaluate my algorithm. It looks
> that what I need to do is examine the trace formation code in gcc and see
> if I can eliminate the tail duplication step.

This should be fairly easy - see function find_trace that given a seed
BB will return you fully grown trace.  The algorithm match pretty
closely what Impact folks describe in their papers.

Honza
> 
> Thank you!
> 
> -Ghassan
> 
> 
> On Tue, 15 Feb 2005, Vladimir Makarov wrote:
> 
> > Vladimir Makarov wrote:
> >
> > > Ghassan Shobaki wrote:
> > >
> > >> Hi,
> > >>
> > >> I have previously studied superblock scheduling by importing
> > >> superblock DAGs from gcc and feeding them into my scheduler (I have
> > >> published my results at Micro in case you are interested:
> > >> http://www.microarch.org/micro37/papers/25_Shobaki-Superblock.pdf)
> > >>
> > > Thanks for the article, Ghassan.  But I don't think this is the first
> > > optimal algorithm as you wrote in your introdauction.  I implemented
> > > an analogous approach about seven years ago.  And I am sure I am not
> > > the first.
> > >
> > Ghassan, my appologies.  I've read your article.  What I wrote was about
> > optimal basic blocks scheduling.  I've not seen articles about optimal
> > scheduling beyond basic blocks.  So most probably your article is really
> > the first published work about optimal scheduling of superblocks.
> >
> > My appologies again.
> >
> > Vlad
> >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Trace Scheduling in GCC
  2005-02-15  9:18                                 ` Ghassan Shobaki
@ 2005-02-17 16:19                                   ` Jan Hubicka
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Hubicka @ 2005-02-17 16:19 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: Jan Hubicka, Vladimir Makarov, gcc-help, Jan Hubicka

> Honza,
> 
> But what will happen if I try to bypass the tail duplication stage? In
> theory that should generate traces. However, the question is will the
> data dependence graph construction and scheduling stages still work? (for

I am not too faimilar with scheduler itself, but I believe that it
builds full dependence graph for region scheduling, so you ought to be
safe.

Honza
> my purposes, the scheduler does not have to produce correct schedules,
> since all what I need is importing the data dependence graphs to
> feed them into my standalone scheduler. So, it is all about the data
> dependence graphs for my purpose).
> 
> Thanks
> -Ghassan
> 
> 
> On Mon, 14 Feb 2005, Jan Hubicka wrote:
> 
> > > >
> > > >I am right about the functionalities of these options?
> > > >It seems to me that none of these options meets my objective of studying
> > > >traces that have mltiple entrances and multiple exits. Please correct me
> > > >if I am wrong. And if I am right, is there another way for me to get what
> > > >I want, may be by doing some quick hack that bypasses tail duplication
> > > >under the -fsched2-use-tarces option -does this make sense?
> > > >
> > > >
> > > >
> > > Sorry, I don't know this code well.  I am in the same position as you.
> > > I should research the code to answer your questions.  I think Jan is
> > > the right person to do it.  Because he wrote this code.
> >
> > The -fsched2-use-traces is simple superblock scheduler that is preceeded
> > by trace discovery and tail duplication pass.
> > So if you want traces directly, you might steal the code from tracer.c
> > but except for that I don't think it suits your needs very well,
> > unforutnately.
> >
> > Honza
> > >
> > > Vlad
> > >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Phase Ordering of Scheduling and Allocation in GCC
  2005-02-15  6:30                               ` Vladimir Makarov
@ 2006-11-24 18:42                                 ` Ghassan Shobaki
  2006-11-24 19:28                                   ` Vladimir N. Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Ghassan Shobaki @ 2006-11-24 18:42 UTC (permalink / raw)
  To: gcc-help; +Cc: Jan Hubicka, vmakarov

Hi,

I am currently experimenting with superblock and trace scheduling in GCC 
(using the -fsched2-use-superblocks option). In an attempt to get traces 
with more parallelism, I tried to change the phase ordering so that 
superblcok scheduling is done before register allocation. However, when I 
switched the order and ran the compiler I got the following error 
message:

bits.c:113: internal compiler error: basic blocks not laid down 
consecutively
Please submit a full bug report,

So, the question is: will it be possible to do superblock scheduling 
before regsiter allocation or there is a fundamental reason that prohibits 
that?

If that's possible, any idea what changes I need to make besides 
simply switching the order in which the two phases are invoked?
The above error message may give you a good idea about the problem.

Thanks
-Ghassan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Phase Ordering of Scheduling and Allocation in GCC
  2006-11-24 18:42                                 ` Phase Ordering of Scheduling and Allocation " Ghassan Shobaki
@ 2006-11-24 19:28                                   ` Vladimir N. Makarov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir N. Makarov @ 2006-11-24 19:28 UTC (permalink / raw)
  To: Ghassan Shobaki; +Cc: gcc-help, Jan Hubicka

Ghassan Shobaki wrote:

>Hi,
>
>I am currently experimenting with superblock and trace scheduling in GCC 
>(using the -fsched2-use-superblocks option). In an attempt to get traces 
>with more parallelism, I tried to change the phase ordering so that 
>superblcok scheduling is done before register allocation. However, when I 
>switched the order and ran the compiler I got the following error 
>message:
>
>bits.c:113: internal compiler error: basic blocks not laid down 
>consecutively
>Please submit a full bug report,
>
>So, the question is: will it be possible to do superblock scheduling 
>before regsiter allocation or there is a fundamental reason that prohibits 
>that?
>
>  
>
I don't know such reason but ebb scheduling was written for Itanium to 
work as practically the very last pass.  It was not supposed to work 
before the register allocator.

>If that's possible, any idea what changes I need to make besides 
>simply switching the order in which the two phases are invoked?
>The above error message may give you a good idea about the problem.
>
>  
>
It is hard to say but I think that ebb scheduler is not accurate with 
maintaining CFG in a right shape therefore verifying CFG information 
failed after the ebb.  So you have to find what is wrong and fix it.  
Sorry there is no easier way to make it work before the register allocator.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2006-11-24 19:28 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-03  6:14 Superblock Instruction Scheduling in GCC Ghassan Shobaki
2003-12-03 14:03 ` Vladimir N. Makarov
2003-12-04  0:06   ` Jan Hubicka
2003-12-04  1:26     ` Ghassan Shobaki
2003-12-04 10:25       ` Jan Hubicka
2003-12-04 15:54         ` Vladimir Makarov
2003-12-09 21:11     ` Ghassan Shobaki
2004-01-28  1:31     ` Superblock Instruction Scheduling in GCC 3.4 Ghassan Shobaki
2004-01-28 12:23       ` Jan Hubicka
2004-01-29  7:07         ` Ghassan Shobaki
2004-01-29 10:13           ` Jan Hubicka
2004-02-01 18:17             ` Ghassan Shobaki
2004-02-01 21:55               ` Jan Hubicka
2004-02-02  5:52                 ` Ghassan Shobaki
2004-02-10 18:57                 ` Superblock Scheduling Alg in GCC Ghassan Shobaki
2004-02-10 20:08                   ` Vladimir Makarov
2004-02-10 20:17                     ` Vladimir Makarov
2004-02-10 20:27                       ` Jan Hubicka
2004-04-14 17:03                         ` Ghassan Shobaki
2004-04-14 22:44                           ` Vladimir Makarov
     [not found]                             ` <Pine.LNX.4.58.0404142056120.4634@hawking.ece.ucdavis.edu>
2004-04-17 19:29                               ` Ghassan Shobaki
2004-05-27  6:41                         ` Ghassan Shobaki
2004-05-27 10:06                           ` Tree-SSA: Grammer Sumit Jain
2004-05-27 12:56                             ` llewelly
2005-02-11 16:55                           ` Trace Scheduling in GCC Ghassan Shobaki
     [not found]                             ` <4210D6C0.3030005@redhat.com>
2005-02-15  6:30                               ` Vladimir Makarov
2006-11-24 18:42                                 ` Phase Ordering of Scheduling and Allocation " Ghassan Shobaki
2006-11-24 19:28                                   ` Vladimir N. Makarov
2005-02-15  7:45                               ` Trace Scheduling " Jan Hubicka
2005-02-15  9:18                                 ` Ghassan Shobaki
2005-02-17 16:19                                   ` Jan Hubicka
2005-02-16 10:40                               ` Vladimir Makarov
     [not found]                                 ` <Pine.GHP.4.58.0502150840080.14429@dante.ece.ucdavis.edu>
2005-02-16 11:19                                   ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).