public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* SMS scheduling
@ 2004-09-28 12:57 Canqun Yang
  2004-09-29 15:36 ` Mostafa Hagog
  0 siblings, 1 reply; 17+ messages in thread
From: Canqun Yang @ 2004-09-28 12:57 UTC (permalink / raw)
  To: Ayal Zaks, Mostafa Hagog; +Cc: gcc

Hi, all

I tested the Swing Modulo Scheduling of GCC. For 
simple program, the numerical caculation of PI, it 
achieves significant speedup on IA64. But, for little 
bit complex programs, the SMS can hardly work out a 
successful schedule.

The algorithms implemented by Ayal and Mostafa are 
correct. It seems that the SMS itself is wrong. The 
schedule priority order calculated by SMS is much 
different from normal MinDist algorithm. Besides this, 
SMS is not sensitive to II. Is SMS really wrong?
  

Canqun Yang

Creative Compiler Research Group.
National University of Defense Technology, China.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-28 12:57 SMS scheduling Canqun Yang
@ 2004-09-29 15:36 ` Mostafa Hagog
  2004-09-30  8:50   ` Mostafa Hagog
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mostafa Hagog @ 2004-09-29 15:36 UTC (permalink / raw)
  To: Canqun Yang; +Cc: Ayal Zaks, gcc


Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004 11:06:29:

> Hi, all
>
> I tested the Swing Modulo Scheduling of GCC. For
> simple program, the numerical caculation of PI, it
> achieves significant speedup on IA64. But, for little
> bit complex programs, the SMS can hardly work out a
> successful schedule.
>
> The algorithms implemented by Ayal and Mostafa are
> correct. It seems that the SMS itself is wrong. The
> schedule priority order calculated by SMS is much
> different from normal MinDist algorithm.

SMS prioritizes the nodes using critical patch based
heuristic - the main idea is that instructions that are
on the critical path are less flexible in means of
scheduling.
Why do you think that the priority order should be
according to MinDist algorithm (what is the minimum
distance in this case)? Can you provide an example
that supports this?

> Besides this,
> SMS is not sensitive to II. Is SMS really wrong?

The node ordering step in SMS is not sensitive to II
(as mentioned before it is based on critical path
heuristic).  However, the scheduling step is sensitive
to II - failing to schedule the nodes of a loop within
II cycles will lead to try scheduling the nodes again
within II + 1 cycles.


Mostafa.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-29 15:36 ` Mostafa Hagog
@ 2004-09-30  8:50   ` Mostafa Hagog
  2004-09-30  8:51     ` Richard Henderson
  2004-10-02 12:25   ` Canqun Yang
  2004-10-09 17:55   ` Canqun Yang
  2 siblings, 1 reply; 17+ messages in thread
From: Mostafa Hagog @ 2004-09-30  8:50 UTC (permalink / raw)
  To: Canqun Yang; +Cc: gcc


BTW, there are several improvement to SMS that
are postponed to post-gcc4.0, these enhancement include:
1. Use of loop-info structure to detect loops.
2. Use of alias sets to remove memory dependencies.
3. Remove register anti-dependencies.
Patches for these improvements are ready.
We are also working on using the register pressure utility
in SMS.

Mostafa.

Mostafa Hagog wrote on 29/09/2004 15:12:26:

>
> Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004 11:06:29:
>
> > Hi, all
> >
> > I tested the Swing Modulo Scheduling of GCC. For
> > simple program, the numerical caculation of PI, it
> > achieves significant speedup on IA64. But, for little
> > bit complex programs, the SMS can hardly work out a
> > successful schedule.
> >
> > The algorithms implemented by Ayal and Mostafa are
> > correct. It seems that the SMS itself is wrong. The
> > schedule priority order calculated by SMS is much
> > different from normal MinDist algorithm.
>
> SMS prioritizes the nodes using critical patch based
> heuristic - the main idea is that instructions that are
> on the critical path are less flexible in means of
> scheduling.
> Why do you think that the priority order should be
> according to MinDist algorithm (what is the minimum
> distance in this case)? Can you provide an example
> that supports this?
>
> > Besides this,
> > SMS is not sensitive to II. Is SMS really wrong?
>
> The node ordering step in SMS is not sensitive to II
> (as mentioned before it is based on critical path
> heuristic).  However, the scheduling step is sensitive
> to II - failing to schedule the nodes of a loop within
> II cycles will lead to try scheduling the nodes again
> within II + 1 cycles.
>
>
> Mostafa.
>
>
>
>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30  8:50   ` Mostafa Hagog
@ 2004-09-30  8:51     ` Richard Henderson
  2004-09-30 10:57       ` Mostafa Hagog
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-09-30  8:51 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Canqun Yang, gcc

On Thu, Sep 30, 2004 at 08:58:20AM +0200, Mostafa Hagog wrote:
> 2. Use of alias sets to remove memory dependencies.

Err, how are you detecting memory dependencies now? 
Aren't you using the alias.c routines?


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30  8:51     ` Richard Henderson
@ 2004-09-30 10:57       ` Mostafa Hagog
  2004-09-30 11:26         ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Hagog @ 2004-09-30 10:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Canqun Yang, gcc



Richard Henderson <rth@redhat.com> wrote on 30/09/2004 09:57:28:

> On Thu, Sep 30, 2004 at 08:58:20AM +0200, Mostafa Hagog wrote:
> > 2. Use of alias sets to remove memory dependencies.
>
> Err, how are you detecting memory dependencies now?
> Aren't you using the alias.c routines?

For 4.0 SMS intra-loop memory dependencies are detected using
the alias.c (as in haifa-sched) for inter-loop memory dependencies
we are very conservative and assume that all memory accesses are
aliased.

Mostafa.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30 10:57       ` Mostafa Hagog
@ 2004-09-30 11:26         ` Richard Henderson
  2004-09-30 11:29           ` Mostafa Hagog
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-09-30 11:26 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Canqun Yang, gcc

On Thu, Sep 30, 2004 at 10:03:37AM +0200, Mostafa Hagog wrote:
> For 4.0 SMS intra-loop memory dependencies are detected using
> the alias.c (as in haifa-sched) for inter-loop memory dependencies
> we are very conservative and assume that all memory accesses are
> aliased.

alias.c is flow insensative, and thus as conservative as you
could want.  Why aren't you using it all the time?


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30 11:26         ` Richard Henderson
@ 2004-09-30 11:29           ` Mostafa Hagog
  2004-09-30 19:34             ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Hagog @ 2004-09-30 11:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Canqun Yang, gcc, Ayal Zaks



Richard Henderson <rth@redhat.com> wrote on 30/09/2004 10:42:14:

> On Thu, Sep 30, 2004 at 10:03:37AM +0200, Mostafa Hagog wrote:
> > For 4.0 SMS intra-loop memory dependencies are detected using
> > the alias.c (as in haifa-sched) for inter-loop memory dependencies
> > we are very conservative and assume that all memory accesses are
> > aliased.
>
> alias.c is flow insensative, and thus as conservative as you
> could want.  Why aren't you using it all the time?

The answer is within your question; flow insensitive.
In SMS we are interested in loop-carried dependancies
and this is not provided by alias.c.  One thought is to
propagate the dependency information from trees down to the
RTL (Sebastian Bob looked into this).  This is a long term
working item for SMS.

Mostafa.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30 11:29           ` Mostafa Hagog
@ 2004-09-30 19:34             ` Richard Henderson
  2004-10-03 17:59               ` Mostafa Hagog
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-09-30 19:34 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Canqun Yang, gcc, Ayal Zaks

On Thu, Sep 30, 2004 at 10:51:31AM +0200, Mostafa Hagog wrote:
> > On Thu, Sep 30, 2004 at 10:03:37AM +0200, Mostafa Hagog wrote:
> > > For 4.0 SMS intra-loop memory dependencies are detected using
> > > the alias.c (as in haifa-sched) for inter-loop memory dependencies
> > > we are very conservative and assume that all memory accesses are
> > > aliased.
> >
> > alias.c is flow insensative, and thus as conservative as you
> > could want.  Why aren't you using it all the time?
> 
> The answer is within your question; flow insensitive.
> In SMS we are interested in loop-carried dependancies
> and this is not provided by alias.c.

Certainly, but surely flow insensitive data is better than
assuming all memory accesses are aliased.


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-29 15:36 ` Mostafa Hagog
  2004-09-30  8:50   ` Mostafa Hagog
@ 2004-10-02 12:25   ` Canqun Yang
  2004-10-03 19:54     ` Mostafa Hagog
  2004-10-09 17:55   ` Canqun Yang
  2 siblings, 1 reply; 17+ messages in thread
From: Canqun Yang @ 2004-10-02 12:25 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, gcc

Mostafa Hagog <MUSTAFA@il.ibm.com>:

> 
> Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004 
11:06:29:
>
> > Hi, all
> >
> > I tested the Swing Modulo Scheduling of GCC. For
> > simple program, the numerical caculation of PI, it
> > achieves significant speedup on IA64. But, for 
little
> > bit complex programs, the SMS can hardly work out a
> > successful schedule.
> >
> > The algorithms implemented by Ayal and Mostafa are
> > correct. It seems that the SMS itself is wrong. The
> > schedule priority order calculated by SMS is much
> > different from normal MinDist algorithm.
>
> SMS prioritizes the nodes using critical patch based
> heuristic - the main idea is that instructions that 
are
> on the critical path are less flexible in means of
> scheduling.
> Why do you think that the priority order should be
> according to MinDist algorithm (what is the minimum
> distance in this case)? Can you provide an example
> that supports this?
>

Yes, I'll give an example after my holiday.
 
> > Besides this,
> > SMS is not sensitive to II. Is SMS really wrong?
>
> The node ordering step in SMS is not sensitive to II
> (as mentioned before it is based on critical path
> heuristic).  However, the scheduling step is sensitive
> to II - failing to schedule the nodes of a loop within
> II cycles will lead to try scheduling the nodes again
> within II + 1 cycles.
>
The node order calculated by SMS is not suitable. 
Although, the scheduling step does increase the II 
value to try a successful schedule, it is useless. In 
fact, the scheduler almost always fail at the same insn 
while the II is increasing. 


Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-30 19:34             ` Richard Henderson
@ 2004-10-03 17:59               ` Mostafa Hagog
  2004-10-04  1:06                 ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Hagog @ 2004-10-03 17:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Ayal Zaks, Canqun Yang, gcc


Richard Henderson <rth@redhat.com> wrote on 30/09/2004 19:03:35:

> On Thu, Sep 30, 2004 at 10:51:31AM +0200, Mostafa Hagog wrote:
> > > On Thu, Sep 30, 2004 at 10:03:37AM +0200, Mostafa Hagog wrote:
> > > > For 4.0 SMS intra-loop memory dependencies are detected using
> > > > the alias.c (as in haifa-sched) for inter-loop memory dependencies
> > > > we are very conservative and assume that all memory accesses are
> > > > aliased.
> > >
> > > alias.c is flow insensative, and thus as conservative as you
> > > could want.  Why aren't you using it all the time?
> >
> > The answer is within your question; flow insensitive.
> > In SMS we are interested in loop-carried dependancies
> > and this is not provided by alias.c.
>
> Certainly, but surely flow insensitive data is better than
> assuming all memory accesses are aliased.

Yes sure, but the only useful information that we can get
is the alias sets. If two memory accesses are in different alias
sets then they don't alias, otherwise we must assume they are
aliased across iterations. This is one of the patches that
are pending to post-gcc4.0.

Mostafa.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-02 12:25   ` Canqun Yang
@ 2004-10-03 19:54     ` Mostafa Hagog
  0 siblings, 0 replies; 17+ messages in thread
From: Mostafa Hagog @ 2004-10-03 19:54 UTC (permalink / raw)
  To: Canqun Yang; +Cc: Ayal Zaks, canqun, gcc


canqun@nudt.edu.cn wrote on 02/10/2004 11:41:05:

> Mostafa Hagog <MUSTAFA@il.ibm.com>:
>
> >
> > Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004
> 11:06:29:
> >
> > > Hi, all
> > >
> > > I tested the Swing Modulo Scheduling of GCC. For
> > > simple program, the numerical caculation of PI, it
> > > achieves significant speedup on IA64. But, for
> little
> > > bit complex programs, the SMS can hardly work out a
> > > successful schedule.
> > >
> > > The algorithms implemented by Ayal and Mostafa are
> > > correct. It seems that the SMS itself is wrong. The
> > > schedule priority order calculated by SMS is much
> > > different from normal MinDist algorithm.
> >
> > SMS prioritizes the nodes using critical patch based
> > heuristic - the main idea is that instructions that
> are
> > on the critical path are less flexible in means of
> > scheduling.
> > Why do you think that the priority order should be
> > according to MinDist algorithm (what is the minimum
> > distance in this case)? Can you provide an example
> > that supports this?
> >
>
> Yes, I'll give an example after my holiday.
>
> > > Besides this,
> > > SMS is not sensitive to II. Is SMS really wrong?
> >
> > The node ordering step in SMS is not sensitive to II
> > (as mentioned before it is based on critical path
> > heuristic).  However, the scheduling step is sensitive
> > to II - failing to schedule the nodes of a loop within
> > II cycles will lead to try scheduling the nodes again
> > within II + 1 cycles.
> >
> The node order calculated by SMS is not suitable.

Not suitable for what? For your example?

> Although, the scheduling step does increase the II
> value to try a successful schedule, it is useless. In
> fact, the scheduler almost always fail at the same insn
> while the II is increasing.

This shouldn't be the case in general (maybe this is in
your specific example). In general, when we increase the
II we increase the "mobility" of instructions (i.e the
cycles that we can schedule an instruction in).  Thus if
an instructions couldn't be scheduled within II cycles it
has better chance to be scheduled within II + 1 cycles.
So there maybe a specific problem of your example and/or
ia64 scheduling.

Mostafa.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-03 17:59               ` Mostafa Hagog
@ 2004-10-04  1:06                 ` Richard Henderson
  2004-10-04 12:14                   ` Mostafa Hagog
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-10-04  1:06 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, Canqun Yang, gcc

On Sun, Oct 03, 2004 at 08:59:22AM +0200, Mostafa Hagog wrote:
> Yes sure, but the only useful information that we can get
> is the alias sets.

Not true.  We can tell if they point to different base objects.


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-04  1:06                 ` Richard Henderson
@ 2004-10-04 12:14                   ` Mostafa Hagog
  2004-10-04 17:20                     ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Mostafa Hagog @ 2004-10-04 12:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Ayal Zaks, Canqun Yang, gcc


Richard Henderson <rth@redhat.com> wrote on 04/10/2004 00:30:56:

> On Sun, Oct 03, 2004 at 08:59:22AM +0200, Mostafa Hagog wrote:
> > Yes sure, but the only useful information that we can get
> > is the alias sets.
>
> Not true.  We can tell if they point to different base objects.

And if they belong to the same alias sets (even when they have
different base objects) that means that the accesses that
belong to different iterations may alias, right ?


>
>
> r~


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-04 12:14                   ` Mostafa Hagog
@ 2004-10-04 17:20                     ` Richard Henderson
  2004-10-04 23:03                       ` Ayal Zaks
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2004-10-04 17:20 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, Canqun Yang, gcc

On Mon, Oct 04, 2004 at 11:01:57AM +0200, Mostafa Hagog wrote:
> And if they belong to the same alias sets (even when they have
> different base objects) that means that the accesses that
> belong to different iterations may alias, right ?

No.  If they are for different base objects, they cannot alias,
since it is illegal to iterate past the end of one object into
another object.


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-04 17:20                     ` Richard Henderson
@ 2004-10-04 23:03                       ` Ayal Zaks
  0 siblings, 0 replies; 17+ messages in thread
From: Ayal Zaks @ 2004-10-04 23:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Canqun Yang, gcc, Mostafa Hagog


Richard Henderson <rth@redhat.com> wrote on 04/10/2004 18:31:14:

> On Mon, Oct 04, 2004 at 11:01:57AM +0200, Mostafa Hagog wrote:
> > And if they belong to the same alias sets (even when they have
> > different base objects) that means that the accesses that
> > belong to different iterations may alias, right ?
>
> No.  If they are for different base objects, they cannot alias,
> since it is illegal to iterate past the end of one object into
> another object.

The problem is probably with bases that are pointers, cf.
tree-data-ref.c/array_base_name_differ_p().

Ayal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-09-29 15:36 ` Mostafa Hagog
  2004-09-30  8:50   ` Mostafa Hagog
  2004-10-02 12:25   ` Canqun Yang
@ 2004-10-09 17:55   ` Canqun Yang
  2004-10-24 19:54     ` Mostafa Hagog
  2 siblings, 1 reply; 17+ messages in thread
From: Canqun Yang @ 2004-10-09 17:55 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, Richard Henderson, gcc

Mostafa Hagog <MUSTAFA@il.ibm.com>:

> 
> Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004 
11:06:29:
>
> > Hi, all
> >
> > I tested the Swing Modulo Scheduling of GCC. For
> > simple program, the numerical caculation of PI, it
> > achieves significant speedup on IA64. But, for 
little
> > bit complex programs, the SMS can hardly work out a
> > successful schedule.
> >
> > The algorithms implemented by Ayal and Mostafa are
> > correct. It seems that the SMS itself is wrong. The
> > schedule priority order calculated by SMS is much
> > different from normal MinDist algorithm.
>
> SMS prioritizes the nodes using critical patch based
> heuristic - the main idea is that instructions that 
are
> on the critical path are less flexible in means of
> scheduling.
> Why do you think that the priority order should be
> according to MinDist algorithm (what is the minimum
> distance in this case)? Can you provide an example
> that supports this?
>
> > Besides this,
> > SMS is not sensitive to II. Is SMS really wrong?
>
> The node ordering step in SMS is not sensitive to II
> (as mentioned before it is based on critical path
> heuristic).  However, the scheduling step is 
sensitive
> to II - failing to schedule the nodes of a loop 
within
> II cycles will lead to try scheduling the nodes again
> within II + 1 cycles.
>
>
> Mostafa.
>

An example for SMS.

Content

  1. Fortran source
     'doloop_get_register' was modified to let it 
return the needed value.

  2. vcg file of data dependence graph

  3. node order parameters

  4. node order calculated by SMS

  5. failed schedule
     The schedule will block on nearly the same insn 
node while the II is increasing.

  6. node order calculated by mindist algorithm
     mindist is an algorithm for computing the minmum 
permissible time interval between every pair of 
operations. It is a time consuming algorithm.
     Ref. Iterative Modulo Scheduling, B. Ramakrisha 
Rau, Compiler and Architecture Research, HPL-94-115, 
1995.

  7. successful schedule


1. Fortran source

#define N 32767
double uold[N], vold[N], alpha;

void calc3 ()
{
  int i;

  for (i = 0; i < N; i++)
    {
      uold[i] += alpha;
      vold[i] += alpha;
    }
}


2. vcg file of data dependence graph

graph: {
node: {title: "0_29" info1: "(insn 29 23 30 1 (set 
(reg:DF 355)
        (mem/s:DF (reg/f:DI 351 [ ivtmp.19 ]) [3 uold 
S8 A64])) 25 {*movdf_internal} (nil)
    (nil))
"}
edge: { sourcename: "0_29" targetname: "2_31" 
label: "0_0"}
edge: { sourcename: "0_29" targetname: "1_30" 
label: "6_0"}
node: {title: "1_30" info1: "(insn 30 29 31 1 (set 
(reg:DF 356)
        (plus:DF (reg:DF 355)
            (reg:DF 353 [ alpha.1 ]))) 133 {adddf3} 
(insn_list:REG_DEP_TRUE 29 (nil))
    (expr_list:REG_DEAD (reg:DF 355)
        (nil)))
"}
edge: { sourcename: "1_30" targetname: "2_31" 
label: "7_0"}
node: {title: "2_31" info1: "(insn 31 30 35 1 (set 
(mem/s:DF (post_inc:DI (reg/f:DI 351 [ 

ivtmp.19 ])) [3 uold S8 A64])
        (reg:DF 356)) 25 {*movdf_internal} 
(insn_list:REG_DEP_TRUE 30 

(insn_list:REG_DEP_ANTI 29 (nil)))
    (expr_list:REG_DEAD (reg:DF 356)
        (expr_list:REG_INC (reg/f:DI 351 [ ivtmp.19 ])
            (nil))))
"}
backedge: {color: red sourcename: "2_31" 
targetname: "2_31" label: "1_1"}
backedge: {color: red sourcename: "2_31" 
targetname: "0_29" label: "1_1"}
edge: { sourcename: "2_31" targetname: "6_71" 
label: "0_0"}
backedge: {color: red sourcename: "2_31" 
targetname: "3_35" label: "1_1"}
backedge: {color: red sourcename: "2_31" 
targetname: "0_29" label: "1_1"}
node: {title: "3_35" info1: "(insn 35 31 36 1 (set 
(reg:DF 357)
        (mem/s:DF (reg/f:DI 350 [ ivtmp.23 ]) [3 vold 
S8 A64])) 25 {*movdf_internal} (nil)
    (nil))
"}
edge: { sourcename: "3_35" targetname: "5_37" 
label: "0_0"}
edge: { sourcename: "3_35" targetname: "4_36" 
label: "6_0"}
backedge: {color: red sourcename: "3_35" 
targetname: "2_31" label: "0_1"}
node: {title: "4_36" info1: "(insn 36 35 37 1 (set 
(reg:DF 358)
        (plus:DF (reg:DF 353 [ alpha.1 ])
            (reg:DF 357))) 133 {adddf3} 
(insn_list:REG_DEP_TRUE 35 (nil))
    (expr_list:REG_DEAD (reg:DF 357)
        (nil)))
"}
edge: { sourcename: "4_36" targetname: "5_37" 
label: "7_0"}
node: {title: "5_37" info1: "(insn 37 36 39 1 (set 
(mem/s:DF (post_inc:DI (reg/f:DI 350 [ 

ivtmp.23 ])) [3 vold S8 A64])
        (reg:DF 358)) 25 {*movdf_internal} 
(insn_list:REG_DEP_TRUE 36 

(insn_list:REG_DEP_ANTI 35 (nil)))
    (expr_list:REG_DEAD (reg:DF 358)
        (expr_list:REG_INC (reg/f:DI 350 [ ivtmp.23 ])
            (nil))))
"}
backedge: {color: red sourcename: "5_37" 
targetname: "5_37" label: "1_1"}
backedge: {color: red sourcename: "5_37" 
targetname: "3_35" label: "1_1"}
edge: { sourcename: "5_37" targetname: "6_71" 
label: "0_0"}
backedge: {color: red sourcename: "5_37" 
targetname: "3_35" label: "1_1"}
backedge: {color: red sourcename: "5_37" 
targetname: "2_31" label: "0_1"}
backedge: {color: red sourcename: "5_37" 
targetname: "0_29" label: "1_1"}
node: {title: "6_71" info1: "(jump_insn 71 40 81 1 
(parallel [
            (set (pc)
                (if_then_else (ne (reg:DI 332 ar.lc)
                        (const_int 0 [0x0]))
                    (label_ref 22)
                    (pc)))
            (set (reg:DI 332 ar.lc)
                (if_then_else:DI (ne (reg:DI 332 ar.lc)
                        (const_int 0 [0x0]))
                    (plus:DI (reg:DI 332 ar.lc)
                        (const_int -1 
[0xffffffffffffffff]))
                    (reg:DI 332 ar.lc)))
        ]) 223 {doloop_end_internal} 
(insn_list:REG_DEP_OUTPUT 31 (insn_list:REG_DEP_OUTPUT 

37 (nil)))
    (nil))
"}
}


3. node order parameters

#   ASAP   ALAP  HEIGHT MOB  DEPTH
0    0      0     13     0     0
1    6      6     7      0     6
2    13     13    0      0     13
3    0      0     13     0     0
4    6      6     7      0     6
5    13     13    0      0     13
6    13     13    0      0     13



4. node order calculated by SMS
  2, 5, 1, 4, 0, 3, 6


5. failed schedule

;; Function calc3

SMS single-bb-loop
SMS doloop
SMS built-ddg 7
SMS num-loads 2
SMS num-stores 2
SMS const-doloop 32766
SMS iis 14 14 28 (rec_mii, mii, maxii)
Starting with ii=14
Trying to schedule node 2 in (13 .. 27) step 1
Schedule in 13
Trying to schedule node 5 in (27 .. 13) step -1
Schedule in 27
Trying to schedule node 1 in (6 .. -8) step -1
Schedule in 6
Trying to schedule node 4 in (20 .. 6) step -1
Schedule in 20
Trying to schedule node 0 in (14 .. 1) step 1
Starting with ii=15
Trying to schedule node 2 in (13 .. 28) step 1
Schedule in 13
Trying to schedule node 5 in (28 .. 13) step -1
Schedule in 28
Trying to schedule node 1 in (6 .. -9) step -1
Schedule in 6
Trying to schedule node 4 in (21 .. 6) step -1
Schedule in 21
Trying to schedule node 0 in (14 .. 1) step 1
Starting with ii=16
Trying to schedule node 2 in (13 .. 29) step 1
Schedule in 13
Trying to schedule node 5 in (29 .. 13) step -1
Schedule in 29
Trying to schedule node 1 in (6 .. -10) step -1
Schedule in 6
Trying to schedule node 4 in (22 .. 6) step -1
Schedule in 22
Trying to schedule node 0 in (14 .. 1) step 1
Starting with ii=17
Trying to schedule node 2 in (13 .. 30) step 1
Schedule in 13
Trying to schedule node 5 in (30 .. 13) step -1
Schedule in 30
Trying to schedule node 1 in (6 .. -11) step -1
Schedule in 6
Trying to schedule node 4 in (23 .. 6) step -1
Schedule in 23
Trying to schedule node 0 in (14 .. 1) step 1
Starting with ii=18
......


6. node order calculated by mindist algorithm
  3, 1, 4, 0, 2, 5, 6


7. The successful schedule.

;; Function calc3

SMS single-bb-loop
SMS doloop
SMS built-ddg 7
SMS num-loads 2
SMS num-stores 2
SMS const-doloop 32766
SMS iis 14 14 28 (rec_mii, mii, maxii)
Starting with ii=14
Trying to schedule node 3 in (0 .. 14) step 1
Schedule in 0
Trying to schedule node 1 in (6 .. 20) step 1
Schedule in 6
Trying to schedule node 4 in (6 .. 20) step 1
Schedule in 6
Trying to schedule node 0 in (0 .. -14) step -1
Schedule in 0
Trying to schedule node 2 in (13 .. 13) step 1
Starting with ii=15
Trying to schedule node 3 in (0 .. 15) step 1
Schedule in 0
Trying to schedule node 1 in (6 .. 21) step 1
Schedule in 6
Trying to schedule node 4 in (6 .. 21) step 1
Schedule in 6
Trying to schedule node 0 in (0 .. -15) step -1
Schedule in 0
Trying to schedule node 2 in (13 .. 14) step 1
Schedule in 13
Trying to schedule node 5 in (13 .. 14) step 1
Schedule in 13
SMS succeeded 15 1 (with ii, sc)

 



Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: SMS scheduling
  2004-10-09 17:55   ` Canqun Yang
@ 2004-10-24 19:54     ` Mostafa Hagog
  0 siblings, 0 replies; 17+ messages in thread
From: Mostafa Hagog @ 2004-10-24 19:54 UTC (permalink / raw)
  To: Canqun Yang; +Cc: Ayal Zaks, gcc, Richard Henderson


canqun@nudt.edu.cn wrote on 09/10/2004 09:18:11:

> Mostafa Hagog <MUSTAFA@il.ibm.com>:
>
> >
> > Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004
> 11:06:29:
> >
> > > Hi, all
> > >
> > > I tested the Swing Modulo Scheduling of GCC. For
> > > simple program, the numerical caculation of PI, it
> > > achieves significant speedup on IA64. But, for
> little
> > > bit complex programs, the SMS can hardly work out a
> > > successful schedule.
> > >
> > > The algorithms implemented by Ayal and Mostafa are
> > > correct. It seems that the SMS itself is wrong. The
> > > schedule priority order calculated by SMS is much
> > > different from normal MinDist algorithm.
> >
> > SMS prioritizes the nodes using critical patch based
> > heuristic - the main idea is that instructions that
> are
> > on the critical path are less flexible in means of
> > scheduling.
> > Why do you think that the priority order should be
> > according to MinDist algorithm (what is the minimum
> > distance in this case)? Can you provide an example
> > that supports this?
> >
> > > Besides this,
> > > SMS is not sensitive to II. Is SMS really wrong?
> >
> > The node ordering step in SMS is not sensitive to II
> > (as mentioned before it is based on critical path
> > heuristic).  However, the scheduling step is
> sensitive
> > to II - failing to schedule the nodes of a loop
> within
> > II cycles will lead to try scheduling the nodes again
> > within II + 1 cycles.
> >
> >
> > Mostafa.
> >
>
> An example for SMS.
>
> Content
>
>   1. Fortran source
>      'doloop_get_register' was modified to let it
> return the needed value.
>
>   2. vcg file of data dependence graph
>
>   3. node order parameters
>
>   4. node order calculated by SMS
>
>   5. failed schedule
>      The schedule will block on nearly the same insn
> node while the II is increasing.
>
>   6. node order calculated by mindist algorithm
>      mindist is an algorithm for computing the minmum
> permissible time interval between every pair of
> operations. It is a time consuming algorithm.
>      Ref. Iterative Modulo Scheduling, B. Ramakrisha
> Rau, Compiler and Architecture Research, HPL-94-115,
> 1995.
>
>   7. successful schedule
>
>
> 1. Fortran source
>
> #define N 32767
> double uold[N], vold[N], alpha;
>
> void calc3 ()
> {
>   int i;
>
>   for (i = 0; i < N; i++)
>     {
>       uold[i] += alpha;
>       vold[i] += alpha;
>     }
> }
>
>
> 2. vcg file of data dependence graph
>
> graph: {
> node: {title: "0_29" info1: "(insn 29 23 30 1 (set
> (reg:DF 355)
>         (mem/s:DF (reg/f:DI 351 [ ivtmp.19 ]) [3 uold
> S8 A64])) 25 {*movdf_internal} (nil)
>     (nil))
> "}
> edge: { sourcename: "0_29" targetname: "2_31"
> label: "0_0"}
> edge: { sourcename: "0_29" targetname: "1_30"
> label: "6_0"}
> node: {title: "1_30" info1: "(insn 30 29 31 1 (set
> (reg:DF 356)
>         (plus:DF (reg:DF 355)
>             (reg:DF 353 [ alpha.1 ]))) 133 {adddf3}
> (insn_list:REG_DEP_TRUE 29 (nil))
>     (expr_list:REG_DEAD (reg:DF 355)
>         (nil)))
> "}
> edge: { sourcename: "1_30" targetname: "2_31"
> label: "7_0"}
> node: {title: "2_31" info1: "(insn 31 30 35 1 (set
> (mem/s:DF (post_inc:DI (reg/f:DI 351 [
>
> ivtmp.19 ])) [3 uold S8 A64])
>         (reg:DF 356)) 25 {*movdf_internal}
> (insn_list:REG_DEP_TRUE 30
>
> (insn_list:REG_DEP_ANTI 29 (nil)))
>     (expr_list:REG_DEAD (reg:DF 356)
>         (expr_list:REG_INC (reg/f:DI 351 [ ivtmp.19 ])
>             (nil))))
> "}
> backedge: {color: red sourcename: "2_31"
> targetname: "2_31" label: "1_1"}
> backedge: {color: red sourcename: "2_31"
> targetname: "0_29" label: "1_1"}
> edge: { sourcename: "2_31" targetname: "6_71"
> label: "0_0"}
> backedge: {color: red sourcename: "2_31"
> targetname: "3_35" label: "1_1"}
> backedge: {color: red sourcename: "2_31"
> targetname: "0_29" label: "1_1"}
> node: {title: "3_35" info1: "(insn 35 31 36 1 (set
> (reg:DF 357)
>         (mem/s:DF (reg/f:DI 350 [ ivtmp.23 ]) [3 vold
> S8 A64])) 25 {*movdf_internal} (nil)
>     (nil))
> "}
> edge: { sourcename: "3_35" targetname: "5_37"
> label: "0_0"}
> edge: { sourcename: "3_35" targetname: "4_36"
> label: "6_0"}
> backedge: {color: red sourcename: "3_35"
> targetname: "2_31" label: "0_1"}
> node: {title: "4_36" info1: "(insn 36 35 37 1 (set
> (reg:DF 358)
>         (plus:DF (reg:DF 353 [ alpha.1 ])
>             (reg:DF 357))) 133 {adddf3}
> (insn_list:REG_DEP_TRUE 35 (nil))
>     (expr_list:REG_DEAD (reg:DF 357)
>         (nil)))
> "}
> edge: { sourcename: "4_36" targetname: "5_37"
> label: "7_0"}
> node: {title: "5_37" info1: "(insn 37 36 39 1 (set
> (mem/s:DF (post_inc:DI (reg/f:DI 350 [
>
> ivtmp.23 ])) [3 vold S8 A64])
>         (reg:DF 358)) 25 {*movdf_internal}
> (insn_list:REG_DEP_TRUE 36
>
> (insn_list:REG_DEP_ANTI 35 (nil)))
>     (expr_list:REG_DEAD (reg:DF 358)
>         (expr_list:REG_INC (reg/f:DI 350 [ ivtmp.23 ])
>             (nil))))
> "}
> backedge: {color: red sourcename: "5_37"
> targetname: "5_37" label: "1_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "3_35" label: "1_1"}
> edge: { sourcename: "5_37" targetname: "6_71"
> label: "0_0"}
> backedge: {color: red sourcename: "5_37"
> targetname: "3_35" label: "1_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "2_31" label: "0_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "0_29" label: "1_1"}
> node: {title: "6_71" info1: "(jump_insn 71 40 81 1
> (parallel [
>             (set (pc)
>                 (if_then_else (ne (reg:DI 332 ar.lc)
>                         (const_int 0 [0x0]))
>                     (label_ref 22)
>                     (pc)))
>             (set (reg:DI 332 ar.lc)
>                 (if_then_else:DI (ne (reg:DI 332 ar.lc)
>                         (const_int 0 [0x0]))
>                     (plus:DI (reg:DI 332 ar.lc)
>                         (const_int -1
> [0xffffffffffffffff]))
>                     (reg:DI 332 ar.lc)))
>         ]) 223 {doloop_end_internal}
> (insn_list:REG_DEP_OUTPUT 31 (insn_list:REG_DEP_OUTPUT
>
> 37 (nil)))
>     (nil))
> "}
> }

As we can see in the above graph the main problem here is the
fact that we are over conservative in calculating memory dependencies,
many of them could be removed with improved aliasing in the RTL level
(take for example the memory dependences from node 0 to nodes 2 and 5).
When these dependencies are removed the example becomes simple.
However, this is a good theoretical example to see where SMS fails,
and the need for improving it - see more comment below.

>
>
> 3. node order parameters
>
> #   ASAP   ALAP  HEIGHT MOB  DEPTH
> 0    0      0     13     0     0
> 1    6      6     7      0     6
> 2    13     13    0      0     13
> 3    0      0     13     0     0
> 4    6      6     7      0     6
> 5    13     13    0      0     13
> 6    13     13    0      0     13
>
>
>
> 4. node order calculated by SMS
>   2, 5, 1, 4, 0, 3, 6
>
>
> 5. failed schedule
>
> ;; Function calc3
>
> SMS single-bb-loop
> SMS doloop
> SMS built-ddg 7
> SMS num-loads 2
> SMS num-stores 2
> SMS const-doloop 32766
> SMS iis 14 14 28 (rec_mii, mii, maxii)
> Starting with ii=14
> Trying to schedule node 2 in (13 .. 27) step 1
> Schedule in 13
> Trying to schedule node 5 in (27 .. 13) step -1
> Schedule in 27
> Trying to schedule node 1 in (6 .. -8) step -1
> Schedule in 6
> Trying to schedule node 4 in (20 .. 6) step -1
> Schedule in 20
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=15
> Trying to schedule node 2 in (13 .. 28) step 1
> Schedule in 13
> Trying to schedule node 5 in (28 .. 13) step -1
> Schedule in 28
> Trying to schedule node 1 in (6 .. -9) step -1
> Schedule in 6
> Trying to schedule node 4 in (21 .. 6) step -1
> Schedule in 21
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=16
> Trying to schedule node 2 in (13 .. 29) step 1
> Schedule in 13
> Trying to schedule node 5 in (29 .. 13) step -1
> Schedule in 29
> Trying to schedule node 1 in (6 .. -10) step -1
> Schedule in 6
> Trying to schedule node 4 in (22 .. 6) step -1
> Schedule in 22
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=17
> Trying to schedule node 2 in (13 .. 30) step 1
> Schedule in 13
> Trying to schedule node 5 in (30 .. 13) step -1
> Schedule in 30
> Trying to schedule node 1 in (6 .. -11) step -1
> Schedule in 6
> Trying to schedule node 4 in (23 .. 6) step -1
> Schedule in 23
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=18
> ......

As you have said before (and as we can see from the above ),
increasing the II in this case doesn't help finding a cycle to
schedule node 0 in.  This is because node 5 is always scheduled II
cycles after node 2 - thus increasing the II will push node 5 further
and the scheduling window for node 0 remains the same.  The reason for
this is that we want to schedule node 5 as close as possible to the
relevant instance of node 2.  This is the heuristic of SMS and this
is an example where this heuristic fails.  There are two main ways
to overcome this problem; one is to use backtracking which is a
reasonable because of the way we use the DFA.  The other way is to
try to keep up-to-date the properties: ASAP, ALAP, MOBILITY, DEPTH,
and HIGHT and use them in calculating the scheduling window.  It is
important to understand that using a different ordering means using
different heuristic for modulo scheduling.  This could be good for
one example but not for another example;  in the overall the SMS
ordering heuristic should give better results because of it tries to
reduce register pressure during the scheduling.

Mostafa.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2004-10-24 12:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-28 12:57 SMS scheduling Canqun Yang
2004-09-29 15:36 ` Mostafa Hagog
2004-09-30  8:50   ` Mostafa Hagog
2004-09-30  8:51     ` Richard Henderson
2004-09-30 10:57       ` Mostafa Hagog
2004-09-30 11:26         ` Richard Henderson
2004-09-30 11:29           ` Mostafa Hagog
2004-09-30 19:34             ` Richard Henderson
2004-10-03 17:59               ` Mostafa Hagog
2004-10-04  1:06                 ` Richard Henderson
2004-10-04 12:14                   ` Mostafa Hagog
2004-10-04 17:20                     ` Richard Henderson
2004-10-04 23:03                       ` Ayal Zaks
2004-10-02 12:25   ` Canqun Yang
2004-10-03 19:54     ` Mostafa Hagog
2004-10-09 17:55   ` Canqun Yang
2004-10-24 19:54     ` Mostafa Hagog

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).