public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Mostafa Hagog <MUSTAFA@il.ibm.com>
To: Canqun Yang <canqun@nudt.edu.cn>
Cc: Ayal Zaks <ZAKS@il.ibm.com>,
	gcc@gcc.gnu.org, Richard Henderson <rth@redhat.com>
Subject: Re: SMS scheduling
Date: Sun, 24 Oct 2004 19:54:00 -0000	[thread overview]
Message-ID: <OF2AE158E7.88D21E98-ONC2256F37.002A6AAA-C2256F37.004209BE@il.ibm.com> (raw)
In-Reply-To: <20041009071811.3B4745AA3E@ds20.nudt.edu.cn>


canqun@nudt.edu.cn wrote on 09/10/2004 09:18:11:

> Mostafa Hagog <MUSTAFA@il.ibm.com>:
>
> >
> > Canqun Yang <canqun@nudt.edu.cn> wrote on 28/09/2004
> 11:06:29:
> >
> > > Hi, all
> > >
> > > I tested the Swing Modulo Scheduling of GCC. For
> > > simple program, the numerical caculation of PI, it
> > > achieves significant speedup on IA64. But, for
> little
> > > bit complex programs, the SMS can hardly work out a
> > > successful schedule.
> > >
> > > The algorithms implemented by Ayal and Mostafa are
> > > correct. It seems that the SMS itself is wrong. The
> > > schedule priority order calculated by SMS is much
> > > different from normal MinDist algorithm.
> >
> > SMS prioritizes the nodes using critical patch based
> > heuristic - the main idea is that instructions that
> are
> > on the critical path are less flexible in means of
> > scheduling.
> > Why do you think that the priority order should be
> > according to MinDist algorithm (what is the minimum
> > distance in this case)? Can you provide an example
> > that supports this?
> >
> > > Besides this,
> > > SMS is not sensitive to II. Is SMS really wrong?
> >
> > The node ordering step in SMS is not sensitive to II
> > (as mentioned before it is based on critical path
> > heuristic).  However, the scheduling step is
> sensitive
> > to II - failing to schedule the nodes of a loop
> within
> > II cycles will lead to try scheduling the nodes again
> > within II + 1 cycles.
> >
> >
> > Mostafa.
> >
>
> An example for SMS.
>
> Content
>
>   1. Fortran source
>      'doloop_get_register' was modified to let it
> return the needed value.
>
>   2. vcg file of data dependence graph
>
>   3. node order parameters
>
>   4. node order calculated by SMS
>
>   5. failed schedule
>      The schedule will block on nearly the same insn
> node while the II is increasing.
>
>   6. node order calculated by mindist algorithm
>      mindist is an algorithm for computing the minmum
> permissible time interval between every pair of
> operations. It is a time consuming algorithm.
>      Ref. Iterative Modulo Scheduling, B. Ramakrisha
> Rau, Compiler and Architecture Research, HPL-94-115,
> 1995.
>
>   7. successful schedule
>
>
> 1. Fortran source
>
> #define N 32767
> double uold[N], vold[N], alpha;
>
> void calc3 ()
> {
>   int i;
>
>   for (i = 0; i < N; i++)
>     {
>       uold[i] += alpha;
>       vold[i] += alpha;
>     }
> }
>
>
> 2. vcg file of data dependence graph
>
> graph: {
> node: {title: "0_29" info1: "(insn 29 23 30 1 (set
> (reg:DF 355)
>         (mem/s:DF (reg/f:DI 351 [ ivtmp.19 ]) [3 uold
> S8 A64])) 25 {*movdf_internal} (nil)
>     (nil))
> "}
> edge: { sourcename: "0_29" targetname: "2_31"
> label: "0_0"}
> edge: { sourcename: "0_29" targetname: "1_30"
> label: "6_0"}
> node: {title: "1_30" info1: "(insn 30 29 31 1 (set
> (reg:DF 356)
>         (plus:DF (reg:DF 355)
>             (reg:DF 353 [ alpha.1 ]))) 133 {adddf3}
> (insn_list:REG_DEP_TRUE 29 (nil))
>     (expr_list:REG_DEAD (reg:DF 355)
>         (nil)))
> "}
> edge: { sourcename: "1_30" targetname: "2_31"
> label: "7_0"}
> node: {title: "2_31" info1: "(insn 31 30 35 1 (set
> (mem/s:DF (post_inc:DI (reg/f:DI 351 [
>
> ivtmp.19 ])) [3 uold S8 A64])
>         (reg:DF 356)) 25 {*movdf_internal}
> (insn_list:REG_DEP_TRUE 30
>
> (insn_list:REG_DEP_ANTI 29 (nil)))
>     (expr_list:REG_DEAD (reg:DF 356)
>         (expr_list:REG_INC (reg/f:DI 351 [ ivtmp.19 ])
>             (nil))))
> "}
> backedge: {color: red sourcename: "2_31"
> targetname: "2_31" label: "1_1"}
> backedge: {color: red sourcename: "2_31"
> targetname: "0_29" label: "1_1"}
> edge: { sourcename: "2_31" targetname: "6_71"
> label: "0_0"}
> backedge: {color: red sourcename: "2_31"
> targetname: "3_35" label: "1_1"}
> backedge: {color: red sourcename: "2_31"
> targetname: "0_29" label: "1_1"}
> node: {title: "3_35" info1: "(insn 35 31 36 1 (set
> (reg:DF 357)
>         (mem/s:DF (reg/f:DI 350 [ ivtmp.23 ]) [3 vold
> S8 A64])) 25 {*movdf_internal} (nil)
>     (nil))
> "}
> edge: { sourcename: "3_35" targetname: "5_37"
> label: "0_0"}
> edge: { sourcename: "3_35" targetname: "4_36"
> label: "6_0"}
> backedge: {color: red sourcename: "3_35"
> targetname: "2_31" label: "0_1"}
> node: {title: "4_36" info1: "(insn 36 35 37 1 (set
> (reg:DF 358)
>         (plus:DF (reg:DF 353 [ alpha.1 ])
>             (reg:DF 357))) 133 {adddf3}
> (insn_list:REG_DEP_TRUE 35 (nil))
>     (expr_list:REG_DEAD (reg:DF 357)
>         (nil)))
> "}
> edge: { sourcename: "4_36" targetname: "5_37"
> label: "7_0"}
> node: {title: "5_37" info1: "(insn 37 36 39 1 (set
> (mem/s:DF (post_inc:DI (reg/f:DI 350 [
>
> ivtmp.23 ])) [3 vold S8 A64])
>         (reg:DF 358)) 25 {*movdf_internal}
> (insn_list:REG_DEP_TRUE 36
>
> (insn_list:REG_DEP_ANTI 35 (nil)))
>     (expr_list:REG_DEAD (reg:DF 358)
>         (expr_list:REG_INC (reg/f:DI 350 [ ivtmp.23 ])
>             (nil))))
> "}
> backedge: {color: red sourcename: "5_37"
> targetname: "5_37" label: "1_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "3_35" label: "1_1"}
> edge: { sourcename: "5_37" targetname: "6_71"
> label: "0_0"}
> backedge: {color: red sourcename: "5_37"
> targetname: "3_35" label: "1_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "2_31" label: "0_1"}
> backedge: {color: red sourcename: "5_37"
> targetname: "0_29" label: "1_1"}
> node: {title: "6_71" info1: "(jump_insn 71 40 81 1
> (parallel [
>             (set (pc)
>                 (if_then_else (ne (reg:DI 332 ar.lc)
>                         (const_int 0 [0x0]))
>                     (label_ref 22)
>                     (pc)))
>             (set (reg:DI 332 ar.lc)
>                 (if_then_else:DI (ne (reg:DI 332 ar.lc)
>                         (const_int 0 [0x0]))
>                     (plus:DI (reg:DI 332 ar.lc)
>                         (const_int -1
> [0xffffffffffffffff]))
>                     (reg:DI 332 ar.lc)))
>         ]) 223 {doloop_end_internal}
> (insn_list:REG_DEP_OUTPUT 31 (insn_list:REG_DEP_OUTPUT
>
> 37 (nil)))
>     (nil))
> "}
> }

As we can see in the above graph the main problem here is the
fact that we are over conservative in calculating memory dependencies,
many of them could be removed with improved aliasing in the RTL level
(take for example the memory dependences from node 0 to nodes 2 and 5).
When these dependencies are removed the example becomes simple.
However, this is a good theoretical example to see where SMS fails,
and the need for improving it - see more comment below.

>
>
> 3. node order parameters
>
> #   ASAP   ALAP  HEIGHT MOB  DEPTH
> 0    0      0     13     0     0
> 1    6      6     7      0     6
> 2    13     13    0      0     13
> 3    0      0     13     0     0
> 4    6      6     7      0     6
> 5    13     13    0      0     13
> 6    13     13    0      0     13
>
>
>
> 4. node order calculated by SMS
>   2, 5, 1, 4, 0, 3, 6
>
>
> 5. failed schedule
>
> ;; Function calc3
>
> SMS single-bb-loop
> SMS doloop
> SMS built-ddg 7
> SMS num-loads 2
> SMS num-stores 2
> SMS const-doloop 32766
> SMS iis 14 14 28 (rec_mii, mii, maxii)
> Starting with ii=14
> Trying to schedule node 2 in (13 .. 27) step 1
> Schedule in 13
> Trying to schedule node 5 in (27 .. 13) step -1
> Schedule in 27
> Trying to schedule node 1 in (6 .. -8) step -1
> Schedule in 6
> Trying to schedule node 4 in (20 .. 6) step -1
> Schedule in 20
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=15
> Trying to schedule node 2 in (13 .. 28) step 1
> Schedule in 13
> Trying to schedule node 5 in (28 .. 13) step -1
> Schedule in 28
> Trying to schedule node 1 in (6 .. -9) step -1
> Schedule in 6
> Trying to schedule node 4 in (21 .. 6) step -1
> Schedule in 21
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=16
> Trying to schedule node 2 in (13 .. 29) step 1
> Schedule in 13
> Trying to schedule node 5 in (29 .. 13) step -1
> Schedule in 29
> Trying to schedule node 1 in (6 .. -10) step -1
> Schedule in 6
> Trying to schedule node 4 in (22 .. 6) step -1
> Schedule in 22
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=17
> Trying to schedule node 2 in (13 .. 30) step 1
> Schedule in 13
> Trying to schedule node 5 in (30 .. 13) step -1
> Schedule in 30
> Trying to schedule node 1 in (6 .. -11) step -1
> Schedule in 6
> Trying to schedule node 4 in (23 .. 6) step -1
> Schedule in 23
> Trying to schedule node 0 in (14 .. 1) step 1
> Starting with ii=18
> ......

As you have said before (and as we can see from the above ),
increasing the II in this case doesn't help finding a cycle to
schedule node 0 in.  This is because node 5 is always scheduled II
cycles after node 2 - thus increasing the II will push node 5 further
and the scheduling window for node 0 remains the same.  The reason for
this is that we want to schedule node 5 as close as possible to the
relevant instance of node 2.  This is the heuristic of SMS and this
is an example where this heuristic fails.  There are two main ways
to overcome this problem; one is to use backtracking which is a
reasonable because of the way we use the DFA.  The other way is to
try to keep up-to-date the properties: ASAP, ALAP, MOBILITY, DEPTH,
and HIGHT and use them in calculating the scheduling window.  It is
important to understand that using a different ordering means using
different heuristic for modulo scheduling.  This could be good for
one example but not for another example;  in the overall the SMS
ordering heuristic should give better results because of it tries to
reduce register pressure during the scheduling.

Mostafa.

      reply	other threads:[~2004-10-24 12:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-28 12:57 Canqun Yang
2004-09-29 15:36 ` Mostafa Hagog
2004-09-30  8:50   ` Mostafa Hagog
2004-09-30  8:51     ` Richard Henderson
2004-09-30 10:57       ` Mostafa Hagog
2004-09-30 11:26         ` Richard Henderson
2004-09-30 11:29           ` Mostafa Hagog
2004-09-30 19:34             ` Richard Henderson
2004-10-03 17:59               ` Mostafa Hagog
2004-10-04  1:06                 ` Richard Henderson
2004-10-04 12:14                   ` Mostafa Hagog
2004-10-04 17:20                     ` Richard Henderson
2004-10-04 23:03                       ` Ayal Zaks
2004-10-02 12:25   ` Canqun Yang
2004-10-03 19:54     ` Mostafa Hagog
2004-10-09 17:55   ` Canqun Yang
2004-10-24 19:54     ` Mostafa Hagog [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=OF2AE158E7.88D21E98-ONC2256F37.002A6AAA-C2256F37.004209BE@il.ibm.com \
    --to=mustafa@il.ibm.com \
    --cc=ZAKS@il.ibm.com \
    --cc=canqun@nudt.edu.cn \
    --cc=gcc@gcc.gnu.org \
    --cc=rth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).