Re: SMS in gcc4.0


* Re: SMS in gcc4.0
@ 2005-06-02  1:25 Canqun Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Canqun Yang @ 2005-06-02  1:25 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Mostafa Hagog, Ayal Zaks

Canqun Yang <canqun@nudt.edu.cn>:

> Steven Bosscher <stevenb@suse.de>:
>
> > On Wednesday 01 June 2005 16:43, Canqun Yang wrote:
> > > Hi, all
> > >
> > > I've taken a look at modulo-sched.c recently, and found
> > > that both new_cycles and orig_cycles are imprecise. The
> > > reason is that kernel_number_of_cycles does not take the
> > > data dependences of insns into account as the DFA
> > > scheduler does in haifa-sched.c.
> >
> > How does this affect the cycles computation?
> >
>
> An insn is ready to be scheduled only when all the insns
> it depends on have already been scheduled. In haifa-sched.c,
> there is a queue to hold the insns which are ready to be
> scheduled.
>
> To see how data dependences affect the cycles computation,
> the simplest way is to compare the two versions of assembly
> code generated by GCC, one with '-fmodulo-sched' turned on,
> the other without. Without SMS, the code in the loop has
> many stops ';;' to separate the instructions which have
> data dependences, while with SMS the kernel code of the
> loop has more instructions but fewer stops ';;'.
>
> > > On IA-64, three improvements are needed to let SMS work.
> > > 1) Modify doloop_register_get or the similar function
> > > defined in doloop.c to recognize the loop count
> > > register. I have supplied a patch about this in April.
> >
> > Mustafa and I have a patch that has a similar effect, see
> > http://gcc.gnu.org/ml/gcc-patches/2005-06/msg00035.html.
> >
> > > 2) Use a more precise way to calculate the values of the
> > > two kinds of cycles, or just ignore this benefit assertion.
> >
> > Probably need to be more precise :-/
> >
> > When I manually hacked modulo-sched.c to ignore this test, I
> > did see loops getting scheduled, but I also ran into ICEs in
> > cfglayout.
>
> There are no ICEs for pi.f90, swim.f, and mgrid.f
> according to my test. But an internal compiler error
> of 'unrecognizable insn' is produced by 'gen_sub2_insn',
> which explicitly subtracts from 'ar.lc', when swim.f and
> mgrid.f are being compiled.


There are no ICEs for pi.f90 according to my test. But
ICEs of 'unrecognizable insn' are produced by
'gen_sub2_insn', which explicitly subtracts from 'ar.lc',
when swim.f and mgrid.f are being compiled.


>
> >
> > > 3) The counted loop register 'ar.lc' of IA-64 cannot be
> > > updated directly. Another temporary register is needed
> > > to evaluate the value of the actual loop count after
> > > SMS scheduling, and assign its value to 'ar.lc'.
> >
> > Actually, should SMS just not update the loop register in place?
> > I never figured out why it tries to produce a sub insn (using
> > gen_sub2_insn which is also wrong btw).
> >
>
> The current implementation of SMS does not use IA-64's
> epilog count register (ar.ec). After SMS, the loop count is
> just used to control how many times the kernel code
> executes, and the kernel code will execute
>    loop_count - (stage_count - 1) times.
> The sub insn generated by gen_sub2_insn is used to
> produce this value.
>
>
> > Gr.
> > Steven
> >
> >
>

Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.


* Re: SMS in gcc4.0
       [not found] ` <200504211739.42879.stevenb@suse.de>
@ 2005-04-22  3:57   ` Canqun Yang
  2005-04-22  6:58     ` Steven Bosscher
  0 siblings, 1 reply; 15+ messages in thread
From: Canqun Yang @ 2005-04-22  3:57 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Mostafa Hagog, Ayal Zaks, gcc

Steven Bosscher <stevenb@suse.de>:

> On Thursday 21 April 2005 17:37, Mostafa Hagog wrote:
> > The other thing is to analyze this problem more deeply but I don't have
> > IA64.
> ...and I don't care enough about it.  Canqun?
>
> Gr.
> Steven
>
> 

Ok, I'll try this.

Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.


* Re: SMS in gcc4.0
  2005-04-22  3:57   ` Canqun Yang
@ 2005-04-22  6:58     ` Steven Bosscher
  2005-05-09 10:54       ` Mostafa Hagog
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Bosscher @ 2005-04-22  6:58 UTC (permalink / raw)
  To: Canqun Yang; +Cc: Mostafa Hagog, Ayal Zaks, gcc

On Friday 22 April 2005 04:43, Canqun Yang wrote:
> Steven Bosscher <stevenb@suse.de>:
> > On Thursday 21 April 2005 17:37, Mostafa Hagog wrote:
> > > The other thing is to analyze this problem more deeply but I
> > > don't have IA64.
> >
> > ...and I don't care enough about it.  Canqun?
> >
> > Gr.
> > Steven
>
> Ok, I'll try this.

Thanks!
For the record, this refers to a patch I sent to Mostafa and Canqun to
do what Mostafa suggested last month to make SMS work for ia64, see 
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02848.html.

Gr.
Steven



* Re: SMS in gcc4.0
  2005-04-22  6:58     ` Steven Bosscher
@ 2005-05-09 10:54       ` Mostafa Hagog
  2005-06-01 14:28         ` Canqun Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Mostafa Hagog @ 2005-05-09 10:54 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Ayal Zaks, Canqun Yang, gcc





Steven Bosscher <stevenb@suse.de> wrote on 22/04/2005 09:39:09:


>
> Thanks!
> For the record, this refers to a patch I sent to Mostafa and Canqun to
> do what Mostafa suggested last month to make SMS work for ia64, see
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02848.html.

I have tested the patch on powerpc-apple-darwin and some tests started
failing. So I am going to debug it to see what causes the failures.

Mostafa.

>
> Gr.
> Steven
>
>


* Re: SMS in gcc4.0
  2005-05-09 10:54       ` Mostafa Hagog
@ 2005-06-01 14:28         ` Canqun Yang
  2005-06-01 14:35           ` Steven Bosscher
  0 siblings, 1 reply; 15+ messages in thread
From: Canqun Yang @ 2005-06-01 14:28 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Steven Bosscher, Ayal Zaks, gcc

Hi, all

I've taken a look at modulo-sched.c recently, and found
that both new_cycles and orig_cycles are imprecise. The
reason is that kernel_number_of_cycles does not take the
data dependences of insns into account as the DFA
scheduler does in haifa-sched.c.

On IA-64, three improvements are needed to let SMS work.
1) Modify doloop_register_get or the similar function
defined in doloop.c to recognize the loop count
register. I have supplied a patch about this in April.

2) Use a more precise way to calculate the values of the
two kinds of cycles, or just ignore this benefit assertion.

3) The counted loop register 'ar.lc' of IA-64 cannot be
updated directly. Another temporary register is needed
to evaluate the value of the actual loop count after
SMS scheduling, and assign its value to 'ar.lc'.

Mostafa Hagog <MUSTAFA@il.ibm.com>:

> 
>
>
>
> Steven Bosscher <stevenb@suse.de> wrote on 22/04/2005 09:39:09:
>
> > Thanks!
> > For the record, this refers to a patch I sent to Mostafa and Canqun to
> > do what Mostafa suggested last month to make SMS work for ia64, see
> > http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02848.html.
>
> I have tested the patch on powerpc-apple-darwin and some tests started
> failing. So I am going to debug it to see what causes the failures.
>
> Mostafa.
>
> >
> > Gr.
> > Steven
> >
> >
>
> 

Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.


* Re: SMS in gcc4.0
  2005-06-01 14:28         ` Canqun Yang
@ 2005-06-01 14:35           ` Steven Bosscher
  2005-06-02  1:13             ` Canqun Yang
  2005-06-02 13:09             ` Mostafa Hagog
  0 siblings, 2 replies; 15+ messages in thread
From: Steven Bosscher @ 2005-06-01 14:35 UTC (permalink / raw)
  To: gcc, Canqun Yang; +Cc: Mostafa Hagog, Ayal Zaks

On Wednesday 01 June 2005 16:43, Canqun Yang wrote:
> Hi, all
>
> I've taken a look at modulo-sched.c recently, and found
> that both new_cycles and orig_cycles are imprecise. The
> reason is that kernel_number_of_cycles does not take the
> data dependences of insns into account as the DFA
> scheduler does in haifa-sched.c.

How does this affect the cycles computation?

> On IA-64, three improvements are needed to let SMS work.
> 1) Modify doloop_register_get or the similar function
> defined in doloop.c to recognize the loop count
> register. I have supplied a patch about this in April.

Mustafa and I have a patch that has a similar effect, see
http://gcc.gnu.org/ml/gcc-patches/2005-06/msg00035.html.

> 2) Use a more precise way to calculate the values of the
> two kinds of cycles, or just ignore this benefit assertion.

Probably need to be more precise :-/

When I manually hacked modulo-sched.c to ignore this test, I
did see loops getting scheduled, but I also ran into ICEs in
cfglayout.

> 3) The counted loop register 'ar.lc' of IA-64 cannot be
> updated directly. Another temporary register is needed
> to evaluate the value of the actual loop count after
> SMS scheduling, and assign its value to 'ar.lc'.

Actually, should SMS just not update the loop register in place?
I never figured out why it tries to produce a sub insn (using
gen_sub2_insn which is also wrong btw).

Gr.
Steven


* Re: SMS in gcc4.0
  2005-06-01 14:35           ` Steven Bosscher
@ 2005-06-02  1:13             ` Canqun Yang
  2005-06-02 13:41               ` Mostafa Hagog
  2005-06-02 13:09             ` Mostafa Hagog
  1 sibling, 1 reply; 15+ messages in thread
From: Canqun Yang @ 2005-06-02  1:13 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Mostafa Hagog, Ayal Zaks

Steven Bosscher <stevenb@suse.de>:

> On Wednesday 01 June 2005 16:43, Canqun Yang wrote:
> > Hi, all
> >
> > I've taken a look at modulo-sched.c recently, and found
> > that both new_cycles and orig_cycles are imprecise. The
> > reason is that kernel_number_of_cycles does not take the
> > data dependences of insns into account as the DFA
> > scheduler does in haifa-sched.c.
>
> How does this affect the cycles computation?
>

An insn is ready to be scheduled only when all the insns
it depends on have already been scheduled. In haifa-sched.c,
there is a queue to hold the insns which are ready to be
scheduled.

To see how data dependences affect the cycles computation,
the simplest way is to compare the two versions of assembly
code generated by GCC, one with '-fmodulo-sched' turned on,
the other without. Without SMS, the code in the loop has
many stops ';;' to separate the instructions which have
data dependences, while with SMS the kernel code of the
loop has more instructions but fewer stops ';;'.
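
To make the difference concrete, here is a minimal sketch in C. It is a
toy model, not code from modulo-sched.c or haifa-sched.c: the names
toy_insn, cycles_ignoring_deps and cycles_with_deps, and the fixed size
limits, are assumptions made for illustration. The first function only
divides the insn count by the issue width. The second issues an insn only
after every insn it depends on has produced its result, which is roughly
what the stops ';;' express on IA-64.

```c
#define MAX_PREDS  4
#define MAX_INSNS  64
#define MAX_CYCLES 1024

/* Hypothetical toy insn.  Insns are assumed to be numbered in a
   topological order, so every predecessor index is smaller than the
   index of the insn that uses it.  */
struct toy_insn {
    int latency;            /* cycles until the result is available */
    int n_preds;            /* number of valid entries in preds[] */
    int preds[MAX_PREDS];   /* indices of insns this one depends on */
};

/* Estimate that ignores dependences: only the issue width limits how
   many insns fit into one cycle.  */
int cycles_ignoring_deps(int n_insns, int issue_width)
{
    if (n_insns < 0 || issue_width < 1)
        return -1;
    return (n_insns + issue_width - 1) / issue_width;
}

/* Estimate that honours dependences: an insn is placed in the first
   cycle where all its predecessors have finished and an issue slot is
   still free.  Returns the number of cycles used, or -1 on overflow.  */
int cycles_with_deps(const struct toy_insn *insns, int n_insns,
                     int issue_width)
{
    int start[MAX_INSNS];
    int used[MAX_CYCLES] = {0};
    int total = 0;

    if (n_insns < 0 || n_insns > MAX_INSNS || issue_width < 1)
        return -1;

    for (int i = 0; i < n_insns; i++) {
        int cycle = 0;

        /* Wait for the slowest predecessor.  */
        for (int k = 0; k < insns[i].n_preds; k++) {
            int p = insns[i].preds[k];
            int avail = start[p] + insns[p].latency;
            if (avail > cycle)
                cycle = avail;
        }
        /* Then wait for a free issue slot.  */
        while (cycle < MAX_CYCLES && used[cycle] >= issue_width)
            cycle++;
        if (cycle >= MAX_CYCLES)
            return -1;

        used[cycle]++;
        start[i] = cycle;
        if (cycle + 1 > total)
            total = cycle + 1;
    }
    return total;
}
```

For a chain of four insns with latency 1, each depending on the previous
one, and an issue width of 6, the first estimate is 1 cycle while the
second is 4 cycles.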

> > On IA-64, three improvements are needed to let SMS work.
> > 1) Modify doloop_register_get or the similar function
> > defined in doloop.c to recognize the loop count
> > register. I have supplied a patch about this in April.
>
> Mustafa and I have a patch that has a similar effect, see
> http://gcc.gnu.org/ml/gcc-patches/2005-06/msg00035.html.
>
> > 2) Use a more precise way to calculate the values of the
> > two kinds of cycles, or just ignore this benefit assertion.
>
> Probably need to be more precise :-/
>
> When I manually hacked modulo-sched.c to ignore this test, I
> did see loops getting scheduled, but I also ran into ICEs in
> cfglayout.

There are no ICEs for pi.f90, swim.f, and mgrid.f
according to my test. But an internal compiler error
of 'unrecognizable insn' is produced by 'gen_sub2_insn',
which explicitly subtracts from 'ar.lc', when swim.f and
mgrid.f are being compiled.

>
> > 3) The counted loop register 'ar.lc' of IA-64 cannot be
> > updated directly. Another temporary register is needed
> > to evaluate the value of the actual loop count after
> > SMS scheduling, and assign its value to 'ar.lc'.
>
> Actually, should SMS just not update the loop register in place?
> I never figured out why it tries to produce a sub insn (using
> gen_sub2_insn which is also wrong btw).
>

The current implementation of SMS does not use IA-64's
epilog count register (ar.ec). After SMS, the loop count is
just used to control how many times the kernel code
executes, and the kernel code will execute
   loop_count - (stage_count - 1) times.
The sub insn generated by gen_sub2_insn is used to
produce this value.
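
The count above can be sketched in C as follows. This is only an
illustration: the function names are hypothetical, and the second
function assumes a conventional software-pipelined layout in which the
prologue and the epilogue each run 1 + 2 + ... + (stage_count - 1)
stages. Under that assumption every stage of every original iteration
still runs exactly once.

```c
/* Number of times the SMS kernel executes, following the formula
   loop_count - (stage_count - 1).  Returns -1 when the loop is too
   short to be pipelined with this many stages.  */
long sms_kernel_iterations(long loop_count, long stage_count)
{
    if (stage_count < 1 || loop_count < stage_count)
        return -1;
    return loop_count - (stage_count - 1);
}

/* Total number of stage executions in prologue + kernel + epilogue,
   assuming the prologue ramps up and the epilogue ramps down one stage
   at a time (an assumption, see the text above).  */
long sms_total_stage_executions(long loop_count, long stage_count)
{
    long kernel = sms_kernel_iterations(loop_count, stage_count);
    long ramp;

    if (kernel < 0)
        return -1;
    ramp = stage_count * (stage_count - 1) / 2;
    return ramp + kernel * stage_count + ramp;
}
```

With loop_count = 100 and stage_count = 3 the kernel runs 98 times, and
the total is 3 + 294 + 3 = 300 stage executions, the same as 100
iterations of 3 stages each.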

> Gr.
> Steven
>
> 

Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.


* Re: SMS in gcc4.0
  2005-06-02  1:13             ` Canqun Yang
@ 2005-06-02 13:41               ` Mostafa Hagog
  0 siblings, 0 replies; 15+ messages in thread
From: Mostafa Hagog @ 2005-06-02 13:41 UTC (permalink / raw)
  To: Canqun Yang; +Cc: Ayal Zaks, canqun, gcc, Steven Bosscher

canqun@nudt.edu.cn wrote on 02/06/2005 04:29:17:

> Steven Bosscher <stevenb@suse.de>:
>
> > On Wednesday 01 June 2005 16:43, Canqun Yang wrote:
> > > Hi, all
> > >
> > > I've taken a look at modulo-sched.c recently, and found
> > > that both new_cycles and orig_cycles are imprecise. The
> > > reason is that kernel_number_of_cycles does not take the
> > > data dependences of insns into account as the DFA
> > > scheduler does in haifa-sched.c.
> >
> > How does this affect the cycles computation?
> >
>
> An insn is ready to be scheduled only when all the insns
> it depends on have already been scheduled. In haifa-sched.c,
> there is a queue to hold the insns which are ready to be
> scheduled.

The code mentioned above is the part that decides whether SMS produced a
better schedule, and so whether to use the SMSed kernel or stay with the
original kernel. We don't want to spend more compile time on an additional
scheduling pass to make this more accurate (if this is what you are
suggesting).
A proper solution is to add a parameter to GCC whose purpose is to
define what "a good schedule" is in terms of kernel cycles. You can even
disable the test by saying that any kernel generated by SMS is better than
the original.
For example, we could add a parameter like --param sms-percent-from-orig=N.
When N is 0, the SMSed kernel will always be preferred over the original.
The current implementation is similar to --param sms-percent-from-orig=100.
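
As a sketch of how such a parameter might behave: the function below is
hypothetical and shows only one possible reading of the proposal. It
assumes that the current test keeps the SMSed kernel only when
new_cycles < orig_cycles, and that a larger N makes the test stricter.
Neither the function name nor the exact formula comes from GCC.

```c
#include <stdbool.h>

/* Hypothetical profitability test for --param sms-percent-from-orig=N.
   N == 0   : the SMSed kernel is always preferred.
   N == 100 : equivalent to new_cycles < orig_cycles.
   Other N  : assumed to scale the threshold linearly.  */
bool sms_kernel_is_profitable(int new_cycles, int orig_cycles,
                              int percent_from_orig)
{
    if (percent_from_orig <= 0)
        return true;
    return (long) new_cycles * percent_from_orig
           < (long) orig_cycles * 100;
}
```

Under this reading, N=200 would require the SMSed kernel to take fewer
than half the cycles of the original, and N=50 would accept a kernel
taking up to (but not including) twice as many cycles.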

>
> To see how data dependences affect the cycles computation,
> the simplest way is to compare the two versions of assembly
> code generated by GCC, one with '-fmodulo-sched' turned on,
> the other without. Without SMS, the code in the loop has
> many stops ';;' to separate the instructions which have
> data dependences, while with SMS the kernel code of the
> loop has more instructions but fewer stops ';;'.

What are you suggesting here?

> >
> > When I manually hacked modulo-sched.c to ignore this test, I
> > did see loops getting scheduled, but I also ran into ICEs in
> > cfglayout.
>
> There are no ICEs for pi.f90, swim.f, and mgrid.f
> according to my test. But an internal compiler error
> of 'unrecognizable insn' is produced by 'gen_sub2_insn',
> which explicitly subtracts from 'ar.lc', when swim.f and
> mgrid.f are being compiled.

I have committed a fix for SMS, but I am not sure whether it eliminates the
ICEs Steven was seeing.

Regarding the 'unrecognizable insn' due to gen_sub2_insn: I had the
impression that it would generate the required register moves in order to
perform the calculation. If this is not true, then there should be a wrapper
around this function that makes sure we do the subtraction properly.  This
is not something specific to SMS. How do you generate a subtraction from
ar.lc on IA-64? I suppose that there is a pattern that covers this and
generates the appropriate RTL. gen_sub2_insn should be using this
pattern and it is not, so this is our problem.

Mostafa.


* Re: SMS in gcc4.0
  2005-06-01 14:35           ` Steven Bosscher
  2005-06-02  1:13             ` Canqun Yang
@ 2005-06-02 13:09             ` Mostafa Hagog
  2005-06-02 13:14               ` Steven Bosscher
  1 sibling, 1 reply; 15+ messages in thread
From: Mostafa Hagog @ 2005-06-02 13:09 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Ayal Zaks, Canqun Yang, gcc

Steven Bosscher <stevenb@suse.de> wrote on 01/06/2005 17:35:20:

> On Wednesday 01 June 2005 16:43, Canqun Yang wrote:
>
> > 3) The counted loop register 'ar.lc' of IA-64 cannot be
> > updated directly. Another temporary register is needed
> > to evaluate the value of the actual loop count after
> > SMS scheduling, and assign its value to 'ar.lc'.
>
> Actually, should SMS just not update the loop register in place?
> I never figured out why it tries to produce a sub insn (using
> gen_sub2_insn which is also wrong btw).

The subtraction is required because the SMSed kernel is always executed
fewer times than the original kernel of the loop. This difference is
actually the level of interleaving (parallelism) we get from SMS.
As to the subtraction for IA-64: I expect that gen_sub2_insn handles
the subtraction correctly and generates the required RTL to do the
subtraction according to the machine description. If it requires additional
moves, this is due to a target limitation, which means this is not a
problem of SMS; it should be handled somewhere in the machine description.
Mostafa.


* Re: SMS in gcc4.0
  2005-06-02 13:09             ` Mostafa Hagog
@ 2005-06-02 13:14               ` Steven Bosscher
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Bosscher @ 2005-06-02 13:14 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, Canqun Yang, gcc

On Jun 02, 2005 03:09 PM, Mostafa Hagog <MUSTAFA@il.ibm.com> wrote:

> As to the subtraction for IA-64: I expect that gen_sub2_insn handles
> the subtraction correctly and generates the required RTL to do the
> subtraction according to the machine description.

But that expectation is incorrect.  gen_sub2_insn will quite happily
produce RTL that doesn't satisfy the predicates and constraints of any
insn.

> If it requires additional
> moves this is due to a target limitation, which means this is not the
> problem of SMS, it should be somewhere in the machine description.

No, the machine description is fine.  There simply are no sub and add
insns for ar.lc (loop counter reg on ia64).

Gr.
Steven


* RE: SMS in gcc4.0
@ 2005-03-31 17:06 Canqun Yang
  2005-03-31 18:33 ` Steven Bosscher
  2005-03-31 18:37 ` Steven Bosscher
  0 siblings, 2 replies; 15+ messages in thread
From: Canqun Yang @ 2005-03-31 17:06 UTC (permalink / raw)
  To: MUSTAFA, mark.davis, gp, ZAKS, gcc, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 540 bytes --]

Hi, all

This patch fixes doloop_register_get, defined in
modulo-sched.c, and lets the PI calculation program
be successfully modulo scheduled on IA-64. On a 1GHz
Itanium-2, it takes just 3.128 seconds to execute when
compiled with "-fmodulo-sched -O3", versus
5.454 seconds without "-fmodulo-sched".


2005-03-31  Canqun Yang  <canqun@nudt.edu.cn>

	* modulo-sched.c (doloop_register_get): Deal with
	if_then_else pattern.


Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.

[-- Attachment #2: pi.f90 --]
[-- Type: application/octet-stream, Size: 298 bytes --]

! Compute the value of pi by numeric integration
program pi_ex
   implicit none
   integer n, i
   double precision h, pi
   parameter (n=80000000)
   h = 1.0 / n

   pi = 0.0
   do i=1, n
      pi = pi + h * (4 / (1.0  +  h*(i-0.5) * h*(i-0.5)))
   end do
   
   write(*,*) pi
end 

[-- Attachment #3: modulo-sched.txt --]
[-- Type: text/plain, Size: 2328 bytes --]

*** /home/ycq/mainline/gcc/gcc/modulo-sched.c	Mon Mar 21 10:49:23 2005
--- modulo-sched.c	Thu Mar 31 21:11:08 2005
*************** static rtx
*** 263,269 ****
  doloop_register_get (rtx insn, rtx *comp)
  {
    rtx pattern, cmp, inc, reg, condition;
! 
    if (!JUMP_P (insn))
      return NULL_RTX;
    pattern = PATTERN (insn);
--- 263,270 ----
  doloop_register_get (rtx insn, rtx *comp)
  {
    rtx pattern, cmp, inc, reg, condition;
!   rtx src;
!   
    if (!JUMP_P (insn))
      return NULL_RTX;
    pattern = PATTERN (insn);
*************** doloop_register_get (rtx insn, rtx *comp
*** 293,303 ****
  
    /* Extract loop counter register.  */
    reg = SET_DEST (inc);
  
    /* Check if something = (plus (reg) (const_int -1)).  */
!   if (GET_CODE (SET_SRC (inc)) != PLUS
!       || XEXP (SET_SRC (inc), 0) != reg
!       || XEXP (SET_SRC (inc), 1) != constm1_rtx)
      return NULL_RTX;
  
    /* Check for (set (pc) (if_then_else (condition)
--- 294,315 ----
  
    /* Extract loop counter register.  */
    reg = SET_DEST (inc);
+   src = SET_SRC (inc);
  
+   /* On IA-64, the RTL pattern of SRC is just like this 
+     (if_then_else:DI (ne (reg:DI 332 ar.lc)
+             (const_int 0 [0x0]))
+         (plus:DI (reg:DI 332 ar.lc)
+             (const_int -1 [0xffffffffffffffff]))
+         (reg:DI 332 ar.lc))  */
+ 
+   if (GET_CODE (src) == IF_THEN_ELSE)
+     src = XEXP (src, 1);
+   
    /* Check if something = (plus (reg) (const_int -1)).  */
!   if (GET_CODE (src) != PLUS
!       || XEXP (src, 0) != reg
!       || XEXP (src, 1) != constm1_rtx)
      return NULL_RTX;
  
    /* Check for (set (pc) (if_then_else (condition)
*************** doloop_register_get (rtx insn, rtx *comp
*** 318,324 ****
       if ((GET_CODE (condition) != GE && GET_CODE (condition) != NE)
  	 || GET_CODE (XEXP (condition, 1)) != CONST_INT).  */
    if (GET_CODE (condition) != NE
!       || XEXP (condition, 1) != const1_rtx)
      return NULL_RTX;
  
    if (XEXP (condition, 0) == reg)
--- 330,337 ----
       if ((GET_CODE (condition) != GE && GET_CODE (condition) != NE)
  	 || GET_CODE (XEXP (condition, 1)) != CONST_INT).  */
    if (GET_CODE (condition) != NE
!       || (XEXP (condition, 1) != const1_rtx
! 	  && XEXP (condition, 1) != const0_rtx))
      return NULL_RTX;
  
    if (XEXP (condition, 0) == reg)
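
The matching idea in the patch above can be illustrated outside GCC with
a small stand-alone sketch. Everything here is a toy model: toy_rtx,
toy_code and toy_is_counter_decrement are hypothetical names and not
GCC's rtx API. The sketch peels off an optional if_then_else wrapper, as
the patch does with XEXP (src, 1), and then checks for
(plus (reg) (const_int -1)). Where the patch compares against the shared
constm1_rtx object, the toy compares the constant's value.

```c
#include <stddef.h>

enum toy_code {
    TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_NE, TOY_IF_THEN_ELSE
};

/* Hypothetical, simplified stand-in for an RTL expression.  */
struct toy_rtx {
    enum toy_code code;
    long value;                     /* register number or constant */
    const struct toy_rtx *op[3];    /* operands, NULL when unused */
};

/* Return 1 if SRC decrements REG by one, either directly as
   (plus reg -1) or wrapped as (if_then_else cond (plus reg -1) reg);
   return 0 otherwise.  REG is compared by identity, as in the patch.  */
int toy_is_counter_decrement(const struct toy_rtx *src,
                             const struct toy_rtx *reg)
{
    if (src == NULL || reg == NULL)
        return 0;

    /* IA-64 style: take the "then" arm of the if_then_else.  */
    if (src->code == TOY_IF_THEN_ELSE)
        src = src->op[1];

    if (src == NULL || src->code != TOY_PLUS)
        return 0;

    return src->op[0] == reg
           && src->op[1] != NULL
           && src->op[1]->code == TOY_CONST_INT
           && src->op[1]->value == -1;
}
```

Both the plain decrement and the IA-64 style if_then_else form are
accepted by this check, while other sources, such as a bare comparison,
are rejected.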



* RE: SMS in gcc4.0
  2005-03-31 17:06 Canqun Yang
  2005-03-31 18:33 ` Steven Bosscher
@ 2005-03-31 18:37 ` Steven Bosscher
  2005-03-31 18:52   ` Mostafa Hagog
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Bosscher @ 2005-03-31 18:37 UTC (permalink / raw)
  To: Canqun Yang; +Cc: MUSTAFA, mark.davis, gp, ZAKS, gcc, gcc-patches

On Mar 31, 2005 03:56 PM, Canqun Yang <canqun@nudt.edu.cn> wrote:

> This patch fixes doloop_register_get, defined in
> modulo-sched.c, and lets the PI calculation program
> be successfully modulo scheduled on IA-64. On a 1GHz
> Itanium-2, it takes just 3.128 seconds to execute when
> compiled with "-fmodulo-sched -O3", versus
> 5.454 seconds without "-fmodulo-sched".

Nice!  But makes me wonder... Mustafa, why can doloop_register_get
not just accept the same doloop patterns as the one accepted in
doloop_condition_get?

Gr.
Steven


* RE: SMS in gcc4.0
  2005-03-31 18:37 ` Steven Bosscher
@ 2005-03-31 18:52   ` Mostafa Hagog
  0 siblings, 0 replies; 15+ messages in thread
From: Mostafa Hagog @ 2005-03-31 18:52 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Ayal Zaks, Canqun Yang, gcc, gcc-patches, gp, mark.davis

Steven Bosscher <stevenb@suse.de> wrote on 31/03/2005 16:55:52:

> On Mar 31, 2005 03:56 PM, Canqun Yang <canqun@nudt.edu.cn> wrote:
>
> > This patch fixes doloop_register_get, defined in
> > modulo-sched.c, and lets the PI calculation program
> > be successfully modulo scheduled on IA-64. On a 1GHz
> > Itanium-2, it takes just 3.128 seconds to execute when
> > compiled with "-fmodulo-sched -O3", versus
> > 5.454 seconds without "-fmodulo-sched".
>
> Nice!  But makes me wonder... Mustafa, why can doloop_register_get
> not just accept the same doloop patterns as the one accepted in
> doloop_condition_get?

It should.  It seems that there was a major change to doloop_condition_get
after we implemented SMS and created doloop_register_get; the correct fix
is to combine both and make doloop_condition_get external.  I will
prepare a patch for this.  Actually, there is a more fundamental change
that we can make to SMS, which is to use doloop_valid_p, which tells us
that there is a countable loop whether or not the machine supports BCTs,
but this requires changes to the way we generate the prologue/epilogue.

Mostafa.

>
> Gr.
> Steven
>


* RE: SMS in gcc4.0
       [not found] <EFA13842B4FD55469C1A837C9C849644019D731D@hdsmsx401.amr.corp.intel.com>
@ 2005-03-31 16:13 ` Mostafa Hagog
  0 siblings, 0 replies; 15+ messages in thread
From: Mostafa Hagog @ 2005-03-31 16:13 UTC (permalink / raw)
  To: Davis, Mark; +Cc: Gerald Pfeifer, Davis, Mark, gcc, Ayal Zaks

Hi Mark,

First of all I would like this discussion to be on the GCC mailing list; so
I am CCing the GCC mailing list (I hope this is OK with all the others).

"Davis, Mark" <mark.davis@intel.com> wrote on 31/03/2005 00:23:02:
> Mostafa & Gerald,
>
...
> It was mentioned that you folks had recently
> added SMS to gcc4.0, and I found the SMS paper from last year's gcc
> summit, and the description of SMS capabilities in gcc on the 4.0
> Features web site.  So the obvious approach is to use SMS for Itanium as
> well as Power5 and ....
>
> 1) Is SMS in gcc currently turned on for anything other than Power5?  I
> built a gcc4.0 for Itanium, and tried compiling the summation example
> from your paper (and some unrolled summation examples) using
>    -O3 -fmodulo-sched

We haven't yet put effort into tuning SMS for any specific architecture
(including Power5); SMS is implemented as generally as the paper (mentioned
in http://gcc.gnu.org/news/sms.html) describes.

>
> but didn't see any difference in the .s file from not using
> -fmodulo-sched.  Are there other switches to turn on or dumps to look
> at?

I would suggest starting with the SMS dumps to see what it is doing there;
you can get them by adding the -dm flag to your compilation.  If you want,
you can send me those dumps and I will look into them.

>
>
> I'm afraid I also was the origin of some of the "not very useful"
> comments about SMS.  From my way of thinking, if SMS doesn't have alias
> information or array dependence analysis, then SMS can't pipeline loops
> storing into array elements; therefore it is not very useful as a
> pipeliner, even if the swing modulo scheduling part is excellent.
> 2) Did I miss something here?

This is true; that's why we need accurate alias info at the RTL level, and
this is one of the efforts that one should concentrate on in improving SMS.

>
> I do not know about gcc internals (which is why I'm "project-managing",
> not "implementing"), so it was interesting and disturbing to hear what
> you and Vlad had to say about the different internal representations
> relative to when the SMS phase runs:
>    a) it seems to be too early to see the machine code
>    b) it's too late to have alias info
>
> 3) Do you agree with this assessment?

It's not black or white.  We need accurate alias info at the RTL level to be
able to software pipeline (SMS) the majority of the loops in real-world
programs; currently the RTL alias info is not accurate enough for those
loops.  Having the alias info lets us eliminate memory-to-memory
dependencies and thus know that we can interleave different iterations of
the loop.  The alias info is usually based on a high-level representation
of the code; the lower you are, the more information you lose.  One of the
things we could do is maintain this information as we go down through the
trees and RTL representations, but each pass that transforms the code would
then need additional effort to maintain the alias information, which
complicates it. That's why we want SMS to be as early as possible.

The other side of the coin is the modeling of the machine resources (SMS is
trying to solve a scheduling problem).  In SMS we use the DFA for resource
modeling: we follow each instruction's resource usage and try to get the
optimal schedule by moving instructions among the different iterations,
trying to avoid resource conflicts.  The problem with doing this early is
that later passes can change the resource usage of instructions when
transforming the code (splitting instructions, for example) and thus make
the schedule non-optimal.  A good example of a way to handle this is the
disabling of the second scheduling pass for SMSed loops, to prevent it from
screwing up the schedules generated by SMS.  We can do the same for other
passes and have a cost model to decide whether it is beneficial to perform
the optimization inside the SMSed loops.

Another problem that results from doing SMS before register allocation is
increased register pressure when SMS is aggressive. IMO, this problem
should be addressed later by using register pressure estimation inside SMS.
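
A small C example of the memory-to-memory dependence problem, offered as
an illustration only (the function names are hypothetical). In the first
loop nothing says that dst and src are distinct, so the store to dst[i]
may feed a load in a later iteration and the iterations cannot safely be
overlapped. In the second loop the C99 restrict qualifier states at the
source level that they do not overlap. This shows the kind of fact the
scheduler needs; it does not show how, or whether, GCC carries that fact
down to RTL.

```c
#include <stddef.h>

/* dst and src may overlap: a store to dst[i] could be read back as
   src[i + 1], so a cross-iteration dependence must be assumed.  */
void scale_may_alias(double *dst, const double *src, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* restrict promises that dst and src do not overlap, so loads and
   stores of different iterations are independent.  */
void scale_no_alias(double *restrict dst, const double *restrict src,
                    double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

When the arrays really are disjoint, both functions compute the same
result; the difference is only in what a compiler is allowed to assume
when reordering the memory accesses.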

> 4) Do you have any suggestions about using SMS for an in-order
> microarchitecture like Itanium which is more sensitive to the exact
> schedule than OOO microarchitectures like Power5?

Actually I would say that an in-order machine would benefit more from SMS
than an OOO machine, because theoretically OOO machines do the job of SMS
in hardware in many cases. The problem is not that IA64 is in-order, but
that IA64 and other in-order machines are highly dependent on scheduling,
SMS included.  My suggestion is that we invest in lowering alias info to
RTL and feeding this information to the DDG used by SMS, which is
implemented in ddg.c.

> 5) In the Intel compiler for Itanium, we carry the alias information
> from the high-level IL down to the machine-code level IL, and pipeline
> on the machine-code IL, before register allocation.

This is where SMS is currently positioned; it means that our problem is not
where SMS is performed but that the alias information does not get there.
This is exactly what we have been thinking all along; the IC example
reinforces this thought.

>
> thanks,
> Mark Davis
> Intel Compiler Lab
> (formerly with DEC compiler team)
> Nashua, NH
>

Mostafa.
