public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* DFA scheduler producing sub-optimal code.
@ 2002-11-06  2:56 Dan Towner
  2002-11-06  9:19 ` Vladimir Makarov
  0 siblings, 1 reply; 2+ messages in thread
From: Dan Towner @ 2002-11-06  2:56 UTC (permalink / raw)
  To: gcc

Hi,

I am using the DFA scheduler for a 16-bit VLIW. The VLIW has 3 
instruction slots, and one constant value slot. On one of my test cases, 
I get the following schedule (I've added lines to show VLIW packets):

;;        4--> 10   R9=[R5+0x2]              :slot1,nothing
------------------------------------------------------------------------
;;        5--> 11   R10=[R5+0x4]             :slot1,nothing
------------------------------------------------------------------------
;;        6--> 12   R11=[R5+0x6]             :slot1,nothing
;;        6--> 22   R3=R9                    :slot0|slot1
------------------------------------------------------------------------
;;        7--> 23   R4=R10                   :slot0|slot1
------------------------------------------------------------------------
;;        8--> 24   R5=R11                   :slot0|slot1
------------------------------------------------------------------------
;;        9--> 70   R0=FP+0x4                :(slot0+slot1+slot2+slotC)

Notice that cycles 7 and 8 could be scheduled into either slot0 or 
slot1. The instructions have been `split' from an SI 
register-to-register move, which has been decomposed into its 
constituent sub-register moves. However, the two instructions in their 
new form use different registers.

Why doesn't the scheduler combine these two instructions into the same 
cycle? I've tried using the sched-verbose option (see output below), but 
I can't see any reason why it has chosen a new cycle for insn 24.

Thanks,

Dan.

----------------------

;;      Clock 4
;;      Ready list (t =  4):    70  12  11  10
;;        4--> 10   R9=[R5+0x2]                        :slot1,nothing
;;              dependences resolved: insn 22 into queue with cost=2
;;              Ready-->Q: insn 22: queued for 2 cycles.
;;      Ready list (t =  4):    70  12  11
;;              Ready-->Q: insn 11: queued for 1 cycles.
;;      Ready list (t =  4):    70  12
;;              Ready-->Q: insn 12: queued for 1 cycles.
;;      Ready list (t =  4):    70
;;              Ready-->Q: insn 70: queued for 1 cycles.
;;      Ready list (t =  4):
;;              Q-->Ready: insn 70: moving to ready without stalls
;;              Q-->Ready: insn 12: moving to ready without stalls
;;              Q-->Ready: insn 11: moving to ready without stalls
;;              Ready list after queue_to_ready:    11  12  70
;;      Clock 5
;;      Ready list (t =  5):    70  12  11
;;        5--> 11   R10=[R5+0x4]                       :slot1,nothing
;;              dependences resolved: insn 23 into queue with cost=2
;;              Ready-->Q: insn 23: queued for 2 cycles.
;;      Ready list (t =  5):    70  12
;;              Ready-->Q: insn 12: queued for 1 cycles.
;;      Ready list (t =  5):    70
;;              Ready-->Q: insn 70: queued for 1 cycles.
;;      Ready list (t =  5):
;;              Q-->Ready: insn 70: moving to ready without stalls
;;              Q-->Ready: insn 12: moving to ready without stalls
;;              Q-->Ready: insn 22: moving to ready without stalls
;;              Ready list after queue_to_ready:    22  12  70
;;      Clock 6
;;      Ready list (t =  6):    70  22  12
;;        6--> 12   R11=[R5+0x6]                       :slot1,nothing
;;              dependences resolved: insn 24 into queue with cost=2
;;              Ready-->Q: insn 24: queued for 2 cycles.
;;      Ready list (t =  6):    70  22
;;        6--> 22   R3=R9                              :slot0|slot1
;;      Ready list (t =  6):    70
;;              Ready-->Q: insn 70: queued for 1 cycles.
;;      Ready list (t =  6):
;;              Q-->Ready: insn 70: moving to ready without stalls
;;              Q-->Ready: insn 23: moving to ready without stalls
;;              Ready list after queue_to_ready:    23  70
;;      Clock 7
;;      Ready list (t =  7):    70  23
;;        7--> 23   R4=R10                             :slot0|slot1
;;      Ready list (t =  7):    70
;;              Ready-->Q: insn 70: queued for 1 cycles.
;;      Ready list (t =  7):
;;              Q-->Ready: insn 70: moving to ready without stalls
;;              Q-->Ready: insn 24: moving to ready without stalls
;;              Ready list after queue_to_ready:    24  70
;;      Clock 8
;;      Ready list (t =  8):    70  24
;;        8--> 24   R5=R11                             :slot0|slot1
;;      Ready list (t =  8):    70
;;              Ready-->Q: insn 70: queued for 1 cycles.
;;      Ready list (t =  8):
;;              Q-->Ready: insn 70: moving to ready without stalls
;;              Ready list after queue_to_ready:    70
;;      Clock 9
;;      Ready list (t =  9):    70
;;        9--> 70   R0=FP+0x4 
:(slot0+slot1+slot2+slotC)
;;              dependences resolved: insn 25 into queue with cost=1
;;              Ready-->Q: insn 25: queued for 1 cycles.
;;      Ready list (t =  9):
;;              Q-->Ready: insn 25: moving to ready without stalls
;;              Ready list after queue_to_ready:    25

=============================================================================
Daniel Towner
picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH, 
BA1 5BG
dant@picochip.com
07786 702589


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: DFA scheduler producing sub-optimal code.
  2002-11-06  2:56 DFA scheduler producing sub-optimal code Dan Towner
@ 2002-11-06  9:19 ` Vladimir Makarov
  0 siblings, 0 replies; 2+ messages in thread
From: Vladimir Makarov @ 2002-11-06  9:19 UTC (permalink / raw)
  To: Dan Towner; +Cc: gcc

Dan Towner wrote:
> 
> Hi,
> 
> I am using the DFA scheduler for a 16-bit VLIW. The VLIW has 3
> instruction slots, and one constant value slot. On one of my test cases,
> I get the following schedule (I've added lines to show VLIW packets):
> 
> ;;        4--> 10   R9=[R5+0x2]              :slot1,nothing
> ------------------------------------------------------------------------
> ;;        5--> 11   R10=[R5+0x4]             :slot1,nothing
> ------------------------------------------------------------------------
> ;;        6--> 12   R11=[R5+0x6]             :slot1,nothing
> ;;        6--> 22   R3=R9                    :slot0|slot1
> ------------------------------------------------------------------------
> ;;        7--> 23   R4=R10                   :slot0|slot1
> ------------------------------------------------------------------------
> ;;        8--> 24   R5=R11                   :slot0|slot1
> ------------------------------------------------------------------------
> ;;        9--> 70   R0=FP+0x4                :(slot0+slot1+slot2+slotC)
> 
> Notice that cycles 7 and 8 could be scheduled into either slot0 or
> slot1. The instructions have been `split' from an SI
> register-to-register move, which has been decomposed into its
> constituent sub-register moves. However, the two instructions in their
> new form use different registers.
> 
> Why doesn't the scheduler combine these two instructions into the same
> cycle? I've tried using the sched-verbose option (see output below), but
> I can't see any reason why it has chosen a new cycle for insn 24.
>

The most probably, it is because the latency time of insn #12 is 2
cycles.  So insn #24 can not be issued until the data is ready.
 
> Thanks,
> 
> Dan.
> 
> ----------------------
> 
> ;;      Clock 4
> ;;      Ready list (t =  4):    70  12  11  10
> ;;        4--> 10   R9=[R5+0x2]                        :slot1,nothing
> ;;              dependences resolved: insn 22 into queue with cost=2
> ;;              Ready-->Q: insn 22: queued for 2 cycles.
> ;;      Ready list (t =  4):    70  12  11
> ;;              Ready-->Q: insn 11: queued for 1 cycles.
> ;;      Ready list (t =  4):    70  12
> ;;              Ready-->Q: insn 12: queued for 1 cycles.
> ;;      Ready list (t =  4):    70
> ;;              Ready-->Q: insn 70: queued for 1 cycles.
> ;;      Ready list (t =  4):
> ;;              Q-->Ready: insn 70: moving to ready without stalls
> ;;              Q-->Ready: insn 12: moving to ready without stalls
> ;;              Q-->Ready: insn 11: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    11  12  70
> ;;      Clock 5
> ;;      Ready list (t =  5):    70  12  11
> ;;        5--> 11   R10=[R5+0x4]                       :slot1,nothing
> ;;              dependences resolved: insn 23 into queue with cost=2
> ;;              Ready-->Q: insn 23: queued for 2 cycles.
> ;;      Ready list (t =  5):    70  12
> ;;              Ready-->Q: insn 12: queued for 1 cycles.
> ;;      Ready list (t =  5):    70
> ;;              Ready-->Q: insn 70: queued for 1 cycles.
> ;;      Ready list (t =  5):
> ;;              Q-->Ready: insn 70: moving to ready without stalls
> ;;              Q-->Ready: insn 12: moving to ready without stalls
> ;;              Q-->Ready: insn 22: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    22  12  70
> ;;      Clock 6
> ;;      Ready list (t =  6):    70  22  12
> ;;        6--> 12   R11=[R5+0x6]                       :slot1,nothing
> ;;              dependences resolved: insn 24 into queue with cost=2
> ;;              Ready-->Q: insn 24: queued for 2 cycles.
> ;;      Ready list (t =  6):    70  22
> ;;        6--> 22   R3=R9                              :slot0|slot1
> ;;      Ready list (t =  6):    70
> ;;              Ready-->Q: insn 70: queued for 1 cycles.
> ;;      Ready list (t =  6):
> ;;              Q-->Ready: insn 70: moving to ready without stalls
> ;;              Q-->Ready: insn 23: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    23  70
> ;;      Clock 7
> ;;      Ready list (t =  7):    70  23
> ;;        7--> 23   R4=R10                             :slot0|slot1
> ;;      Ready list (t =  7):    70
> ;;              Ready-->Q: insn 70: queued for 1 cycles.
> ;;      Ready list (t =  7):
> ;;              Q-->Ready: insn 70: moving to ready without stalls
> ;;              Q-->Ready: insn 24: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    24  70
> ;;      Clock 8
> ;;      Ready list (t =  8):    70  24
> ;;        8--> 24   R5=R11                             :slot0|slot1
> ;;      Ready list (t =  8):    70
> ;;              Ready-->Q: insn 70: queued for 1 cycles.
> ;;      Ready list (t =  8):
> ;;              Q-->Ready: insn 70: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    70
> ;;      Clock 9
> ;;      Ready list (t =  9):    70
> ;;        9--> 70   R0=FP+0x4
> :(slot0+slot1+slot2+slotC)
> ;;              dependences resolved: insn 25 into queue with cost=1
> ;;              Ready-->Q: insn 25: queued for 1 cycles.
> ;;      Ready list (t =  9):
> ;;              Q-->Ready: insn 25: moving to ready without stalls
> ;;              Ready list after queue_to_ready:    25
> 
> =============================================================================
> Daniel Towner
> picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH,
> BA1 5BG
> dant@picochip.com
> 07786 702589

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2002-11-06 16:50 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-06  2:56 DFA scheduler producing sub-optimal code Dan Towner
2002-11-06  9:19 ` Vladimir Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).