* DFA scheduler producing sub-optimal code.
@ 2002-11-06 2:56 Dan Towner
2002-11-06 9:19 ` Vladimir Makarov
0 siblings, 1 reply; 2+ messages in thread
From: Dan Towner @ 2002-11-06 2:56 UTC (permalink / raw)
To: gcc
Hi,
I am using the DFA scheduler for a 16-bit VLIW. The VLIW has 3
instruction slots, and one constant value slot. On one of my test cases,
I get the following schedule (I've added lines to show VLIW packets):
;; 4--> 10 R9=[R5+0x2] :slot1,nothing
------------------------------------------------------------------------
;; 5--> 11 R10=[R5+0x4] :slot1,nothing
------------------------------------------------------------------------
;; 6--> 12 R11=[R5+0x6] :slot1,nothing
;; 6--> 22 R3=R9 :slot0|slot1
------------------------------------------------------------------------
;; 7--> 23 R4=R10 :slot0|slot1
------------------------------------------------------------------------
;; 8--> 24 R5=R11 :slot0|slot1
------------------------------------------------------------------------
;; 9--> 70 R0=FP+0x4 :(slot0+slot1+slot2+slotC)
Notice that cycles 7 and 8 could be scheduled into either slot0 or
slot1. The instructions have been `split' from an SI
register-to-register move, which has been decomposed into its
constituent sub-register moves. However, the two instructions in their
new form use different registers.
Why doesn't the scheduler combine these two instructions into the same
cycle? I've tried using the sched-verbose option (see output below), but
I can't see any reason why it has chosen a new cycle for insn 24.
Thanks,
Dan.
----------------------
;; Clock 4
;; Ready list (t = 4): 70 12 11 10
;; 4--> 10 R9=[R5+0x2] :slot1,nothing
;; dependences resolved: insn 22 into queue with cost=2
;; Ready-->Q: insn 22: queued for 2 cycles.
;; Ready list (t = 4): 70 12 11
;; Ready-->Q: insn 11: queued for 1 cycles.
;; Ready list (t = 4): 70 12
;; Ready-->Q: insn 12: queued for 1 cycles.
;; Ready list (t = 4): 70
;; Ready-->Q: insn 70: queued for 1 cycles.
;; Ready list (t = 4):
;; Q-->Ready: insn 70: moving to ready without stalls
;; Q-->Ready: insn 12: moving to ready without stalls
;; Q-->Ready: insn 11: moving to ready without stalls
;; Ready list after queue_to_ready: 11 12 70
;; Clock 5
;; Ready list (t = 5): 70 12 11
;; 5--> 11 R10=[R5+0x4] :slot1,nothing
;; dependences resolved: insn 23 into queue with cost=2
;; Ready-->Q: insn 23: queued for 2 cycles.
;; Ready list (t = 5): 70 12
;; Ready-->Q: insn 12: queued for 1 cycles.
;; Ready list (t = 5): 70
;; Ready-->Q: insn 70: queued for 1 cycles.
;; Ready list (t = 5):
;; Q-->Ready: insn 70: moving to ready without stalls
;; Q-->Ready: insn 12: moving to ready without stalls
;; Q-->Ready: insn 22: moving to ready without stalls
;; Ready list after queue_to_ready: 22 12 70
;; Clock 6
;; Ready list (t = 6): 70 22 12
;; 6--> 12 R11=[R5+0x6] :slot1,nothing
;; dependences resolved: insn 24 into queue with cost=2
;; Ready-->Q: insn 24: queued for 2 cycles.
;; Ready list (t = 6): 70 22
;; 6--> 22 R3=R9 :slot0|slot1
;; Ready list (t = 6): 70
;; Ready-->Q: insn 70: queued for 1 cycles.
;; Ready list (t = 6):
;; Q-->Ready: insn 70: moving to ready without stalls
;; Q-->Ready: insn 23: moving to ready without stalls
;; Ready list after queue_to_ready: 23 70
;; Clock 7
;; Ready list (t = 7): 70 23
;; 7--> 23 R4=R10 :slot0|slot1
;; Ready list (t = 7): 70
;; Ready-->Q: insn 70: queued for 1 cycles.
;; Ready list (t = 7):
;; Q-->Ready: insn 70: moving to ready without stalls
;; Q-->Ready: insn 24: moving to ready without stalls
;; Ready list after queue_to_ready: 24 70
;; Clock 8
;; Ready list (t = 8): 70 24
;; 8--> 24 R5=R11 :slot0|slot1
;; Ready list (t = 8): 70
;; Ready-->Q: insn 70: queued for 1 cycles.
;; Ready list (t = 8):
;; Q-->Ready: insn 70: moving to ready without stalls
;; Ready list after queue_to_ready: 70
;; Clock 9
;; Ready list (t = 9): 70
;; 9--> 70 R0=FP+0x4
:(slot0+slot1+slot2+slotC)
;; dependences resolved: insn 25 into queue with cost=1
;; Ready-->Q: insn 25: queued for 1 cycles.
;; Ready list (t = 9):
;; Q-->Ready: insn 25: moving to ready without stalls
;; Ready list after queue_to_ready: 25
=============================================================================
Daniel Towner
picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH,
BA1 5BG
dant@picochip.com
07786 702589
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: DFA scheduler producing sub-optimal code.
2002-11-06 2:56 DFA scheduler producing sub-optimal code Dan Towner
@ 2002-11-06 9:19 ` Vladimir Makarov
0 siblings, 0 replies; 2+ messages in thread
From: Vladimir Makarov @ 2002-11-06 9:19 UTC (permalink / raw)
To: Dan Towner; +Cc: gcc
Dan Towner wrote:
>
> Hi,
>
> I am using the DFA scheduler for a 16-bit VLIW. The VLIW has 3
> instruction slots, and one constant value slot. On one of my test cases,
> I get the following schedule (I've added lines to show VLIW packets):
>
> ;; 4--> 10 R9=[R5+0x2] :slot1,nothing
> ------------------------------------------------------------------------
> ;; 5--> 11 R10=[R5+0x4] :slot1,nothing
> ------------------------------------------------------------------------
> ;; 6--> 12 R11=[R5+0x6] :slot1,nothing
> ;; 6--> 22 R3=R9 :slot0|slot1
> ------------------------------------------------------------------------
> ;; 7--> 23 R4=R10 :slot0|slot1
> ------------------------------------------------------------------------
> ;; 8--> 24 R5=R11 :slot0|slot1
> ------------------------------------------------------------------------
> ;; 9--> 70 R0=FP+0x4 :(slot0+slot1+slot2+slotC)
>
> Notice that cycles 7 and 8 could be scheduled into either slot0 or
> slot1. The instructions have been `split' from an SI
> register-to-register move, which has been decomposed into its
> constituent sub-register moves. However, the two instructions in their
> new form use different registers.
>
> Why doesn't the scheduler combine these two instructions into the same
> cycle? I've tried using the sched-verbose option (see output below), but
> I can't see any reason why it has chosen a new cycle for insn 24.
>
The most probably, it is because the latency time of insn #12 is 2
cycles. So insn #24 can not be issued until the data is ready.
> Thanks,
>
> Dan.
>
> ----------------------
>
> ;; Clock 4
> ;; Ready list (t = 4): 70 12 11 10
> ;; 4--> 10 R9=[R5+0x2] :slot1,nothing
> ;; dependences resolved: insn 22 into queue with cost=2
> ;; Ready-->Q: insn 22: queued for 2 cycles.
> ;; Ready list (t = 4): 70 12 11
> ;; Ready-->Q: insn 11: queued for 1 cycles.
> ;; Ready list (t = 4): 70 12
> ;; Ready-->Q: insn 12: queued for 1 cycles.
> ;; Ready list (t = 4): 70
> ;; Ready-->Q: insn 70: queued for 1 cycles.
> ;; Ready list (t = 4):
> ;; Q-->Ready: insn 70: moving to ready without stalls
> ;; Q-->Ready: insn 12: moving to ready without stalls
> ;; Q-->Ready: insn 11: moving to ready without stalls
> ;; Ready list after queue_to_ready: 11 12 70
> ;; Clock 5
> ;; Ready list (t = 5): 70 12 11
> ;; 5--> 11 R10=[R5+0x4] :slot1,nothing
> ;; dependences resolved: insn 23 into queue with cost=2
> ;; Ready-->Q: insn 23: queued for 2 cycles.
> ;; Ready list (t = 5): 70 12
> ;; Ready-->Q: insn 12: queued for 1 cycles.
> ;; Ready list (t = 5): 70
> ;; Ready-->Q: insn 70: queued for 1 cycles.
> ;; Ready list (t = 5):
> ;; Q-->Ready: insn 70: moving to ready without stalls
> ;; Q-->Ready: insn 12: moving to ready without stalls
> ;; Q-->Ready: insn 22: moving to ready without stalls
> ;; Ready list after queue_to_ready: 22 12 70
> ;; Clock 6
> ;; Ready list (t = 6): 70 22 12
> ;; 6--> 12 R11=[R5+0x6] :slot1,nothing
> ;; dependences resolved: insn 24 into queue with cost=2
> ;; Ready-->Q: insn 24: queued for 2 cycles.
> ;; Ready list (t = 6): 70 22
> ;; 6--> 22 R3=R9 :slot0|slot1
> ;; Ready list (t = 6): 70
> ;; Ready-->Q: insn 70: queued for 1 cycles.
> ;; Ready list (t = 6):
> ;; Q-->Ready: insn 70: moving to ready without stalls
> ;; Q-->Ready: insn 23: moving to ready without stalls
> ;; Ready list after queue_to_ready: 23 70
> ;; Clock 7
> ;; Ready list (t = 7): 70 23
> ;; 7--> 23 R4=R10 :slot0|slot1
> ;; Ready list (t = 7): 70
> ;; Ready-->Q: insn 70: queued for 1 cycles.
> ;; Ready list (t = 7):
> ;; Q-->Ready: insn 70: moving to ready without stalls
> ;; Q-->Ready: insn 24: moving to ready without stalls
> ;; Ready list after queue_to_ready: 24 70
> ;; Clock 8
> ;; Ready list (t = 8): 70 24
> ;; 8--> 24 R5=R11 :slot0|slot1
> ;; Ready list (t = 8): 70
> ;; Ready-->Q: insn 70: queued for 1 cycles.
> ;; Ready list (t = 8):
> ;; Q-->Ready: insn 70: moving to ready without stalls
> ;; Ready list after queue_to_ready: 70
> ;; Clock 9
> ;; Ready list (t = 9): 70
> ;; 9--> 70 R0=FP+0x4
> :(slot0+slot1+slot2+slotC)
> ;; dependences resolved: insn 25 into queue with cost=1
> ;; Ready-->Q: insn 25: queued for 1 cycles.
> ;; Ready list (t = 9):
> ;; Q-->Ready: insn 25: moving to ready without stalls
> ;; Ready list after queue_to_ready: 25
>
> =============================================================================
> Daniel Towner
> picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH,
> BA1 5BG
> dant@picochip.com
> 07786 702589
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2002-11-06 16:50 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-06 2:56 DFA scheduler producing sub-optimal code Dan Towner
2002-11-06 9:19 ` Vladimir Makarov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).