Instruction scheduler question

public inbox for gcc@gcc.gnu.org

* Instruction scheduler question
@ 2011-10-07 13:38 BELBACHIR Selim
  2011-10-07 13:57 ` Bernd Schmidt
  0 siblings, 1 reply; 2+ messages in thread
From: BELBACHIR Selim @ 2011-10-07 13:38 UTC
  To: gcc

Hello,

I'm trying to express the instruction latency constraints of a private processor.


* Overview :

Two cycles are necessary between a comparison instruction and a conditional jump instruction (GSR is updated 2 cycles after the comparison).

If nothing better than 'nop' can be placed between the compare and the jump, the asm must be:

     cmp $A $B
     nop
     nop
     jmpifeq $C

I copied the MIPS method of inserting 'nop' instructions (using the TARGET_MACHINE_DEPENDENT_REORG macro).
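
The padding rule described above can be sketched outside the compiler as a small Python model. This is not the GCC reorg code: it works on plain assembly strings, and the function name pad_compare_jump, the "jmpif" prefix test, and the gap parameter are assumptions made only for illustration.

```python
def pad_compare_jump(insns, gap=2):
    """Insert 'nop' so that at least `gap` instructions separate a
    compare from the next conditional jump (hypothetical model)."""
    out = []
    since_cmp = None  # instructions emitted since the last compare, or None
    for insn in insns:
        opcode = insn.split()[0]  # assumes every entry is a non-empty line
        if opcode.startswith("jmpif") and since_cmp is not None:
            # Fill whatever part of the gap is not already covered.
            out.extend(["nop"] * max(0, gap - since_cmp))
            since_cmp = None
        out.append(insn)
        if opcode == "cmp":
            since_cmp = 0
        elif since_cmp is not None:
            since_cmp += 1
    return out


for line in pad_compare_jump(["cmp $A $B", "jmpifeq $C"]):
    print(line)
```

When a useful instruction already sits between the compare and the jump, the model adds only one nop, so the pass itself does not prevent the scheduler from filling the gap.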


* Automaton definition :

(define_cpu_unit "ctrl")
(define_cpu_unit "readmem")
(define_cpu_unit "gsr")

;; To express that the compare result (gsr) will only be available in 3 cycles
(define_insn_reservation "COMPARE" 3 
  (eq_attr "type" "compare")
  "gsr+ctrl,gsr*2")

;; To express that jump uses the gsr result
(define_insn_reservation "JUMP" 1 
  (eq_attr "type" "jump")
  "ctrl+gsr")

My compare insn has the attribute type = 'compare' and my jump insn has the attribute type = 'jump'.


* Problem :

I never see any instruction other than 'nop' between the compare and jump instructions.
For example, I see:

(asm result)

load d($C2),$R1     <-- 1st operand for comparison    ctrl,readmem,nothing
loadi 0,$C4         <-- 2nd operand for comparison    ctrl,nothing
load d($C2+4),$R2   <-- no data dependencies          ctrl,readmem,nothing
cmp $C4,$R1                                           (gsr+ctrl),gsr*2
nop
nop
jmpifeq .L5                                           (ctrl+gsr)            

(.sched2)

;;   --- Region Dependences --- b 2 bb 0
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       18     0     2    14     7     3   ctrl,readmem,nothing  : 23 22 19
;;       19     0     2    15     3     3   ctrl,readmem,nothing  : 23
;;       81     0     2     1     6     2   ctrl,nothing          : 23 22
;;       22    10     2     3     4     3   (gsr+ctrl),gsr*2      : 23
;;       23     9     2    20     1     1   (ctrl+gsr)            :


;;              dependencies resolved: insn 81
;;              Ready-->Q: insn 81: queued for 1 cycles.
;;              tick updated: insn 81 into queue with cost=1
;;              dependencies resolved: insn 18
;;              tick updated: insn 18 into ready
;;      Ready list (t =  18):    18:17
;;              Q-->Ready: insn 81: moving to ready without stalls
;;              Ready list after queue_to_ready:    81:19  18:17
;;              Ready list after ready_sort:    81:19  18:17
;;      Ready list (t =  19):    81:19  18:17
;;       19-->    18 $R1=[$C2]                         :ctrl,datar,nothing
;;              dependencies resolved: insn 19
;;              tick updated: insn 19 into ready
;;      Ready list (t =  19):    19:18  81:19
;;              Ready list after queue_to_ready:    19:18  81:19
;;              Ready list after ready_sort:    19:18  81:19
;;      Ready list (t =  20):    19:18  81:19
;;       20-->    81 $C4=0x0                           :ctrl,nothing
;;              dependencies resolved: insn 22
;;              Ready-->Q: insn 22: queued for 2 cycles.
;;              tick updated: insn 22 into queue with cost=2
;;      Ready list (t =  20):    19:18
;;              Ready list after queue_to_ready:    19:18
;;              Ready list after ready_sort:    19:18
;;      Ready list (t =  21):    19:18
;;       21-->    19 $R2=[$C2+0x4]                     :ctrl,datar,nothing
;;      Ready list (t =  21):
;;              Q-->Ready: insn 22: moving to ready without stalls
;;              Ready list after queue_to_ready:    22:20
;;              Ready list after ready_sort:    22:20
;;      Ready list (t =  22):    22:20
;;       22-->    22 {$GSR=cmp($R1,$C4);clobber $R3;}  :(gsr+ctrl),gsr*2
;;              dependencies resolved: insn 23
;;              Ready-->Q: insn 23: queued for 3 cycles.
;;              tick updated: insn 23 into queue with cost=3
;;      Ready list (t =  22):
;;              Q-->Ready: insn 23: moving to ready with 2 stalls
;;              Ready list after queue_to_ready:    23:21
;;              Ready list after ready_sort:    23:21
;;      Ready list (t =  25):    23:21
;;       25-->    23 pc={($GSR==0x0)?L110:pc}          :(ctrl+gsr)
;;      Ready list (t =  25):
;;      Ready list (final):


The 'load d($C2+4),$R2' instruction seems a good candidate to be moved between the compare and jump instructions, because it has no data dependencies on either of them and no reservation conflict.

Can someone explain how to obtain the following assembly?

load d($C2),$R1     
loadi 0,$C4         
cmp $C4,$R1
load d($C2+4),$R2
nop
jmpifeq .L5                                           

     Regards,

          Selim Belbachir


* Re: Instruction scheduler question
  2011-10-07 13:38 Instruction scheduler question BELBACHIR Selim
@ 2011-10-07 13:57 ` Bernd Schmidt
  0 siblings, 0 replies; 2+ messages in thread
From: Bernd Schmidt @ 2011-10-07 13:57 UTC
  To: BELBACHIR Selim; +Cc: gcc

On 10/07/11 09:50, BELBACHIR Selim wrote:
> (asm result)
> 
> load d($C2),$R1     <-- 1st operand for comparison    ctrl,readmem,nothing
> loadi 0,$C4         <-- 2nd operand for comparison    ctrl,nothing
> load d($C2+4),$R2   <-- no data dependencies          ctrl,readmem,nothing
> cmp $C4,$R1                                           (gsr+ctrl),gsr*2
> nop
> nop
> jmpifeq .L5                                           (ctrl+gsr)            
> 
> (.sched2)
> ;;       20-->    81 $C4=0x0                           :ctrl,nothing
> ;;              dependencies resolved: insn 22
> ;;              Ready-->Q: insn 22: queued for 2 cycles.
> ;;              tick updated: insn 22 into queue with cost=2

Insn 22 is the compare, and the compiler thinks that its operand $C4
becomes available only after two cycles. Presumably the first load to
$R1 has a delay of 3 cycles. So the scheduler places the compare at the
point where its operands are ready. The load to $R2 occurs before it
because it can execute earlier.

If the delays are modeled correctly, I think there's no way to get the
code to run faster.


Bernd
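
The timing argument in this reply can be checked with a few lines of Python. This is only a sketch: the latencies are the "cost" column of the .sched2 dump, the issue cycles are the numbers on its "-->" lines, and the helper name earliest_issue is hypothetical. It also assumes that each insn's cost applies unchanged to every consumer.

```python
def earliest_issue(producers, issue_cycle, latency):
    """Earliest cycle at which a consumer can issue, given its producers."""
    return max(issue_cycle[p] + latency[p] for p in producers)


# Values read from the .sched2 dump in the original message.
latency = {18: 3, 81: 2, 22: 3}   # "cost" column
issue_cycle = {18: 19, 81: 20}    # cycle shown on each "-->" line

# Insn 22 (compare) reads $R1 from insn 18 and $C4 from insn 81.
issue_cycle[22] = earliest_issue([18, 81], issue_cycle, latency)
# Insn 23 (jump) reads $GSR from insn 22.
issue_cycle[23] = earliest_issue([22], issue_cycle, latency)

print(issue_cycle[22], issue_cycle[23])  # 22 25
```

Under this model the compare cannot issue before cycle 22, and cycle 21 is the only free slot ahead of it, which the scheduler fills with insn 19. Placing that load after the compare would leave cycle 21 empty while the jump would still issue at cycle 25.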
