public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Lousy scheduling decisions owing to bad interaction between sched and greg :(
@ 2004-09-03 12:52 Dave Korn
  2004-09-03 13:26 ` Joern Rennecke
  2004-09-03 16:35 ` Jeffrey A Law
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Korn @ 2004-09-03 12:52 UTC (permalink / raw)
  To: gcc



    Morning all!


[  This is a rather long post, so I'll summarize very quickly at the top:
1) I've got bad scheduling decisions being made, 2) I think I know why, 3)
Would anyone be so kind as to cast an eye over my define_function_unit
statement to sanity-check it for me, 4) Does an expander sound like the
right solution to the problem?  ]


===================================================
 1)  I've got bad scheduling decisions being made.
===================================================

  Well, I've run up against something confusing here.  I'm using an
old-style scheduler description, and it's failing very badly to avoid memory
access stalls.  I think I've understood _why_ it's going wrong, but I can't
quite see how it could ever hope to get it right in the situation.  Let me
explain.....

========================
 2) I think I know why.
========================

  The RTL in question looks like this:

------------------------<source.i.22.sched>------------------------
(insn 25 21 29 0 0x101f6b40 (set (mem/s:SI (const:SI (plus:SI
(symbol_ref/v:SI ("TxHeaderArray"))
                    (const_int 4 [0x4]))) [7 TxHeaderArray+4 S4 A32])
        (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (insn_list 21 (insn_list:REG_DEP_ANTI 16 (nil))))
    (nil))

(insn 29 25 36 0 0x101f6b40 (set (mem/s:SI (symbol_ref/v:SI
("TxHeaderArray")) [7 TxHeaderArray+0 S4 A32])
        (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (insn_list 21 (insn_list:REG_DEP_ANTI 16
(insn_list:REG_DEP_ANTI 25 (nil)))))
    (nil))
------------------------<source.i.22.sched>------------------------

  Now, I have reg->mem and mem->reg patterns in movsi, but there's no
mem->mem, so it's going to need a register.  I'm using a CPU that can issue
a load every cycle, but the result isn't available until the next cycle + 1;
if you attempt to use the result of a load the very next cycle you stall.
So I want it to generate code that looks like...


   load reg_1 from mem at TxHeaderArray+4
   load reg_2 from mem at TxHeaderArray
   store reg_1 into mem at 0xfa24
   store reg_2 into mem at 0xfa20

but what I'm generally getting is in fact...

   load reg_1 from mem at TxHeaderArray+4
   store reg_1 into mem at 0xfa24
   load reg_1 from mem at TxHeaderArray
   store reg_1 into mem at 0xfa20

and here's the chain of events that I think is causing it:

============================================================================
=============================
 3) Would anyone be so kind as to look over my define_function_unit
statement to sanity-check it for me?
============================================================================
=============================

--------------------------<from .md file>--------------------------
; Memory unit: results not available on cycle after command
; executes but 1 cycle after that.
(define_function_unit "memory" 1 0 (eq_attr "type" "load") 2 1)
--------------------------<from .md file>--------------------------

  IIUIC, this function unit definition accurately reflects the cpu's
behaviour:  one memory load unit, simultaneity of zero meaning you can issue
as many loads as you want without waiting for earlier ones to complete,
ready delay of 2 meaning that if the load is issued on cycle n the data can
be used on cycle n+2, and an issue-delay of 1 meaning a zero-cycle delay
before the next instruction can be issued.  All reasonably sane, I hope.

================================
 2) I think I know why, part 2.
================================

  Now, the sequence of events is I think like this:

------------------------<source.i.22.sched>------------------------
;;   ==================== scheduling visualization  
;;   clock     memory                             settest
no-unit 
;;   =====     ==============================
==============================     ======= 
;;   0         ------------------------------
------------------------------     10      
;;   1         ------------------------------
------------------------------     13      
;;   2         16   r70=[0xfa18]
------------------------------   
;;       .
;;   4         ------------------------------
------------------------------     18      
;;   5         ------------------------------
------------------------------     21      
;;   6         ------------------------------
------------------------------     25      
;;   7         ------------------------------
------------------------------     29      
------------------------<source.i.22.sched>------------------------

  - 1st scheduling pass doesn't recognize these insns as mem loads at all,
because, being m->m sets, they don't match any of the m->r alternatives in
movsi.  It does correctly identify insn 16 which _does_ match the m->r
alternative:

------------------------<source.i.22.sched>------------------------
(insn 16 13 18 0 0x101f6b40 (set (reg/v:SI 70)
        (mem/v:SI (const_int 64024 [0xfa18]) [7 S4 A32])) 22 {movsi}
(insn_list 13 (nil))
    (nil))
------------------------<source.i.22.sched>------------------------

  - Greg comes along, sees that there's no m->m pattern for movsi, and
decides a reload is needed.  Because there's no overlap in the life of the
operands and no aliasing between them, it decides to allocate the same hard
reg for both, then goes ahead and breaks the m->m moves into m->r and r->m
reloaded insns:

-------------------------<source.i.24.greg>------------------------
Reloads for insn # 25
Reload 0: reload_out (SI) = (mem/s:SI (const:SI (plus:SI (symbol_ref/v:SI
("TxHeaderArray"))
                                                            (const_int 4
[0x4]))) [7 TxHeaderArray+4 S4 A32])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/s:SI (const:SI (plus:SI (symbol_ref/v:SI
("TxHeaderArray"))
                                                            (const_int 4
[0x4]))) [7 TxHeaderArray+4 S4 A32])
Reload 1: reload_in (SI) = (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
	reload_in_reg: (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])
	reload_reg_rtx: (reg:SI 4 r4)

Reloads for insn # 29
Reload 0: reload_out (SI) = (mem/s:SI (symbol_ref/v:SI ("TxHeaderArray")) [7
TxHeaderArray+0 S4 A32])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/s:SI (symbol_ref/v:SI ("TxHeaderArray")) [7
TxHeaderArray+0 S4 A32])
Reload 1: reload_in (SI) = (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
	reload_in_reg: (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])
	reload_reg_rtx: (reg:SI 4 r4)
;; Register dispositions:
69 in 4  70 in 3  

;; Hard regs used:  3 4

......

(insn 38 21 25 0 0x0 (set (reg:SI 4 r4)
        (mem/v:SI (const_int 64036 [0xfa24]) [7 S4 A32])) 22 {movsi} (nil)
    (nil))

(insn 25 38 39 0 0x101f6b40 (set (mem/s:SI (const:SI (plus:SI
(symbol_ref/v:SI ("TxHeaderArray"))
                    (const_int 4 [0x4]))) [7 TxHeaderArray+4 S4 A32])
        (reg:SI 4 r4)) 22 {movsi} (insn_list 13 (insn_list 21
(insn_list:REG_DEP_ANTI 16 (nil))))
    (nil))

(insn 39 25 29 0 0x0 (set (reg:SI 4 r4)
        (mem/v:SI (const_int 64032 [0xfa20]) [7 S4 A32])) 22 {movsi} (nil)
    (nil))

(insn 29 39 36 0 0x101f6b40 (set (mem/s:SI (symbol_ref/v:SI
("TxHeaderArray")) [7 TxHeaderArray+0 S4 A32])
        (reg:SI 4 r4)) 22 {movsi} (insn_list 13 (insn_list 21
(insn_list:REG_DEP_ANTI 16 (insn_list:REG_DEP_ANTI 25 (nil)))))
    (nil))
-------------------------<source.i.24.greg>------------------------

  - 2nd scheduling pass this time *does* recognize these insns as mem loads,
but there's nothing it can do to schedule them around each other because
they're now sharing the same hard reg and it would get trashed (as reflected
by the REG_DEP_ANTI 25 note on insn 29, I take it).

------------------------<source.i.30.sched2>-----------------------
;;   ==================== scheduling visualization  
;;   clock     memory                             settest
no-unit 
;;   =====     ==============================
==============================     ======= 
;;   0         ------------------------------
------------------------------     10      
;;   1         ------------------------------
------------------------------     13      
;;   2         16   r3=[0xfa18]
------------------------------   
;;       .
;;   4         ------------------------------
------------------------------     18      
;;   5         ------------------------------
------------------------------     21      
;;   6         38   r4=[0xfa24]
------------------------------   
;;       .
;;   8         ------------------------------
------------------------------     25      
;;   9         39   r4=[0xfa20]
------------------------------   
;;       .
;;   11        ------------------------------
------------------------------     29      
------------------------<source.i.30.sched2>-----------------------

  Well, I can't quite see how to straighten this mess out.  Sched1 can't
deduce that there are going to be memory loads here because reload hasn't
happened yet and the patterns don't match a load alternative.  Greg can't
know that that these insns need scheduling apart from each other so he
chooses to save registers rather than extend data lifespan.  Sched2 can't do
anything about Greg's choice of registers so it's stuck with being obliged
to schedule the insns with a stall.

===================================================================
 4) Does an expander sound like the right solution to the problem? 
===================================================================

  So, do I have to write a movsi mem->mem expander to fix this?  I think it
might work if I break the mem->mem into a mem->reg and reg->mem at an
earlier stage so that the mem load operations can be recognized by sched1.
Does this sound likely to anybody?

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Lousy scheduling decisions owing to bad interaction between sched and greg :(
  2004-09-03 12:52 Lousy scheduling decisions owing to bad interaction between sched and greg :( Dave Korn
@ 2004-09-03 13:26 ` Joern Rennecke
  2004-09-03 14:42   ` Richard Earnshaw
  2004-09-03 16:35 ` Jeffrey A Law
  1 sibling, 1 reply; 5+ messages in thread
From: Joern Rennecke @ 2004-09-03 13:26 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

> ===================================================================
>  4) Does an expander sound like the right solution to the problem? 
> ===================================================================

Yes.  You'll also have to have an insn predicate for movsi that rejects
memory-memory moves to prevent the rtl optimizers from generating them.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Lousy scheduling decisions owing to bad interaction between sched and greg :(
  2004-09-03 13:26 ` Joern Rennecke
@ 2004-09-03 14:42   ` Richard Earnshaw
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Earnshaw @ 2004-09-03 14:42 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Dave Korn, gcc

On Fri, 2004-09-03 at 14:29, Joern Rennecke wrote:
> > ===================================================================
> >  4) Does an expander sound like the right solution to the problem? 
> > ===================================================================
> 
> Yes.  You'll also have to have an insn predicate for movsi that rejects
> memory-memory moves to prevent the rtl optimizers from generating them.

It can't really be a predicate (in the strict sense) because they can
only validate one operand.  Instead you need a final condition on the
insn that rejects it in the case where both operands turn out to be
memory operands.  The ARM does this for its mov<mm> patterns as, I
suspect, do several ports.

R.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Lousy scheduling decisions owing to bad interaction between sched and greg :(
  2004-09-03 12:52 Lousy scheduling decisions owing to bad interaction between sched and greg :( Dave Korn
  2004-09-03 13:26 ` Joern Rennecke
@ 2004-09-03 16:35 ` Jeffrey A Law
  2004-09-03 17:26   ` Lousy scheduling decisions owing to bad interaction betweensched " Dave Korn
  1 sibling, 1 reply; 5+ messages in thread
From: Jeffrey A Law @ 2004-09-03 16:35 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Fri, 2004-09-03 at 06:49, Dave Korn wrote:
>     Morning all!
> 
> 
> [  This is a rather long post, so I'll summarize very quickly at the top:
> 1) I've got bad scheduling decisions being made, 2) I think I know why, 3)
> Would anyone be so kind as to cast an eye over my define_function_unit
> statement to sanity-check it for me, 4) Does an expander sound like the
> right solution to the problem?  ]
Just to make sure I understand.

Your port has r->m and m->r instructions, but no m->m instructions.
Right?  That's the core issue, you need to reflect that in your
move patterns' predicates so that you do not get m->m insns.

This is pretty typical for many targets.  You'll want an expander
for your move patterns which breaks m->m moves into m->r, r->m
sequences.

You'll want a matching pattern which requires a register for
either the source or destination operand.

You can probably crib code from any of the riscy-like targets
to do this.

jeff



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Lousy scheduling decisions owing to bad interaction betweensched and greg :(
  2004-09-03 16:35 ` Jeffrey A Law
@ 2004-09-03 17:26   ` Dave Korn
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Korn @ 2004-09-03 17:26 UTC (permalink / raw)
  To: law; +Cc: gcc


> -----Original Message-----
> From: Jeffrey A Law 
> Sent: 03 September 2004 17:35

> On Fri, 2004-09-03 at 06:49, Dave Korn wrote:
> >     Morning all!
> > 
> > 
> > [  This is a rather long post, so I'll summarize very 
> quickly at the top:
> > 1) I've got bad scheduling decisions being made, 2) I think 
> I know why, 3)
> > Would anyone be so kind as to cast an eye over my 
> define_function_unit
> > statement to sanity-check it for me, 4) Does an expander 
> sound like the
> > right solution to the problem?  ]
> Just to make sure I understand.
> 
> Your port has r->m and m->r instructions, but no m->m instructions.
> Right?  That's the core issue, you need to reflect that in your
> move patterns' predicates so that you do not get m->m insns.

  Yep, spot on.

> This is pretty typical for many targets.  You'll want an expander
> for your move patterns which breaks m->m moves into m->r, r->m
> sequences.
> 
> You'll want a matching pattern which requires a register for
> either the source or destination operand.
> 
> You can probably crib code from any of the riscy-like targets
> to do this.
> 
> jeff


  Yep, that seems to be the consensus.  Thanks, Jeff, Richard, and Joern,
for your helpful replies!


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-09-03 17:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-03 12:52 Lousy scheduling decisions owing to bad interaction between sched and greg :( Dave Korn
2004-09-03 13:26 ` Joern Rennecke
2004-09-03 14:42   ` Richard Earnshaw
2004-09-03 16:35 ` Jeffrey A Law
2004-09-03 17:26   ` Lousy scheduling decisions owing to bad interaction betweensched " Dave Korn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).