public inbox for gcc@gcc.gnu.org
* Incorrect DFA scheduling of output dependency.
@ 2004-12-06 11:30 Daniel Towner
  2004-12-06 12:30 ` Nathan Sidwell
  2004-12-06 12:31 ` Steven Bosscher
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Towner @ 2004-12-06 11:30 UTC
  To: gcc

Hi all,

I am using the DFA scheduler to implement VLIW scheduling for a 16-bit 
DSP. Recently I have come across an apparent bug in the scheduler. 
Consider the following sequence of instructions, with DFA scheduling 
turned off:

_L11:
        COPY.0 0,R3
        LSL.0 R3,2,R5
        ADD.0 R3,1,R3
        ADD.0 R5,FP,R5

        // etc...

(Note that source operands come first and the destination operand comes
last.)

To begin with, this is a rather strange sequence: a register is
initialised to 0, and then various operations are performed on that
register. I haven't figured out why constant propagation doesn't do a
better job of this. Anyway, with DFA scheduling turned on, I get the
following code instead:

_L11:

        COPY.0 0,R3     \
        LSL.1 R3,2,R5

        ADD.0 R3,1,R3   \
        ADD.1 R5,FP,R5

        // etc...

Now the DFA scheduler has grouped the four instructions into two VLIW
packets. However, the first of these packets contains one instruction
which writes to R3 and another which reads from R3. Thus, the wrong value
of R3 is shifted into R5. It appears that the output dependency of the
initial instruction is not being respected?
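
As a rough illustration of the hazard described above (a sketch only, not
code from the port), the check below models each instruction as a pair of
bitmasks of the registers it reads and writes, and reports whether any
instruction in a packet reads a register written by an earlier member of
the same packet. The names insn_model and packet_has_raw_hazard are
invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one instruction: bit n of each mask stands for
   register Rn.  This is not a GCC data structure.  */
struct insn_model {
    unsigned reads;
    unsigned writes;
};

/* Return true if any instruction in the packet reads a register that an
   earlier instruction in the same packet writes (a read-after-write
   hazard inside a single issue packet).  */
bool packet_has_raw_hazard(const struct insn_model *packet, size_t n)
{
    unsigned written = 0;   /* registers written by earlier packet members */

    assert(packet != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (packet[i].reads & written)
            return true;
        written |= packet[i].writes;
    }
    return false;
}
```

For the first packet above, COPY writes R3 and LSL reads R3, so the check
reports a hazard. A packet in which the read comes before the write is
accepted by this model.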

Any ideas?

thanks,

dan.

============================================================================
Daniel Towner
picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH,
BA1 5BG
daniel.towner@picochip.com
07786 702589 



* Re: Incorrect DFA scheduling of output dependency.
  2004-12-06 11:30 Incorrect DFA scheduling of output dependency Daniel Towner
@ 2004-12-06 12:30 ` Nathan Sidwell
  2004-12-06 12:31 ` Steven Bosscher
  1 sibling, 0 replies; 12+ messages in thread
From: Nathan Sidwell @ 2004-12-06 12:30 UTC
  To: Daniel Towner; +Cc: gcc

Daniel,
> I am using the DFA scheduler to implement VLIW scheduling for a 16-bit 
> DSP. Recently I have come across an apparent bug in the scheduler. 
> Consider the following sequence of instructions, with DFA scheduling 
> turned off:
> 

> Now the DFA scheduler has grouped the four instructions into two VLIW 
> packets. However, the first of these packets contains one instruction 
> which writes to R3 and another which reads from R3. Thus, the wrong 
> value of R3 is shifted into R5. It appears that the output dependency 
> of the initial instruction is not being respected?

Something is wrong with your scheduler description, but I know not what :)
You need to tell the scheduler about the bundling. How are you doing that?

nathan

-- 
Nathan Sidwell    ::   http://www.codesourcery.com   ::     CodeSourcery LLC
nathan@codesourcery.com    ::     http://www.planetfall.pwp.blueyonder.co.uk

* Re: Incorrect DFA scheduling of output dependency.
  2004-12-06 11:30 Incorrect DFA scheduling of output dependency Daniel Towner
  2004-12-06 12:30 ` Nathan Sidwell
@ 2004-12-06 12:31 ` Steven Bosscher
  2004-12-06 16:27   ` Daniel Towner
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Bosscher @ 2004-12-06 12:31 UTC
  To: Daniel Towner; +Cc: gcc

On Dec 06, 2004 12:29 PM, Daniel Towner <daniel.towner@picochip.com> wrote:

> Any ideas?

You're not making this easy because you haven't told us anything about:
1) what target you're working on (apparently something that is not
   in the FSF GCC tree);
2) what your DFA description looks like (did you tell the scheduler
   that those two instructions are issued in parallel?); and
3) what version of GCC you are working with.

Anyway, I'll assume you work from mainline.  If you use a recent
snapshot: there was a bug last week where REG notes were being
removed in pairs when they shouldn't have been.  That bug had been
there for about two weeks.  Perhaps your REG_DEP_OUTPUT note was
incorrectly being removed.  In that case, try a newer snapshot.

But first look at the scheduler dumps (-dS and -dR) to see if the
output dependency is there, of course...

Gr.
Steven


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-06 12:31 ` Steven Bosscher
@ 2004-12-06 16:27   ` Daniel Towner
  2004-12-06 17:12     ` Vladimir Makarov
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Towner @ 2004-12-06 16:27 UTC
  To: Steven Bosscher; +Cc: gcc, Nathan Sidwell

Steven, Nathan, et al.

>You're not making this easy because you haven't told us anything about:
>1) what target you're working on (apparently something that is not
>   in the FSF GCC tree);
>2) what your DFA description looks like (did you tell the scheduler
>   that those two instructions are issued in parallel?); and
>3) what version of GCC you are working with.
>  
>
I'm working on a 16-bit DSP port of gcc, which hasn't been contributed 
back to the mainline tree yet. The port is based on 3.4.3.

The DFA scheduler describes a machine which has 3 execution slots, plus 
an additional slot for a long immediate value. I've attached my DFA 
description below. The DFA scheduler normally ensures that instructions 
with data dependencies are placed in different cycles. Once the 
scheduler has completed, the first instruction for each cycle is marked 
with a TI mode instruction. I have specialised versions of 
asm_output_opcode and final_prescan_insn which detect the TI mode 
labels, and arrange for the assembly output to include the VLIW packing 
information.
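
To make the packing step concrete, here is a small standalone sketch
(not the port's asm_output_opcode or final_prescan_insn code). It
assumes each scheduled instruction carries a hypothetical starts_cycle
flag playing the role of the TI mode marker, and it joins the members of
a packet with a trailing backslash, as in the listing from my first
mail. The names sched_insn and emit_packets are invented for this
example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a scheduled insn: its assembly text plus a
   flag that is set on the first insn of each cycle.  */
struct sched_insn {
    const char *text;
    int starts_cycle;
};

/* Write the insns to OUT, appending " \" to every insn that is followed
   by another insn in the same cycle.  Returns the number of characters
   written, or 0 if OUT is too small.  */
size_t emit_packets(const struct sched_insn *insns, size_t n,
                    char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* The packet ends at the last insn or just before a new cycle.  */
        int last_in_packet = (i + 1 == n) || insns[i + 1].starts_cycle;
        int written = snprintf(out + used, out_size - used, "%s%s\n",
                               insns[i].text, last_in_packet ? "" : " \\");

        if (written < 0 || (size_t) written >= out_size - used)
            return 0;   /* output would have been truncated */
        used += (size_t) written;
    }
    return used;
}
```

Because the packet boundaries are derived purely from the marker on the
first insn of each cycle, any later pass that moves or deletes insns can
silently change which insns share a packet.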

I have to use the machine dependent reorganisation phase to run the 
scheduler, so that the last-jump-optimisation doesn't disturb the TI 
mode labels applied to the first instruction in each clock cycle (as per 
the IA64).

>But first look at the scheduler dumps (-dS and -dR) to see if the
>output dependency is there, of course...
>
I was wrong here. The instruction sequence is actually a data 
(read-after-write) dependency, not an output dependency 
(write-after-write). However, the relevant portion of the scheduler dump 
is as follows:

(note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
        (const_int 0 [0x0])) 15 {movhi} (nil)
    (nil))

(note 150 64 133 2 NOTE_INSN_LOOP_END)

(insn 133 150 135 2 (set (reg:HI 5 R5 [33])
        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 
64 (nil))
    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))
        (nil)))

Does this state that insn 133 is anti-dependent on insn 64? An 
anti-dependency is a write following a read, but in this sequence a read 
follows a write. The anti-dependency first appears after the basic block 
reordering pass has been run (which is immediately before the 
instruction scheduling pass).
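
For reference, the three kinds of dependency mentioned above can be told
apart from the read and write sets of the two instructions. The function
below is only a sketch under that assumption; the bitmask encoding and
the names dep_kind and classify_dependence are invented here and are not
GCC's.

```c
/* Dependence of a later insn on an earlier one.  */
enum dep_kind { DEP_NONE, DEP_TRUE, DEP_OUTPUT, DEP_ANTI };

/* Classify the dependence of the second insn on the first.  Each mask
   has bit n set when register Rn is read or written.  When several
   kinds apply, the strongest is reported: true, then output, then
   anti.  */
enum dep_kind classify_dependence(unsigned first_reads,
                                  unsigned first_writes,
                                  unsigned second_reads,
                                  unsigned second_writes)
{
    if (first_writes & second_reads)
        return DEP_TRUE;      /* read after write */
    if (first_writes & second_writes)
        return DEP_OUTPUT;    /* write after write */
    if (first_reads & second_writes)
        return DEP_ANTI;      /* write after read */
    return DEP_NONE;
}
```

Under this model insn 64 (writes R4) followed by insn 133 (reads R4,
writes R5) is a true dependency, which is why the REG_DEP_ANTI note in
the dump looks wrong to me.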

If I modify TARGET_SCHED_ADJUST_COST to return 1 when an anti-dependency 
is encountered, this results in the two instructions being scheduled in 
different cycles (and hence, different VLIW packets). For a VLIW machine 
however, it is legal for anti-dependent instructions to be scheduled in 
the same cycle, so I can't use this method to permanently fix the problem.
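
The effect of that workaround can be sketched with a toy model. It
assumes, as the behaviour above suggests, that an anti-dependency
normally has a cost of 0 cycles, so the dependent instruction may issue
in the producer's cycle. The function names and the separate_anti_deps
switch are hypothetical and are not the real hook interface.

```c
#include <assert.h>

enum dep_kind { DEP_TRUE, DEP_OUTPUT, DEP_ANTI };

/* Toy version of a cost-adjustment hook: optionally force every
   anti-dependency to cost one cycle, otherwise keep the given cost.  */
int adjust_cost(enum dep_kind kind, int cost, int separate_anti_deps)
{
    assert(cost >= 0);
    if (kind == DEP_ANTI && separate_anti_deps)
        return 1;
    return cost;
}

/* Earliest cycle in which the dependent insn may issue.  */
int earliest_issue_cycle(int producer_cycle, int cost)
{
    return producer_cycle + cost;
}
```

With the switch off, an anti-dependent instruction lands in the same
cycle as its predecessor; with it on, it is pushed to the next cycle,
which hides the bug but also forbids legal same-cycle pairs.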

many thanks,

dan.

;;==============================================================================
;; Scheduling, including delay slot scheduling.
;;==============================================================================

(automata_option "v")
(automata_option "ndfa")

;; Define each VLIW slot as a CPU resource.

(define_attr "type"
  "picoAlu,basicAlu,nonCcAlu,mem,branch,mul,mac,app,comms,unknown"
  (const_string "unknown"))

;; Define whether an instruction uses a long constant.

(define_attr "longConstant"
  "true,false" (const_string "false"))

;; Define three EU slots.
(define_query_cpu_unit "slot0,slot1,slot2")

;; Each instruction comes in forms with and without long
;; constants. The long constant is treated as though it were also an
;; instruction. Thus, an instruction which used slot0, will use slot0
;; plus one of the other slots for the constant. This mechanism
;; ensures that it is impossible for 3 instructions to be issued, if
;; one of them has a long constant.

; Extended ALU - Slot 0
(define_insn_reservation "picoAluInsn" 1
  (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "false"))
  "slot0")
(define_insn_reservation "picoAluInsnWithConst" 1
  (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "true"))
  "(slot0+slot1)|(slot0+slot2)")

; Basic ALU - Slot 0 or 1
(define_insn_reservation "basicAluInsn" 1
  (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "false"))
  "(slot0|slot1)")
(define_insn_reservation "basicAluInsnWithConst" 1
  (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "true"))
  "(slot0+slot1) | (slot1+slot2) | (slot0+slot2)")

; ALU which must not set flags - Slot 1
(define_insn_reservation "nonCcAluInsn" 1
  (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "false"))
  "slot1")
(define_insn_reservation "nonCcAluInsnWithConst" 1
  (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "true"))
  "(slot1+slot0) | (slot1+slot2)")

; Memory - Slot 1
(define_insn_reservation "memInsn" 2
  (and (eq_attr "type" "mem") (eq_attr "longConstant" "false"))
  "slot1,nothing")
(define_insn_reservation "memInsnWithConst" 2
  (and (eq_attr "type" "mem") (eq_attr "longConstant" "true"))
  "slot1+(slot0|slot2),nothing")

; Multiply - Slot 2
(define_insn_reservation "mulInsn" 1
  (and (eq_attr "type" "mul") (eq_attr "longConstant" "false"))
  "slot2")
(define_insn_reservation "mulInsnWithConst" 1
  (and (eq_attr "type" "mul") (eq_attr "longConstant" "true"))
  "(slot2+slot0)|(slot2+slot1)")

; Branch - Slot 2
(define_insn_reservation "branchInsn" 1
  (and (eq_attr "type" "branch") (eq_attr "longConstant" "false"))
  "slot2")
(define_insn_reservation "branchInsnWithConst" 1
  (and (eq_attr "type" "branch") (eq_attr "longConstant" "true"))
  "(slot2+slot0)|(slot2+slot1)")

; Communications - Slot 1
(define_insn_reservation "commsInsn" 1
  (eq_attr "type" "comms")
  "slot1")

; Unknown instructions are assumed to take a single cycle, and use all
; slots. This enables them to actually output a sequence of
; instructions without any limitation.

(define_insn_reservation "unknownInsn" 1
  (eq_attr "type" "unknown")
  "(slot0+slot1+slot2)")
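
To illustrate the long-constant trick in the reservations above, here is
a standalone sketch in C. It treats each alternative of a reservation as
a bitmask of slots and takes the first alternative that fits, which is a
simplification of what the generated automaton does. The names SLOT0,
SLOT1, SLOT2 and reserve_slots are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per VLIW slot.  */
enum { SLOT0 = 1 << 0, SLOT1 = 1 << 1, SLOT2 = 1 << 2 };

/* Try each alternative in turn and claim the first one whose slots are
   all free in *BUSY.  Returns 1 on success, 0 if no alternative fits in
   the current cycle.  */
int reserve_slots(unsigned *busy, const unsigned *alternatives, size_t n)
{
    assert(busy != NULL && alternatives != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((*busy & alternatives[i]) == 0) {
            *busy |= alternatives[i];
            return 1;
        }
    }
    return 0;
}
```

A basicAlu instruction with a long constant claims two slots, so after
it only one more single-slot instruction fits in the cycle, and a third
instruction is refused.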


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-06 16:27   ` Daniel Towner
@ 2004-12-06 17:12     ` Vladimir Makarov
  2004-12-07 10:59       ` Daniel Towner
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Makarov @ 2004-12-06 17:12 UTC
  To: Daniel Towner; +Cc: Steven Bosscher, gcc, Nathan Sidwell

Daniel Towner wrote:

> Steven, Nathan, et al.
>
>> You're not making this easy because you haven't told us anything about:
>> 1) what target you're working on (apparently something that is not
>>   in the FSF GCC tree);
>> 2) what your DFA description looks like (did you tell the scheduler
>>   that those two instructions are issued in parallel?); and
>> 3) what version of GCC you are working with.
>>  
>>
> I'm working on a 16-bit DSP port of gcc, which hasn't been contributed 
> back to the mainline tree yet. The port is based on 3.4.3.
>
> The DFA scheduler describes a machine which has 3 execution slots, 
> plus an additional slot for a long immediate value. I've attached my 
> DFA description below. The DFA scheduler normally ensures that 
> instructions with data dependencies are placed in different cycles. 
> Once the scheduler has completed, the first instruction for each cycle 
> is marked with a TI mode instruction. I have specialised versions of 
> asm_output_opcode and final_prescan_insn which detect the TI mode 
> labels, and arrange for the assembly output to include the VLIW 
> packing information.
>
> I have to use the machine dependent reorganisation phase to run the 
> scheduler, so that the last-jump-optimisation doesn't disturb the TI 
> mode labels applied to the first instruction in each clock cycle (as 
> per the IA64).

Yes, that is right.  It should be the very last pass of the compiler, so 
that correct code for a VLIW processor is generated from the labels set 
up by the scheduler.

>
>> But first look at the scheduler dumps (-dS and -dR) to see if the
>> output dependency is there, of course...
>>
> I was wrong here. The instruction sequence is actually a data 
> (read-after-write) dependency, not an output dependency 
> (write-after-write). However, the relevant portion of the scheduler 
> dump is as follows:
>
> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>        (const_int 0 [0x0])) 15 {movhi} (nil)
>    (nil))
>
> (note 150 64 133 2 NOTE_INSN_LOOP_END)
>
> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
>        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 
> 64 (nil))
>    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] 
> [25])
>            (const_int 2 [0x2]))
>        (nil)))
>
> Does this state that insn 133 is anti-dependent on insn 64?

Yes, it does.  And that is wrong.

> An anti-dependency is a write following a read, but in this sequence a 
> read follows a write. The anti-dependency first appears after the 
> basic block reordering pass has been run (which is immediately before 
> the instruction scheduling pass).

The information is not enough for a real analysis of the bug, but I 
can guess.

Even if the dependence was added in the basic block reordering pass (I 
cannot say more about this), it should have been removed first by insn 
scheduling.  Even if the dependence was not removed, it should have been 
replaced by the higher-priority dependence (a true dependence).  So my 
guess is that your scheduler did not call sched_analyze.

Vlad

>
> If I modify TARGET_SCHED_ADJUST_COST to return 1 when an 
> anti-dependency is encountered, this results in the two instructions 
> being scheduled in different cycles (and hence, different VLIW 
> packets). For a VLIW machine however, it is legal for anti-dependent 
> instructions to be scheduled in the same cycle, so I can't use this 
> method to permanently fix the problem.
>


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-06 17:12     ` Vladimir Makarov
@ 2004-12-07 10:59       ` Daniel Towner
  2004-12-07 13:01         ` Steven Bosscher
  2004-12-07 22:15         ` Vladimir N. Makarov
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Towner @ 2004-12-07 10:59 UTC
  To: Vladimir Makarov; +Cc: Steven Bosscher, gcc, Nathan Sidwell

Vlad, et al.,

>> I was wrong here. The instruction sequence is actually a data 
>> (read-after-write) dependency, not an output dependency 
>> (write-after-write). However, the relevant portion of the scheduler 
>> dump is as follows:
>>
>> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>
>> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>        (const_int 0 [0x0])) 15 {movhi} (nil)
>>    (nil))
>>
>> (note 150 64 133 2 NOTE_INSN_LOOP_END)
>>
>> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
>>        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 
>> 64 (nil))
>>    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] 
>> [25])
>>            (const_int 2 [0x2]))
>>        (nil)))
>>
>> Does this state that insn 133 is anti-dependent on insn 64?
>
I've discovered that the anti-dependency is inserted by sched_analyze. 
It occurs because of the NOTE_INSN_LOOP_END between the two instructions 
above. This note introduces a move barrier between the instructions, 
which is intended to prevent the two instructions from being reordered. 
Currently, this barrier is represented by making the second instruction 
anti-dependent upon the first. For most processors, I guess that such a 
dependency works as expected, but a VLIW machine is able to issue such 
instructions in a single cycle, resulting in an incorrect schedule. It 
feels like this should be a true dependency, but the relevant code seems 
to make a distinction between a true dependency (a TRUE_BARRIER) and an 
order dependency (a MOVE_BARRIER). What sort of dependency should 
actually be inserted here?

thanks,

dan.



* Re: Incorrect DFA scheduling of output dependency.
  2004-12-07 10:59       ` Daniel Towner
@ 2004-12-07 13:01         ` Steven Bosscher
  2004-12-07 13:15           ` Steven Bosscher
  2004-12-07 22:15         ` Vladimir N. Makarov
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Bosscher @ 2004-12-07 13:01 UTC
  To: Daniel Towner; +Cc: Vladimir Makarov, gcc, Nathan Sidwell

On Dec 07, 2004 11:59 AM, Daniel Towner <daniel.towner@picochip.com> wrote:

> I've discovered that the anti-dependency is inserted by sched_analyze. 
> It occurs because of the NOTE_INSN_LOOP_END between the two instructions 
> above. This note introduces a move barrier between the instructions, 
> which is intended to prevent the two instructions being reordered. 


Can someone explain please why we have loop notes in the middle of
a basic block?

Gr.
Steven


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-07 13:01         ` Steven Bosscher
@ 2004-12-07 13:15           ` Steven Bosscher
  2004-12-07 13:26             ` Jeffrey A Law
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Bosscher @ 2004-12-07 13:15 UTC
  To: gcc; +Cc: Daniel Towner, Vladimir Makarov, Nathan Sidwell, rth

On Dec 07, 2004 02:01 PM, Steven Bosscher <stevenb@suse.de> wrote:

> Can someone explain please why we have loop notes in the middle of
> a basic block?

In fact maybe someone with a lot of RTL-fu should explain what this
comment in sched-deps is supposed to mean to begin with:

  /* If there is a {LOOP,EHREGION}_{BEG,END} insn note in the middle of a basic
     block, then we must be sure that no instructions are scheduled across it.
     Otherwise, the reg_n_refs info (which depends on loop_depth) would
     become incorrect.  */

I read this and I had never heard of reg_n_refs before, so:

$ grep -w -r reg_n_refs *
FSFChangeLog.11:        * combine.c (try_combine): Clear reg_n_refs if i2dest is not
haifa-sched.c:   be correct.  Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
sched-deps.c:     Otherwise, the reg_n_refs info (which depends on loop_depth) would

So even in the ChangeLogs there is only one reference to reg_n_refs.

Is this bitrot?

Gr.
Steven


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-07 13:15           ` Steven Bosscher
@ 2004-12-07 13:26             ` Jeffrey A Law
  2004-12-07 13:40               ` Daniel Berlin
  0 siblings, 1 reply; 12+ messages in thread
From: Jeffrey A Law @ 2004-12-07 13:26 UTC
  To: Steven Bosscher; +Cc: gcc, Daniel Towner, Vladimir Makarov, Nathan Sidwell, rth

On Tue, 2004-12-07 at 14:14 +0100, Steven Bosscher wrote:

> > 
> > Can someone explain please why we have loop notes in the middle of
> > a basic block?
It's historical.  I think it's relatively uncommon.

> 
> In fact maybe someone with a lot of RTL-fu should explain what this
> comment in sched-deps is supposed to mean to begin with:
> 
>   /* If there is a {LOOP,EHREGION}_{BEG,END} insn note in the middle of a basic
>      block, then we must be sure that no instructions are scheduled across it.
>      Otherwise, the reg_n_refs info (which depends on loop_depth) would
>      become incorrect.  */
> 
> I read this and I had never heard of reg_n_refs before, so:
> 
> $ grep -w -r reg_n_refs *
> FSFChangeLog.11:        * combine.c (try_combine): Clear reg_n_refs if i2dest is not
> haifa-sched.c:   be correct.  Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
> sched-deps.c:     Otherwise, the reg_n_refs info (which depends on loop_depth) would
> 
> So even in the ChangeLogs there is only one reference to reg_n_refs.
> 
> Is this bitrot?
reg_n_refs got moved into the reg_info_def structure along with most of
the other information related to registers.  The comment needs updating.
jeff


* Re: Incorrect DFA scheduling of output dependency.
  2004-12-07 13:26             ` Jeffrey A Law
@ 2004-12-07 13:40               ` Daniel Berlin
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Berlin @ 2004-12-07 13:40 UTC
  To: Jeffrey A Law
  Cc: Steven Bosscher, gcc, Daniel Towner, Vladimir Makarov,
	Nathan Sidwell, rth



>> FSFChangeLog.11:        * combine.c (try_combine): Clear reg_n_refs if i2dest is not
>> haifa-sched.c:   be correct.  Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
>> sched-deps.c:     Otherwise, the reg_n_refs info (which depends on loop_depth) would
>>
>> So even in the ChangeLogs there is only one reference to reg_n_refs.
>>
>> Is this bitrot?
> reg_n_refs got moved into the reg_info_def structure along with most of
> the other information related to registers.  The comment needs updating.
> jeff

Just to follow up, it's now REG_N_REFS, defined in regs.h.

--Dan

* Re: Incorrect DFA scheduling of output dependency.
  2004-12-07 10:59       ` Daniel Towner
  2004-12-07 13:01         ` Steven Bosscher
@ 2004-12-07 22:15         ` Vladimir N. Makarov
       [not found]           ` <41B6360E.6010806@redhat.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Vladimir N. Makarov @ 2004-12-07 22:15 UTC
  To: Daniel Towner; +Cc: Steven Bosscher, gcc, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

Daniel Towner wrote:

> Vlad, et al.,
>
> I've discovered that the anti-dependency is inserted by sched_analyze. 
> It occurs because of the NOTE_INSN_LOOP_END between the two 
> instructions above. This note introduces a move barrier between the 
> instructions, which is intended to prevent the two instructions being 
> reordered. Currently, this barrier is represented by making the second 
> instruction anti-dependent upon the first. For most processors, I 
> guess that such a dependency works as expected, but a VLIW machine is 
> able to emit such instructions in a single cycle, resulting in an 
> incorrect schedule. It feels like this should be a true dependency, 
> but the relevant code seems to make a distinction between a true 
> dependency (a TRUE_BARRIER) and an order dependency (a MOVE_BARRIER). 
> What sort of dependency should actually be inserted here?
>
Please try the following patch.  If it works for you, I could commit it 
to mainline.  It should solve the problem of an incorrect schedule being 
generated for VLIW.  But the problem of a non-optimal schedule will 
still exist, because the first insn after the barrier behaves as one 
setting and using all registers.

Vlad

Vladimir Makarov  <vmakarov@redhat.com>

        * sched-deps.c (sched_analyze_insn): Use more accurate dependence
        type for the first insn after MOVE_BARRIER.


[-- Attachment #2: Z --]
[-- Type: text/plain, Size: 2905 bytes --]

Index: sched-deps.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/sched-deps.c,v
retrieving revision 1.65.2.1.2.1
diff -c -p -r1.65.2.1.2.1 sched-deps.c
*** sched-deps.c	20 May 2004 13:01:49 -0000	1.65.2.1.2.1
--- sched-deps.c	7 Dec 2004 22:09:08 -0000
*************** sched_analyze_insn (struct deps *deps, r
*** 965,977 ****
  	  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
  	    {
  	      struct deps_reg *reg_last = &deps->reg_last[i];
  	      add_dependence_list (insn, reg_last->uses, REG_DEP_ANTI);
! 	      add_dependence_list
! 		(insn, reg_last->sets,
! 		 reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
! 	      add_dependence_list
! 		(insn, reg_last->clobbers,
! 		 reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
  	    });
  	}
        else
--- 965,980 ----
  	  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
  	    {
  	      struct deps_reg *reg_last = &deps->reg_last[i];
+ 	      enum reg_note dep_type;
+ 
  	      add_dependence_list (insn, reg_last->uses, REG_DEP_ANTI);
! 	      dep_type = (reg_pending_barrier == TRUE_BARRIER
! 			  ? 0 : REGNO_REG_SET_P (reg_pending_uses, i)
! 			  ? 0 : (REGNO_REG_SET_P (reg_pending_sets, i)
! 				 || REGNO_REG_SET_P (reg_pending_clobbers, i))
! 			  ? REG_DEP_OUTPUT : REG_DEP_ANTI);
! 	      add_dependence_list (insn, reg_last->sets, dep_type);
! 	      add_dependence_list (insn, reg_last->clobbers, dep_type);
  	    });
  	}
        else
*************** sched_analyze_insn (struct deps *deps, r
*** 979,992 ****
  	  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
  	    {
  	      struct deps_reg *reg_last = &deps->reg_last[i];
  	      add_dependence_list_and_free (insn, &reg_last->uses,
  					    REG_DEP_ANTI);
! 	      add_dependence_list_and_free
! 		(insn, &reg_last->sets,
! 		 reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
! 	      add_dependence_list_and_free
! 		(insn, &reg_last->clobbers,
! 		 reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
  	      reg_last->uses_length = 0;
  	      reg_last->clobbers_length = 0;
  	    });
--- 982,999 ----
  	  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
  	    {
  	      struct deps_reg *reg_last = &deps->reg_last[i];
+ 	      enum reg_note dep_type;
+ 
  	      add_dependence_list_and_free (insn, &reg_last->uses,
  					    REG_DEP_ANTI);
! 	      dep_type = (reg_pending_barrier == TRUE_BARRIER
! 			  ? 0 : REGNO_REG_SET_P (reg_pending_uses, i)
! 			  ? 0 : (REGNO_REG_SET_P (reg_pending_sets, i)
! 				 || REGNO_REG_SET_P (reg_pending_clobbers, i))
! 			  ? REG_DEP_OUTPUT : REG_DEP_ANTI);
! 	      add_dependence_list_and_free (insn, &reg_last->sets, dep_type);
! 	      add_dependence_list_and_free (insn, &reg_last->clobbers,
! 					    dep_type);
  	      reg_last->uses_length = 0;
  	      reg_last->clobbers_length = 0;
  	    });
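
The selection of dep_type in the patch above can be modelled outside GCC
as follows. This is only a sketch: it replaces the regsets with plain
bitmasks, and the enum and function names are stand-ins chosen for the
example rather than the real sched-deps.c declarations.

```c
#include <assert.h>

/* Stand-ins for the barrier modes and dependence kinds.  */
enum barrier_kind { NOT_A_BARRIER, MOVE_BARRIER, TRUE_BARRIER };
enum dep_kind { DEP_TRUE, DEP_OUTPUT, DEP_ANTI };

/* Choose the kind of dependence that the first insn after a barrier
   gets on the last setters of register REGNO.  The masks describe what
   that insn itself uses, sets and clobbers (bit n for register n).  */
enum dep_kind barrier_dep_type(enum barrier_kind barrier, unsigned regno,
                               unsigned pending_uses,
                               unsigned pending_sets,
                               unsigned pending_clobbers)
{
    unsigned bit;

    assert(barrier != NOT_A_BARRIER);
    assert(regno < 32);
    bit = 1u << regno;

    if (barrier == TRUE_BARRIER)
        return DEP_TRUE;
    if (pending_uses & bit)
        return DEP_TRUE;      /* the insn really reads the register */
    if ((pending_sets | pending_clobbers) & bit)
        return DEP_OUTPUT;    /* the insn overwrites the register */
    return DEP_ANTI;          /* ordering only */
}
```

For the insn in the dump, which reads R4 and sets R5 after a move
barrier, this model yields a true dependence on the setter of R4, an
output dependence for R5 and an anti-dependence for every other
register.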

* Re: Incorrect DFA scheduling of output dependency.
       [not found]           ` <41B6360E.6010806@redhat.com>
@ 2004-12-08  9:53             ` Daniel Towner
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Towner @ 2004-12-08  9:53 UTC
  To: Vladimir N. Makarov; +Cc: gcc, Steven Bosscher, Nathan Sidwell, gcc


>> Please try the following patch.  If it works for you, I could commit 
>> it to mainline.  It should solve the problem of an incorrect schedule 
>> being generated for VLIW.  But the problem of a non-optimal schedule 
>> will still exist, because the first insn after the barrier behaves as 
>> one setting and using all registers.
>>
>>        * sched-deps.c (sched_analyze_insn): Use more accurate dependence
>>        type for the first insn after MOVE_BARRIER.
>>
>
> Sorry, the previous patch had some typos and failed to compile.  
> So here is the corrected version of the patch.


Yes, that works.

Thanks for your help everyone.

dan.
