* Incorrect DFA scheduling of output dependency.
From: Daniel Towner @ 2004-12-06 11:30 UTC
To: gcc

Hi all,

I am using the DFA scheduler to implement VLIW scheduling for a 16-bit
DSP. Recently I have come across an apparent bug in the scheduler.
Consider the following sequence of instructions, with DFA scheduling
turned off:

_L11:
    COPY.0 0,R3
    LSL.0 R3,2,R5
    ADD.0 R3,1,R3
    ADD.0 R5,FP,R5
    // etc...

(note that destination operands come last, sources come first)

To begin with, this is rather a strange sequence - a register is
initialised to 0, and then various operations are performed on that
register - I haven't figured out why constant propagation doesn't make
a better job of this. Anyway, with DFA scheduling turned on, I get the
following code instead:

_L11:
    COPY.0 0,R3 \ LSL.1 R3,2,R5
    ADD.0 R3,1,R3 \ ADD.1 R5,FP,R5
    // etc...

Now the DFA scheduler has grouped the four instructions into two VLIW
packets. However, the first of these packets contains one instruction
which writes to R3 and another which reads from R3. Thus, the wrong
value of R3 is shifted into R5. It appears that the output dependency
of the initial instruction is not being respected?

Any ideas?

thanks,

dan.

============================================================================
Daniel Towner
picoChip Designs Ltd., Riverside Buildings, 108, Walcot Street, BATH, BA1 5BG
daniel.towner@picochip.com
07786 702589
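The hazard described in the message above can be checked mechanically. The following is only a minimal sketch, not anything taken from the port: it assumes that every instruction in a VLIW packet reads its source registers before any instruction in the same packet writes its destination, which is what the message implies when it says the wrong value of R3 is shifted into R5. The `Insn` type and the `raw_hazards` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    """Hypothetical instruction record: assembly text plus register sets."""
    text: str
    reads: frozenset
    writes: frozenset


def raw_hazards(packet):
    """Return (producer, consumer, reg) triples where a later instruction in
    the packet reads a register written by an earlier one in the same packet.
    Under the assumed read-before-write packet semantics, the consumer would
    see the stale value."""
    hazards = []
    for i, producer in enumerate(packet):
        for consumer in packet[i + 1:]:
            for reg in sorted(producer.writes & consumer.reads):
                hazards.append((producer.text, consumer.text, reg))
    return hazards


# The two packets from the message (destination operand comes last).
copy = Insn("COPY.0 0,R3", frozenset(), frozenset({"R3"}))
lsl = Insn("LSL.1 R3,2,R5", frozenset({"R3"}), frozenset({"R5"}))
add1 = Insn("ADD.0 R3,1,R3", frozenset({"R3"}), frozenset({"R3"}))
add2 = Insn("ADD.1 R5,FP,R5", frozenset({"R5", "FP"}), frozenset({"R5"}))

packets = [[copy, lsl], [add1, add2]]
for number, packet in enumerate(packets):
    print(number, raw_hazards(packet))
```

Only the first packet is reported: COPY writes R3 and LSL reads it in the same cycle. The second packet is fine, because ADD.1 reads R5 and FP, neither of which ADD.0 writes.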
* Re: Incorrect DFA scheduling of output dependency.
From: Nathan Sidwell @ 2004-12-06 12:30 UTC
To: Daniel Towner; +Cc: gcc

Daniel,

> I am using the DFA scheduler to implement VLIW scheduling for a 16-bit
> DSP. Recently I have come across an apparent bug in the scheduler.
> Consider the following sequence of instructions, with DFA scheduling
> turned off:
>
> Now the DFA scheduler has grouped the four instructions into two VLIW
> packets. However, the first of these packets contains one instruction
> which writes to R3 and another which reads from R3. Thus, the wrong
> value of R3 is shifted into R5. It appears that the output dependency
> of the initial instruction is not being respected?

Something is wrong with your scheduler description, but I know not what :)
You need to tell the scheduler about the bundling; how are you doing that?

nathan

--
Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC
nathan@codesourcery.com :: http://www.planetfall.pwp.blueyonder.co.uk
* Re: Incorrect DFA scheduling of output dependency.
From: Steven Bosscher @ 2004-12-06 12:31 UTC
To: Daniel Towner; +Cc: gcc

On Dec 06, 2004 12:29 PM, Daniel Towner <daniel.towner@picochip.com> wrote:

> Any ideas?

You're not making this easy because you haven't told us anything about:
1) what target you're working on (apparently something that is not
   in the FSF GCC tree);
2) what your DFA description looks like (did you tell the scheduler
   that those two instructions are issued in parallel?); and
3) what version of GCC you are working with.

Anyway, I'll assume you work from mainline. If you use a recent
snapshot: there was a bug last week where REG notes were being removed
in pairs when they shouldn't be. That bug had been there for about two
weeks. Perhaps your REG_DEP_OUTPUT note was incorrectly being removed.
In that case, try a newer snapshot.

But first look at the scheduler dumps (-dS and -dR) to see if the
output dependency is there, of course...

Gr.
Steven
* Re: Incorrect DFA scheduling of output dependency.
From: Daniel Towner @ 2004-12-06 16:27 UTC
To: Steven Bosscher; +Cc: gcc, Nathan Sidwell

Steven, Nathan, et al.

>You're not making this easy because you haven't told us anything about:
>1) what target you're working on (apparently something that is not
>   in the FSF GCC tree);
>2) what your DFA description looks like (did you tell the scheduler
>   that those two instructions are issued in parallel?); and
>3) what version of GCC you are working with.
>

I'm working on a 16-bit DSP port of gcc, which hasn't been contributed
back to the mainline tree yet. The port is based on 3.4.3.

The DFA scheduler describes a machine which has 3 execution slots, plus
an additional slot for a long immediate value. I've attached my DFA
description below. The DFA scheduler normally ensures that instructions
with data dependencies are placed in different cycles. Once the
scheduler has completed, the first instruction for each cycle is marked
with a TI mode instruction. I have specialised versions of
asm_output_opcode and final_prescan_insn which detect the TI mode
labels, and arrange for the assembly output to include the VLIW packing
information.

I have to use the machine dependent reorganisation phase to run the
scheduler, so that the last-jump-optimisation doesn't disturb the TI
mode labels applied to the first instruction in each clock cycle (as
per the IA64).

>But first look at the scheduler dumps (-dS and -dR) to see if the
>output dependency is there, of course...
>

I was wrong here. The instruction sequence is actually a data
(read-after-write) dependency, not an output dependency
(write-after-write). However, the relevant portion of the scheduler
dump is as follows:

(note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
        (const_int 0 [0x0])) 15 {movhi} (nil)
    (nil))

(note 150 64 133 2 NOTE_INSN_LOOP_END)

(insn 133 150 135 2 (set (reg:HI 5 R5 [33])
        (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI 64 (nil))
    (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
            (const_int 2 [0x2]))
        (nil)))

Does this state that insn 133 is anti-dependent on insn 64? An
anti-dependency is a write following a read, but in this sequence a
read follows a write. The anti-dependency first appears after the basic
block reordering pass has been run (which is immediately before the
instruction scheduling pass).

If I modify TARGET_SCHED_ADJUST_COST to return 1 when an
anti-dependency is encountered, this results in the two instructions
being scheduled in different cycles (and hence, different VLIW
packets). For a VLIW machine however, it is legal for anti-dependent
instructions to be scheduled in the same cycle, so I can't use this
method to permanently fix the problem.

many thanks,

dan.

;;==============================================================================
;; Scheduling, including delay slot scheduling.
;;==============================================================================

(automata_option "v")
(automata_option "ndfa")

;; Define each VLIW slot as a CPU resource.
(define_attr "type" "picoAlu,basicAlu,nonCcAlu,mem,branch,mul,mac,app,comms,unknown" (const_string "unknown"))

;; Define whether an instruction uses a long constant.
(define_attr "longConstant" "true,false" (const_string "false"))

;; Define three EU slots.
(define_query_cpu_unit "slot0,slot1,slot2")

;; Each instruction comes in forms with and without long
;; constants. The long constant is treated as though it were also an
;; instruction. Thus, an instruction which used slot0, will use slot0
;; plus one of the other slots for the constant. This mechanism
;; ensures that it is impossible for 3 instructions to be issued, if
;; one of them has a long constant.

; Extended ALU - Slot 0
(define_insn_reservation "picoAluInsn" 1
  (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "false"))
  "slot0")
(define_insn_reservation "picoAluInsnWithConst" 1
  (and (eq_attr "type" "picoAlu") (eq_attr "longConstant" "true"))
  "(slot0+slot1)|(slot0+slot2)")

; Basic ALU - Slot 0 or 1
(define_insn_reservation "basicAluInsn" 1
  (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "false"))
  "(slot0|slot1)")
(define_insn_reservation "basicAluInsnWithConst" 1
  (and (eq_attr "type" "basicAlu") (eq_attr "longConstant" "true"))
  "(slot0+slot1) | (slot1+slot2) | (slot0+slot2)")

; ALU which must not set flags - Slot 1
(define_insn_reservation "nonCcAluInsn" 1
  (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "false"))
  "slot1")
(define_insn_reservation "nonCcAluInsnWithConst" 1
  (and (eq_attr "type" "nonCcAlu") (eq_attr "longConstant" "true"))
  "(slot1+slot0) | (slot1+slot2)")

; Memory - Slot 1
(define_insn_reservation "memInsn" 2
  (and (eq_attr "type" "mem") (eq_attr "longConstant" "false"))
  "slot1,nothing")
(define_insn_reservation "memInsnWithConst" 2
  (and (eq_attr "type" "mem") (eq_attr "longConstant" "true"))
  "slot1+(slot0|slot2),nothing")

; Multiply - Slot 2
(define_insn_reservation "mulInsn" 1
  (and (eq_attr "type" "mul") (eq_attr "longConstant" "false"))
  "slot2")
(define_insn_reservation "mulInsnWithConst" 1
  (and (eq_attr "type" "mul") (eq_attr "longConstant" "true"))
  "(slot2+slot0)|(slot2+slot1)")

; Branch - Slot 2
(define_insn_reservation "branchInsn" 1
  (and (eq_attr "type" "branch") (eq_attr "longConstant" "false"))
  "slot2")
(define_insn_reservation "branchInsnWithConst" 1
  (and (eq_attr "type" "branch") (eq_attr "longConstant" "true"))
  "(slot2+slot0)|(slot2+slot1)")

; Communications - Slot 1
(define_insn_reservation "commsInsn" 1
  (eq_attr "type" "comms")
  "slot1")

; Unknown instructions are assumed to take a single cycle, and use all
; slots. This enables them to actually output a sequence of
; instructions without any limitation.
(define_insn_reservation "unknownInsn" 1
  (eq_attr "type" "unknown")
  "(slot0+slot1+slot2)")
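The reservations in the description above constrain only which slots an instruction may occupy; they say nothing about register dependencies. A rough way to see this is to model the first-cycle slot alternatives and ask whether a group of instructions can share a cycle. This is a simplified sketch, not how the generated automaton works internally: the `FIRST_CYCLE` table is transcribed by hand from the description, it ignores the trailing `nothing` cycle of the memory reservations, and `can_issue_together` is a hypothetical helper.

```python
from itertools import product

# First-cycle slot alternatives, transcribed from the reservations above.
# Each alternative is the set of slots one instruction occupies.
FIRST_CYCLE = {
    "picoAluInsn": [{"slot0"}],
    "picoAluInsnWithConst": [{"slot0", "slot1"}, {"slot0", "slot2"}],
    "basicAluInsn": [{"slot0"}, {"slot1"}],
    "basicAluInsnWithConst": [{"slot0", "slot1"}, {"slot1", "slot2"},
                              {"slot0", "slot2"}],
    "nonCcAluInsn": [{"slot1"}],
    "nonCcAluInsnWithConst": [{"slot1", "slot0"}, {"slot1", "slot2"}],
    "memInsn": [{"slot1"}],
    "memInsnWithConst": [{"slot1", "slot0"}, {"slot1", "slot2"}],
    "mulInsn": [{"slot2"}],
    "mulInsnWithConst": [{"slot2", "slot0"}, {"slot2", "slot1"}],
    "branchInsn": [{"slot2"}],
    "branchInsnWithConst": [{"slot2", "slot0"}, {"slot2", "slot1"}],
    "commsInsn": [{"slot1"}],
    "unknownInsn": [{"slot0", "slot1", "slot2"}],
}


def can_issue_together(names):
    """True if some choice of alternatives gives every instruction its own
    slots, i.e. the group fits in a single cycle as far as resources go."""
    for choice in product(*(FIRST_CYCLE[name] for name in names)):
        used = set()
        for slots in choice:
            if used & slots:
                break  # two instructions want the same slot
            used |= slots
        else:
            return True
    return False


print(can_issue_together(["basicAluInsn", "basicAluInsn"]))
print(can_issue_together(["picoAluInsnWithConst", "basicAluInsn", "mulInsn"]))
```

Two basic ALU instructions fit in one cycle (slot0 and slot1), so if COPY and LSL are of that kind, which is an assumption here, nothing in the resource description keeps them apart. Only a dependency with a nonzero cost can do that.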
* Re: Incorrect DFA scheduling of output dependency.
From: Vladimir Makarov @ 2004-12-06 17:12 UTC
To: Daniel Towner; +Cc: Steven Bosscher, gcc, Nathan Sidwell

Daniel Towner wrote:

> Steven, Nathan, et al.
>
>> You're not making this easy because you haven't told us anything about:
>> 1) what target you're working on (apparently something that is not
>>    in the FSF GCC tree);
>> 2) what your DFA description looks like (did you tell the scheduler
>>    that those two instructions are issued in parallel?); and
>> 3) what version of GCC you are working with.
>>
>>
> I'm working on a 16-bit DSP port of gcc, which hasn't been contributed
> back to the mainline tree yet. The port is based on 3.4.3.
>
> The DFA scheduler describes a machine which has 3 execution slots,
> plus an additional slot for a long immediate value. I've attached my
> DFA description below. The DFA scheduler normally ensures that
> instructions with data dependencies are placed in different cycles.
> Once the scheduler has completed, the first instruction for each cycle
> is marked with a TI mode instruction. I have specialised versions of
> asm_output_opcode and final_prescan_insn which detect the TI mode
> labels, and arrange for the assembly output to include the VLIW
> packing information.
>
> I have to use the machine dependent reorganisation phase to run the
> scheduler, so that the last-jump-optimisation doesn't disturb the TI
> mode labels applied to the first instruction in each clock cycle (as
> per the IA64).

Yes, that is right: it should be the very last pass of the compiler, in
order to generate correct code for a VLIW processor based on the labels
set up by the scheduler.

>
>> But first look at the scheduler dumps (-dS and -dR) to see if the
>> output dependency is there, of course...
>>
> I was wrong here. The instruction sequence is actually a data
> (read-after-write) dependency, not an output dependency
> (write-after-write). However, the relevant portion of the scheduler
> dump is as follows:
>
> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>         (const_int 0 [0x0])) 15 {movhi} (nil)
>     (nil))
>
> (note 150 64 133 2 NOTE_INSN_LOOP_END)
>
> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
>         (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>             (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI
> 64 (nil))
>     (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ]
> [25])
>             (const_int 2 [0x2]))
>         (nil)))
>
> Does this state that insn 133 is anti-dependent on insn 64?

Yes, it does. And that is wrong.

> An anti-dependency is a write following a read, but in this sequence a
> read follows a write. The anti-dependency first appears after the
> basic block reordering pass has been run (which is immediately before
> the instruction scheduling pass).

There is not enough information to make a real analysis of the bug, but
I can guess. Even if the dependence was added in the basic block
reordering pass (I cannot say more about this), it should first have
been removed by insn scheduling. Even if the dependence was not removed,
it should have been replaced by a higher-priority dependence (a true
dependence). So my guess is that your scheduler did not call
sched_analyze.

Vlad

>
> If I modify TARGET_SCHED_ADJUST_COST to return 1 when an
> anti-dependency is encountered, this results in the two instructions
> being scheduled in different cycles (and hence, different VLIW
> packets). For a VLIW machine however, it is legal for anti-dependent
> instructions to be scheduled in the same cycle, so I can't use this
> method to permanently fix the problem.
>
* Re: Incorrect DFA scheduling of output dependency.
From: Daniel Towner @ 2004-12-07 10:59 UTC
To: Vladimir Makarov; +Cc: Steven Bosscher, gcc, Nathan Sidwell

Vlad, et al.,

>> I was wrong here. The instruction sequence is actually a data
>> (read-after-write) dependency, not an output dependency
>> (write-after-write). However, the relevant portion of the scheduler
>> dump is as follows:
>>
>> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>
>> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>         (const_int 0 [0x0])) 15 {movhi} (nil)
>>     (nil))
>>
>> (note 150 64 133 2 NOTE_INSN_LOOP_END)
>>
>> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
>>         (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>             (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI
>> 64 (nil))
>>     (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ]
>> [25])
>>             (const_int 2 [0x2]))
>>         (nil)))
>>
>> Does this state that insn 133 is anti-dependent on insn 64?
>

I've discovered that the anti-dependency is inserted by sched_analyze.
It occurs because of the NOTE_INSN_LOOP_END between the two instructions
above. This note introduces a move barrier between the instructions,
which is intended to prevent the two instructions being reordered.
Currently, this barrier is represented by making the second instruction
anti-dependent upon the first. For most processors, I guess that such a
dependency works as expected, but a VLIW machine is able to emit such
instructions in a single cycle, resulting in an incorrect schedule. It
feels like this should be a true dependency, but the relevant code seems
to make a distinction between a true dependency (a TRUE_BARRIER) and an
order dependency (a MOVE_BARRIER). What sort of dependency should
actually be inserted here?

thanks,

dan.
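The effect described above comes down to how much delay each kind of dependency imposes. The sketch below is a deliberately small model, not GCC's scheduler: it assumes a true dependency costs the producer's latency (1 cycle for the ALU reservations in the DFA description) and an anti-dependency costs 0 cycles, which matches the observation that anti-dependent instructions end up in the same cycle. The `earliest_cycle` helper, the cost tables and the instruction labels are hypothetical.

```python
# Assumed delays in cycles between a producer and a dependent instruction.
DEFAULT_COST = {"true": 1, "anti": 0}

# The TARGET_SCHED_ADJUST_COST workaround mentioned earlier in the thread:
# anti-dependencies are made to cost one cycle as well.
WORKAROUND_COST = {"true": 1, "anti": 1}


def earliest_cycle(deps, issue_cycle, cost=DEFAULT_COST):
    """Earliest cycle in which an instruction may issue.

    deps is a list of (producer, kind) pairs; issue_cycle maps each
    already-scheduled producer to the cycle it was issued in."""
    return max((issue_cycle[p] + cost[kind] for p, kind in deps), default=0)


issue_cycle = {"insn 64": 0}  # the move of 0 into R4 is issued in cycle 0

# insn 133 recorded as anti-dependent on insn 64 (what the dump shows).
as_anti = earliest_cycle([("insn 64", "anti")], issue_cycle)
# insn 133 recorded as truly dependent on insn 64 (what is actually needed).
as_true = earliest_cycle([("insn 64", "true")], issue_cycle)
# The anti-dependency kept, but with the adjusted cost.
with_workaround = earliest_cycle([("insn 64", "anti")], issue_cycle,
                                 WORKAROUND_COST)

print(as_anti, as_true, with_workaround)
```

With the anti-dependency, insn 133 may issue in cycle 0 alongside insn 64, which is the incorrect packet. Either recording a true dependency or raising the anti-dependency cost pushes it to cycle 1, but the second option also separates genuinely anti-dependent pairs that a VLIW machine could legally issue together.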
* Re: Incorrect DFA scheduling of output dependency.
From: Steven Bosscher @ 2004-12-07 13:01 UTC
To: Daniel Towner; +Cc: Vladimir Makarov, gcc, Nathan Sidwell

On Dec 07, 2004 11:59 AM, Daniel Towner <daniel.towner@picochip.com> wrote:

> Vlad, et al.,
>
> >> I was wrong here. The instruction sequence is actually a data
> >> (read-after-write) dependency, not an output dependency
> >> (write-after-write). However, the relevant portion of the scheduler
> >> dump is as follows:
> >>
> >> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> >>
> >> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
> >>         (const_int 0 [0x0])) 15 {movhi} (nil)
> >>     (nil))
> >>
> >> (note 150 64 133 2 NOTE_INSN_LOOP_END)
> >>
> >> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
> >>         (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
> >>             (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI
> >> 64 (nil))
> >>     (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ]
> >> [25])
> >>             (const_int 2 [0x2]))
> >>         (nil)))
> >>
> >> Does this state that insn 133 is anti-dependent on insn 64?
> >
> I've discovered that the anti-dependency is inserted by sched_analyze.
> It occurs because of the NOTE_INSN_LOOP_END between the two instructions
> above. This note introduces a move barrier between the instructions,
> which is intended to prevent the two instructions being reordered.

Can someone explain please why we have loop notes in the middle of
a basic block?

Gr.
Steven
* Re: Incorrect DFA scheduling of output dependency.
From: Steven Bosscher @ 2004-12-07 13:15 UTC
To: gcc; +Cc: Daniel Towner, Vladimir Makarov, Nathan Sidwell, rth

On Dec 07, 2004 02:01 PM, Steven Bosscher <stevenb@suse.de> wrote:

> On Dec 07, 2004 11:59 AM, Daniel Towner <daniel.towner@picochip.com> wrote:
>
> > Vlad, et al.,
> >
> > >> I was wrong here. The instruction sequence is actually a data
> > >> (read-after-write) dependency, not an output dependency
> > >> (write-after-write). However, the relevant portion of the scheduler
> > >> dump is as follows:
> > >>
> > >> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > >>
> > >> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
> > >>         (const_int 0 [0x0])) 15 {movhi} (nil)
> > >>     (nil))
> > >>
> > >> (note 150 64 133 2 NOTE_INSN_LOOP_END)
> > >>
> > >> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
> > >>         (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
> > >>             (const_int 2 [0x2]))) 48 {ashlhi3} (insn_list:REG_DEP_ANTI
> > >> 64 (nil))
> > >>     (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ]
> > >> [25])
> > >>             (const_int 2 [0x2]))
> > >>         (nil)))
> > >>
> > >> Does this state that insn 133 is anti-dependent on insn 64?
> > >
> > I've discovered that the anti-dependency is inserted by sched_analyze.
> > It occurs because of the NOTE_INSN_LOOP_END between the two instructions
> > above. This note introduces a move barrier between the instructions,
> > which is intended to prevent the two instructions being reordered.
>
> Can someone explain please why we have loop notes in the middle of
> a basic block?

In fact maybe someone with a lot of RTL-fu should explain what this
comment in sched-deps is supposed to mean to begin with:

  /* If there is a {LOOP,EHREGION}_{BEG,END} insn note in the middle of a basic
     block, then we must be sure that no instructions are scheduled across it.
     Otherwise, the reg_n_refs info (which depends on loop_depth) would
     become incorrect. */

I read this and I had never heard of reg_n_refs before, so:

$ grep -w -r reg_n_refs *
FSFChangeLog.11: * combine.c (try_combine): Clear reg_n_refs if i2dest is not
haifa-sched.c: be correct. Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
sched-deps.c: Otherwise, the reg_n_refs info (which depends on loop_depth) would

So even in the ChangeLogs there is only one reference to reg_n_refs.

Is this bitrot?

Gr.
Steven
* Re: Incorrect DFA scheduling of output dependency.
From: Jeffrey A Law @ 2004-12-07 13:26 UTC
To: Steven Bosscher; +Cc: gcc, Daniel Towner, Vladimir Makarov, Nathan Sidwell, rth

On Tue, 2004-12-07 at 14:14 +0100, Steven Bosscher wrote:

> >
> > Can someone explain please why we have loop notes in the middle of
> > a basic block?

It's historical. I think it's relatively uncommon.

>
> In fact maybe someone with a lot of RTL-fu should explain what this
> comment in sched-deps is supposed to mean to begin with:
>
>   /* If there is a {LOOP,EHREGION}_{BEG,END} insn note in the middle of a basic
>      block, then we must be sure that no instructions are scheduled across it.
>      Otherwise, the reg_n_refs info (which depends on loop_depth) would
>      become incorrect. */
>
> I read this and I had never heard of reg_n_refs before, so:
>
> $ grep -w -r reg_n_refs *
> FSFChangeLog.11: * combine.c (try_combine): Clear reg_n_refs if i2dest is not
> haifa-sched.c: be correct. Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
> sched-deps.c: Otherwise, the reg_n_refs info (which depends on loop_depth) would
>
> So even in the ChangeLogs there is only one reference to reg_n_refs.
>
> Is this bitrot?

reg_n_refs got moved into the reg_info_def structure along with most of
the other information related to registers. The comment needs updating.

jeff
* Re: Incorrect DFA scheduling of output dependency.
From: Daniel Berlin @ 2004-12-07 13:40 UTC
To: Jeffrey A Law
Cc: Steven Bosscher, gcc, Daniel Towner, Vladimir Makarov, Nathan Sidwell, rth

>> FSFChangeLog.11: * combine.c (try_combine): Clear reg_n_refs if i2dest is not
>> haifa-sched.c: be correct. Namely: reg_n_refs, reg_n_sets, reg_n_deaths,
>> sched-deps.c: Otherwise, the reg_n_refs info (which depends on loop_depth) would
>>
>> So even in the ChangeLogs there is only one reference to reg_n_refs.
>>
>> Is this bitrot?
> reg_n_refs got moved into the reg_info_def structure along with most of
> the other information related to registers. The comment needs updating.
> jeff

Just to follow up, it's now REG_N_REFS, defined in regs.h

--Dan
* Re: Incorrect DFA scheduling of output dependency.
From: Vladimir N. Makarov @ 2004-12-07 22:15 UTC
To: Daniel Towner; +Cc: Steven Bosscher, gcc, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

Daniel Towner wrote:

> Vlad, et al.,
>
>>> I was wrong here. The instruction sequence is actually a data
>>> (read-after-write) dependency, not an output dependency
>>> (write-after-write). However, the relevant portion of the scheduler
>>> dump is as follows:
>>>
>>> (note 82 147 64 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>>
>>> (insn:TI 64 82 150 2 (set (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>>         (const_int 0 [0x0])) 15 {movhi} (nil)
>>>     (nil))
>>>
>>> (note 150 64 133 2 NOTE_INSN_LOOP_END)
>>>
>>> (insn 133 150 135 2 (set (reg:HI 5 R5 [33])
>>>         (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ] [25])
>>>             (const_int 2 [0x2]))) 48 {ashlhi3}
>>> (insn_list:REG_DEP_ANTI 64 (nil))
>>>     (expr_list:REG_EQUAL (ashift:HI (reg/v:HI 4 R4 [orig:25 rdIndex ]
>>> [25])
>>>             (const_int 2 [0x2]))
>>>         (nil)))
>>>
>>> Does this state that insn 133 is anti-dependent on insn 64?
>>
>>
> I've discovered that the anti-dependency is inserted by sched_analyze.
> It occurs because of the NOTE_INSN_LOOP_END between the two
> instructions above. This note introduces a move barrier between the
> instructions, which is intended to prevent the two instructions being
> reordered. Currently, this barrier is represented by making the second
> instruction anti-dependent upon the first. For most processors, I
> guess that such a dependency works as expected, but a VLIW machine is
> able to emit such instructions in a single cycle, resulting in an
> incorrect schedule. It feels like this should be a true dependency,
> but the relevant code seems to make a distinction between a true
> dependency (a TRUE_BARRIER) and an order dependency (a MOVE_BARRIER).
> What sort of dependency should actually be inserted here?
>

Please try the following patch. If it works for you, I could commit it
to mainline. It should solve the problem of generating an incorrect
schedule for VLIW. But the problem of generating a non-optimal schedule
will still exist, because the first insn after the barrier behaves as
one setting and using all registers.

Vlad

Vladimir Makarov  <vmakarov@redhat.com>

        * sched-deps.c (sched_analyze_insn): Use more accurate dependence
        type for the first insn after MOVE_BARRIER.

[-- Attachment #2: Z --]
[-- Type: text/plain, Size: 2905 bytes --]

Index: sched-deps.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/sched-deps.c,v
retrieving revision 1.65.2.1.2.1
diff -c -p -r1.65.2.1.2.1 sched-deps.c
*** sched-deps.c        20 May 2004 13:01:49 -0000      1.65.2.1.2.1
--- sched-deps.c        7 Dec 2004 22:09:08 -0000
*************** sched_analyze_insn (struct deps *deps, r
*** 965,977 ****
          EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
            {
              struct deps_reg *reg_last = &deps->reg_last[i];
              add_dependence_list (insn, reg_last->uses, REG_DEP_ANTI);
!             add_dependence_list
!               (insn, reg_last->sets,
!                reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
!             add_dependence_list
!               (insn, reg_last->clobbers,
!                reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
            });
        }
        else
--- 965,980 ----
          EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
            {
              struct deps_reg *reg_last = &deps->reg_last[i];
+             enum reg_note dep_type;
+
              add_dependence_list (insn, reg_last->uses, REG_DEP_ANTI);
!             dep_type = (reg_pending_barrier == TRUE_BARRIER
!                         ? 0 : REGNO_REG_SET_P (reg_pending_uses, i)
!                         ? 0 : (REGNO_REG_SET_P (reg_pending_set, i)
!                                || REGNO_REG_SET_P (reg_pending_clobber, i))
!                         ? REG_DEP_OUTPUT : REG_DEP_ANTI);
!             add_dependence_list (insn, reg_last->sets, dep_type);
!             add_dependence_list (insn, reg_last->clobbers, dep_type);
            });
        }
        else
*************** sched_analyze_insn (struct deps *deps, r
*** 979,992 ****
          EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
            {
              struct deps_reg *reg_last = &deps->reg_last[i];
              add_dependence_list_and_free (insn, &reg_last->uses,
                                            REG_DEP_ANTI);
!             add_dependence_list_and_free
!               (insn, &reg_last->sets,
!                reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
!             add_dependence_list_and_free
!               (insn, &reg_last->clobbers,
!                reg_pending_barrier == TRUE_BARRIER ? 0 : REG_DEP_ANTI);
              reg_last->uses_length = 0;
              reg_last->clobbers_length = 0;
            });
--- 982,999 ----
          EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i,
            {
              struct deps_reg *reg_last = &deps->reg_last[i];
+             enum reg_note dep_type;
+
              add_dependence_list_and_free (insn, &reg_last->uses,
                                            REG_DEP_ANTI);
!             dep_type = (reg_pending_barrier == TRUE_BARRIER
!                         ? 0 : REGNO_REG_SET_P (reg_pending_uses, i)
!                         ? 0 : (REGNO_REG_SET_P (reg_pending_set, i)
!                                || REGNO_REG_SET_P (reg_pending_clobber, i))
!                         ? REG_DEP_OUTPUT : REG_DEP_ANTI);
!             add_dependence_list_and_free (insn, &reg_last->sets, dep_type);
!             add_dependence_list_and_free (insn, &reg_last->clobbers,
!                                           dep_type),
              reg_last->uses_length = 0;
              reg_last->clobbers_length = 0;
            });
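The nested conditional in the patch is easier to follow when written out as a decision function. The following is a Python paraphrase of that one expression only, under stated assumptions: the string results stand in for the reg note kinds, with "true" standing for the value 0 that the patch passes in the TRUE_BARRIER case, and the set arguments stand in for the pending register sets of the instruction being analysed. `barrier_dep_type` is a hypothetical name, not a GCC function.

```python
TRUE_BARRIER = "TRUE_BARRIER"
MOVE_BARRIER = "MOVE_BARRIER"


def barrier_dep_type(barrier, regno, pending_uses, pending_sets,
                     pending_clobbers):
    """Kind of dependence to record between the first instruction after a
    barrier and the earlier setters/clobberers of register regno."""
    if barrier == TRUE_BARRIER:
        return "true"      # unchanged: a true barrier orders everything
    if regno in pending_uses:
        return "true"      # this insn reads the register: read-after-write
    if regno in pending_sets or regno in pending_clobbers:
        return "output"    # this insn writes it too: write-after-write
    return "anti"          # unrelated register: ordering only


# insn 133 from the dump reads R4 (register 4) and sets R5 (register 5).
uses, sets, clobbers = {4}, {5}, set()
for regno in (4, 5, 6):
    print(regno, barrier_dep_type(MOVE_BARRIER, regno, uses, sets, clobbers))
```

For register 4 the result is now a true dependence on insn 64 rather than an anti-dependence, so the two instructions can no longer share a cycle. Every other live register still gets an ordering dependence, which is why the schedule around a barrier stays conservative, as the message notes.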
[parent not found: <41B6360E.6010806@redhat.com>]
* Re: Incorrect DFA scheduling of output dependency.
From: Daniel Towner @ 2004-12-08 9:53 UTC
To: Vladimir N. Makarov; +Cc: gcc, Steven Bosscher, Nathan Sidwell, gcc

>> Please try the following patch. If it works for you, I could commit it
>> to mainline. It should solve the problem of generating an incorrect
>> schedule for VLIW. But the problem of generating a non-optimal
>> schedule will still exist, because the first insn after the
>> barrier behaves as one setting and using all registers.
>>
>>
>> * sched-deps.c (sched_analyze_insn): Use more accurate dependence
>> type for the first insn after MOVE_BARRIER.
>>
>
> Sorry, the previous patch had some typos and failed to compile.
> So here is the correct version of the patch.

Yes, that works. Thanks for your help, everyone.

dan.
Thread overview: 12+ messages

2004-12-06 11:30 Incorrect DFA scheduling of output dependency Daniel Towner
2004-12-06 12:30 ` Nathan Sidwell
2004-12-06 12:31 ` Steven Bosscher
2004-12-06 16:27   ` Daniel Towner
2004-12-06 17:12     ` Vladimir Makarov
2004-12-07 10:59       ` Daniel Towner
2004-12-07 13:01         ` Steven Bosscher
2004-12-07 13:15           ` Steven Bosscher
2004-12-07 13:26             ` Jeffrey A Law
2004-12-07 13:40               ` Daniel Berlin
2004-12-07 22:15         ` Vladimir N. Makarov
     [not found]           ` <41B6360E.6010806@redhat.com>
2004-12-08  9:53             ` Daniel Towner