public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Scheduling and inserting NOPs
@ 2007-11-12 18:59 Boris Boesler
  2007-11-13 18:18 ` Ian Lance Taylor
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-12 18:59 UTC (permalink / raw)
  To: gcc-help

Hi!

  My next GCC-adventure is scheduling:

  1) As far as I can see some levels of optimization do not schedule  
the code. To generate correct code I have to insert NOP-operations to  
fill the delays (the architecture has no interlocks), but I can't  
find a function/hook to generate these NOPs. Could someone point me  
to the functions in the internals document? I tried  
TARGET_SCHED_DFA_NEW_CYCLE but this one can't write to the output.

  2) The scheduling depends on the producer/consumer. An operation  
can consume a register-operand without any difficult scheduling:

cycle 1: MOV R1, #4711	;; r1 := 4711
cycle 2: ADD R1, R1, R1	;; r1 := r1 + r1

  But if an operation reads the memory via an address register, then  
the register must be written some cycles before the operation:

cycle 1: MOV A1, _label_a	;; get address of var a
cycle 2: NOP
..
cycle 5: ADD (A1), (A1), (A1)	;; *a := *a + *a

  I tried to solve this with (define_bypass 4 ..) with my own guard- 
function, but the ADD-operation appears before cycle 5 (same problem  
as no.1?)

  Hm, these are FAQs? Is there a collection of all answers to these  
FAQs?

Every idea is welcome,
thanks,
Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-12 18:59 Scheduling and inserting NOPs Boris Boesler
@ 2007-11-13 18:18 ` Ian Lance Taylor
  2007-11-13 20:51   ` Boris Boesler
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Lance Taylor @ 2007-11-13 18:18 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler <baembel@gmx.de> writes:

>   1) As far as I can see some levels of optimization do not schedule
> the code. To generate correct code I have to insert NOP-operations to
> fill the delays (the architecture has no interlocks), but I can't
> find a function/hook to generate these NOPs. Could someone point me
> to the functions in the internals document? I tried
> TARGET_SCHED_DFA_NEW_CYCLE but this one can't write to the output.

>   2) The scheduling depends on the producer/consumer. An operation
> can consume a register-operand without any difficult scheduling:
> 
> cycle 1: MOV R1, #4711	;; r1 := 4711
> cycle 2: ADD R1, R1, R1	;; r1 := r1 + r1
> 
>   But if an operation reads the memory via an address register, then
> the register must be written some cycles before the operation:
> 
> cycle 1: MOV A1, _label_a	;; get address of var a
> cycle 2: NOP
> ..
> cycle 5: ADD (A1), (A1), (A1)	;; *a := *a + *a
> 
>   I tried to solve this with (define_bypass 4 ..) with my own guard-
> function, but the ADD-operation appears before cycle 5 (same problem
> as no.1?)

gcc is weak on VLIW support.  The scheduler does not insert NOPs for
you.

The usual workaround is to write code in TARGET_ASM_PROLOGUE which
does it.  E.g., frv_pack_insns in config/frv/frv.c.

Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-13 18:18 ` Ian Lance Taylor
@ 2007-11-13 20:51   ` Boris Boesler
  2007-11-14 15:11     ` Ian Lance Taylor
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-13 20:51 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

Hi!

Am 13.11.2007 um 19:14 schrieb Ian Lance Taylor:

> Boris Boesler <baembel@gmx.de> writes:
>
>>   1) As far as I can see some levels of optimization do not schedule
>> the code. To generate correct code I have to insert NOP-operations to
>> fill the delays (the architecture has no interlocks),

  ...

> gcc is weak on VLIW support.  The scheduler does not insert NOPs for
> you.

  VLIWs? As I mentioned above the processor does not wait till the  
operands are evaluated; this must be done by the programmer/compiler  
by inserting NOPs or other instructions.

  I found some stuff for TI C4X/C6X processors ..

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-13 20:51   ` Boris Boesler
@ 2007-11-14 15:11     ` Ian Lance Taylor
  2007-11-15 14:28       ` Boris Boesler
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Lance Taylor @ 2007-11-14 15:11 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler <baembel@gmx.de> writes:

> Hi!
> 
> Am 13.11.2007 um 19:14 schrieb Ian Lance Taylor:
> 
> > Boris Boesler <baembel@gmx.de> writes:
> >
> >>   1) As far as I can see some levels of optimization do not schedule
> >> the code. To generate correct code I have to insert NOP-operations to
> >> fill the delays (the architecture has no interlocks),
> 
>   ...
> 
> > gcc is weak on VLIW support.  The scheduler does not insert NOPs for
> > you.
> 
>   VLIWs? As I mentioned above the processor does not wait till the
> operands are evaluated; this must be done by the programmer/compiler
> by inserting NOPs or other instructions.

It amounts to the same thing: you need to insert NOP instructions.

Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-14 15:11     ` Ian Lance Taylor
@ 2007-11-15 14:28       ` Boris Boesler
  2007-11-15 17:34         ` Ian Lance Taylor
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-15 14:28 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

Hi!

Am 14.11.2007 um 07:17 schrieb Ian Lance Taylor:

>>   VLIWs? As I mentioned above the processor does not wait till the
>> operands are evaluated; this must be done by the programmer/compiler
>> by inserting NOPs or other instructions.
>
> It amounts to the same thing: you need to insert NOP instructions.

  And you are working on it really, really hard? ;-)

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-15 14:28       ` Boris Boesler
@ 2007-11-15 17:34         ` Ian Lance Taylor
  2007-11-21 17:08           ` Boris Boesler
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Lance Taylor @ 2007-11-15 17:34 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler <baembel@gmx.de> writes:

> Am 14.11.2007 um 07:17 schrieb Ian Lance Taylor:
> 
> >>   VLIWs? As I mentioned above the processor does not wait till the
> >> operands are evaluated; this must be done by the programmer/compiler
> >> by inserting NOPs or other instructions.
> >
> > It amounts to the same thing: you need to insert NOP instructions.
> 
>   And you are working on it really, really hard? ;-)

In my earlier message I explained what people do today for processors
which need NOP instructions to be inserted.

Nobody is working on implementing a better approach.

Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-15 17:34         ` Ian Lance Taylor
@ 2007-11-21 17:08           ` Boris Boesler
  2007-11-21 17:26             ` Andrew Haley
  2007-11-21 18:32             ` Ian Lance Taylor
  0 siblings, 2 replies; 16+ messages in thread
From: Boris Boesler @ 2007-11-21 17:08 UTC (permalink / raw)
  To: gcc-help; +Cc: Ian Lance Taylor

Hi!

> In my earlier message I explained what people do today for processors
> which need NOP instructions to be inserted.
>
> Nobody is working on implementing a better approach.

  I had a small discussion with Ian, which showed that my problem is  
not understood. I try to summarize my previous emails:

  I have a processor with a 6 stages pipeline. Every stage is used  
for exactly one cycle. This means:
1) there can never be a hardware structural hazard (that's good)
2) scheduling is not needed (even better)

  So what's the problem? The problem is that there are 2 stage which  
can read registers and there is no bypass/result-forwarding from the  
register-write-back to the first register-read-stage.
  The following program is correct on my processor (pseudo-code):
A0 := &a
A0 := &b
NOP
A1 := 1;
A2 := (A0) + A1 ; contents of memory at label a plus 1
A3 := (A0) + A1 ; contents of memory at label b plus 1

  There has to be a number of instructions (e.g. NOPs) between  
writing the address-register A0 and reading it for indirect memory- 
access. To solve this I need scheduling. I implemented the scheduler  
specification including its bypasses. The code is scheduled  
correctly, as I can see in the verbose output. The missing/empty  
cycles are visible - but empty.

  Now I need a pass to insert the missing NOPs. The approach Ian  
mentioned with the DFA-simulator does not work, because the simulator  
detects structural hardware hazards only (see http://www.nabble.com/Re 
%3A-Question-of-the-DFA-scheduler-p603486.html), which can not occur  
in my processor. Or did the DFA change?

  NOP-insertion could be done in the hook TARGET_SCHED_REORDER with a  
little bit of book-keeping the cycles, but in this hook no code can  
be added.

  In the thread above the ia64 port is mentioned, I have to check  
that. But isn't there a simple solution to insert NOPs?

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:08           ` Boris Boesler
@ 2007-11-21 17:26             ` Andrew Haley
  2007-11-21 17:32               ` Boris Boesler
  2007-11-21 18:32             ` Ian Lance Taylor
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Haley @ 2007-11-21 17:26 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help, Ian Lance Taylor

Boris Boesler writes:
 > Hi!
 > 
 > > In my earlier message I explained what people do today for processors
 > > which need NOP instructions to be inserted.
 > >
 > > Nobody is working on implementing a better approach.
 > 
 >   I had a small discussion with Ian, which showed that my problem is  
 > not understood. I try to summarize my previous emails:
 > 
 >   I have a processor with a 6 stages pipeline. Every stage is used  
 > for exactly one cycle. This means:
 > 1) there can never be a hardware structural hazard (that's good)
 > 2) scheduling is not needed (even better)
 > 
 >   So what's the problem? The problem is that there are 2 stage which  
 > can read registers and there is no bypass/result-forwarding from the  
 > register-write-back to the first register-read-stage.
 >   The following program is correct on my processor (pseudo-code):
 > A0 := &a
 > A0 := &b
 > NOP
 > A1 := 1;
 > A2 := (A0) + A1 ; contents of memory at label a plus 1
 > A3 := (A0) + A1 ; contents of memory at label b plus 1
 > 
 >   There has to be a number of instructions (e.g. NOPs) between  
 > writing the address-register A0 and reading it for indirect memory- 
 > access. To solve this I need scheduling. I implemented the scheduler  
 > specification including its bypasses. The code is scheduled  
 > correctly, as I can see in the verbose output. The missing/empty  
 > cycles are visible - but empty.
 > 
 >   Now I need a pass to insert the missing NOPs.

Is there some special reason you can't do this in machine dependent
reorg?  That seems like the obvious place.  Just walk every basic
block stuffing NOPs.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:26             ` Andrew Haley
@ 2007-11-21 17:32               ` Boris Boesler
  2007-11-21 17:39                 ` Andrew Haley
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-21 17:32 UTC (permalink / raw)
  To: gcc-help

Am 21.11.2007 um 18:07 schrieb Andrew Haley:

> Is there some special reason you can't do this in machine dependent
> reorg?  That seems like the obvious place.  Just walk every basic
> block stuffing NOPs.

  The problem isn't where, it's how. The insn-list after scheduling  
doesn't provide any information about cycles or delays, does it? How  
do I find out before which insn I have to insert a NOP, if the  
scheduler doesn't pass any info to me?

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:32               ` Boris Boesler
@ 2007-11-21 17:39                 ` Andrew Haley
  2007-11-21 17:54                   ` Andrew Haley
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Haley @ 2007-11-21 17:39 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler writes:
 > Am 21.11.2007 um 18:07 schrieb Andrew Haley:
 > 
 > > Is there some special reason you can't do this in machine dependent
 > > reorg?  That seems like the obvious place.  Just walk every basic
 > > block stuffing NOPs.
 > 
 >   The problem isn't where, it's how. The insn-list after scheduling  
 > doesn't provide any information about cycles or delays, does it? How  
 > do I find out before which insn I have to insert a NOP, if the  
 > scheduler doesn't pass any info to me?

Walk the instructions in each BB, and if you see a violation stuff a NOP.

This is so simple that I'm wondering if there's something important
that I'm failing to understand.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:39                 ` Andrew Haley
@ 2007-11-21 17:54                   ` Andrew Haley
  2007-11-21 18:15                     ` Boris Boesler
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Haley @ 2007-11-21 17:54 UTC (permalink / raw)
  To: Boris Boesler, gcc-help

Andrew Haley writes:
 > Boris Boesler writes:
 >  > Am 21.11.2007 um 18:07 schrieb Andrew Haley:
 >  > 
 >  > > Is there some special reason you can't do this in machine dependent
 >  > > reorg?  That seems like the obvious place.  Just walk every basic
 >  > > block stuffing NOPs.
 >  > 
 >  >   The problem isn't where, it's how. The insn-list after scheduling  
 >  > doesn't provide any information about cycles or delays, does it? How  
 >  > do I find out before which insn I have to insert a NOP, if the  
 >  > scheduler doesn't pass any info to me?
 > 
 > Walk the instructions in each BB, and if you see a violation stuff a NOP.
 > 
 > This is so simple that I'm wondering if there's something important
 > that I'm failing to understand.

Ah, hold on.  Is it that you don't know where sched inserted the empty
cycles, and there's no easy way to find out?

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:54                   ` Andrew Haley
@ 2007-11-21 18:15                     ` Boris Boesler
  2007-11-22 17:16                       ` Andrew Haley
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-21 18:15 UTC (permalink / raw)
  To: gcc-help


Am 21.11.2007 um 18:39 schrieb Andrew Haley:

>> Walk the instructions in each BB, and if you see a violation stuff  
>> a NOP.
>>
>> This is so simple that I'm wondering if there's something important
>> that I'm failing to understand.
>
> Ah, hold on.  Is it that you don't know where sched inserted the empty
> cycles, and there's no easy way to find out?

  Finally. YES.

  To solve this someone simply stored cycle and insn (recorded in  
TARGET_SCHED_REORDER) in a hash-table and reused this info in  
TARGET_MACHINE_DEPENDENT_REORG. But I read that there passes which  
might reorder or insert additional code. In that case the info in the  
hash-table might be incorrect or not valid.

  Any ideas?

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 17:08           ` Boris Boesler
  2007-11-21 17:26             ` Andrew Haley
@ 2007-11-21 18:32             ` Ian Lance Taylor
  1 sibling, 0 replies; 16+ messages in thread
From: Ian Lance Taylor @ 2007-11-21 18:32 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler <baembel@gmx.de> writes:

>   Now I need a pass to insert the missing NOPs. The approach Ian
> mentioned with the DFA-simulator does not work, because the simulator
> detects structural hardware hazards only (see http://www.nabble.com/Re
> %3A-Question-of-the-DFA-scheduler-p603486.html), which can not occur
> in my processor. Or did the DFA change?

I didn't suggest using the DFA simulator.  I agree that that won't
work.  What I suggested was to insert the NOPs in the
TARGET_ASM_FUNCTION_PROLOGUE hook.  This is not a scheduling hook.  It
is target specific code that is run immediately before the assembly
instructions are generated.

It seems to me that I keep saying the same thing, and you keep saying
that it won't work without explaining why.  I won't reply again.

Andrew's suggestion of using TARGET_MACHINE_DEPENDENT_REORG can also
work.  I don't like that approach as much in practice because it
doesn't work in conjunction with delay slots.  On some machines delay
slots are a good way to model VLIW semantics, in which instructions
can execute in parallel.

Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-21 18:15                     ` Boris Boesler
@ 2007-11-22 17:16                       ` Andrew Haley
  2007-11-22 17:19                         ` Boris Boesler
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Haley @ 2007-11-22 17:16 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler writes:
 > 
 > Am 21.11.2007 um 18:39 schrieb Andrew Haley:
 > 
 > >> Walk the instructions in each BB, and if you see a violation stuff  
 > >> a NOP.
 > >>
 > >> This is so simple that I'm wondering if there's something important
 > >> that I'm failing to understand.
 > >
 > > Ah, hold on.  Is it that you don't know where sched inserted the empty
 > > cycles, and there's no easy way to find out?
 > 
 >   Finally. YES.
 > 
 >   To solve this someone simply stored cycle and insn (recorded in  
 > TARGET_SCHED_REORDER) in a hash-table and reused this info in  
 > TARGET_MACHINE_DEPENDENT_REORG. But I read that there passes which  
 > might reorder or insert additional code. In that case the info in the  
 > hash-table might be incorrect or not valid.
 > 
 >   Any ideas?

I'd still do what I suggested: walk the instruction list inserting
NOPs, first based on the information from sched, and if that doesn't
work out because stuff has moved, just count instructions to make sure
you don't violate the constraints.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-22 17:16                       ` Andrew Haley
@ 2007-11-22 17:19                         ` Boris Boesler
  2007-11-22 18:54                           ` Andrew Haley
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Boesler @ 2007-11-22 17:19 UTC (permalink / raw)
  To: gcc-help; +Cc: Andrew Haley


Am 22.11.2007 um 17:28 schrieb Andrew Haley:

> I'd still do what I suggested: walk the instruction list inserting
> NOPs, first based on the information from sched, and if that doesn't
> work out because stuff has moved, just count instructions to make sure
> you don't violate the constraints.

  Where is the sched info?

  I'm following another approach:
  I "cache" the last n insns and check with my bypass guard function  
if the bypass will be ignored. If the bypass will be ignored then I  
have to insert NOPs. This works basically, but there are two obstacles:
1) in the future I have to check the scheduling across basic block  
boundaries
2) the approach fails if instruction fill a branch delay slots; NOPs  
are not inserted at all because I can't handle the sequence at the  
moment.

Boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Scheduling and inserting NOPs
  2007-11-22 17:19                         ` Boris Boesler
@ 2007-11-22 18:54                           ` Andrew Haley
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Haley @ 2007-11-22 18:54 UTC (permalink / raw)
  To: Boris Boesler; +Cc: gcc-help

Boris Boesler writes:
 > 
 > Am 22.11.2007 um 17:28 schrieb Andrew Haley:
 > 
 > > I'd still do what I suggested: walk the instruction list inserting
 > > NOPs, first based on the information from sched, and if that doesn't
 > > work out because stuff has moved, just count instructions to make sure
 > > you don't violate the constraints.
 > 
 >   Where is the sched info?

You just told me that "someone simply stored cycle and insn (recorded
in TARGET_SCHED_REORDER) in a hash-table and reused this info in
TARGET_MACHINE_DEPENDENT_REORG."  That info.

 >   I'm following another approach:
 >   I "cache" the last n insns and check with my bypass guard function  
 > if the bypass will be ignored. If the bypass will be ignored then I  
 > have to insert NOPs. This works basically, but there are two obstacles:

 > 1) in the future I have to check the scheduling across basic block  
 > boundaries

Ooh, that really is nasty; you'll need the CFG for that.

 > 2) the approach fails if instruction fill a branch delay slots; NOPs  
 > are not inserted at all because I can't handle the sequence at the  
 > moment.

I suspect this is what Ian was referring to.  In any case, you're well
on the way now, by the sounds of it.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-11-22 17:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-12 18:59 Scheduling and inserting NOPs Boris Boesler
2007-11-13 18:18 ` Ian Lance Taylor
2007-11-13 20:51   ` Boris Boesler
2007-11-14 15:11     ` Ian Lance Taylor
2007-11-15 14:28       ` Boris Boesler
2007-11-15 17:34         ` Ian Lance Taylor
2007-11-21 17:08           ` Boris Boesler
2007-11-21 17:26             ` Andrew Haley
2007-11-21 17:32               ` Boris Boesler
2007-11-21 17:39                 ` Andrew Haley
2007-11-21 17:54                   ` Andrew Haley
2007-11-21 18:15                     ` Boris Boesler
2007-11-22 17:16                       ` Andrew Haley
2007-11-22 17:19                         ` Boris Boesler
2007-11-22 18:54                           ` Andrew Haley
2007-11-21 18:32             ` Ian Lance Taylor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).