RE: ideas for modeling pipeline with bypass registers?

public inbox for gcc@gcc.gnu.org

* RE: ideas for modeling pipeline with bypass registers?
@ 2003-07-30 15:43 Nitin  Gupta--SSW, Noida
  0 siblings, 0 replies; 9+ messages in thread
From: Nitin  Gupta--SSW, Noida @ 2003-07-30 15:43 UTC (permalink / raw)
  To: Greg McGary; +Cc: gcc

> > How many such registers do you have in your target?
> 
> I don't see how it matters at the level of abstraction we are dealing
> with, but there are three functional-units: adder, barrel-shifter,
> logic, and each has a bypass register.
> 
> Greg
> 
Well, this is what I understand about your target. Please correct me if I am
wrong.
1. The ALU has 3 units, each with a bypass register.
2. These bypass registers belong exclusively to these functional units.
3. Each ALU operation will use the bypass register that corresponds to its
function.
4. The bypass register can be explicitly read, but a value cannot be
explicitly written to it.


Then it is probably not possible to abstract two bypass registers, since
according to the timing analysis you gave:

        insn-0: bypass-even <- r2 + r3;
        insn-1: bypass-odd <- bypass-even + r5;
        insn-2: [ r1 <- bypass-even; bypass-even <- r2 + bypass-odd; ]
        insn-3: r4 <- bypass-odd 
        insn-4: r6 <- bypass-even

insn-0 will use the adder bypass register, and so will insn-1. Thus in
insn-2 you will read a clobbered value from the bypass register. Hence, under
the above assumptions, your strategy cannot be implemented.
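
To make the clobbering argument concrete, here is a minimal Python sketch. It
rests entirely on the assumptions listed above (one bypass register per
functional unit, overwritten whenever that unit produces a result), and the
function name `trace_unit_bypasses` is made up for illustration.

```python
def trace_unit_bypasses(units_per_cycle):
    """Return, for each cycle, which op's result each unit's bypass holds.

    units_per_cycle[i] names the functional unit used by the op issued in
    cycle i. Assumption: every unit has exactly one bypass register, which
    is overwritten each time that unit produces a result.
    """
    holds = {}       # unit name -> index of the op whose result it holds
    history = []
    for cycle, unit in enumerate(units_per_cycle):
        holds[unit] = cycle          # the new result clobbers the old one
        history.append(dict(holds))  # snapshot at the end of this cycle
    return history


# The three adds from the example all go through the adder.
history = trace_unit_bypasses(["adder", "adder", "adder"])
# By the end of cycle 1 the adder bypass holds op 1's result, so op 0's
# result can no longer be read from it in cycle 2.
print(history[1]["adder"])  # prints 1
```

If the middle operation used a different unit, the adder bypass would still
hold op 0's result after cycle 1 under the same assumptions.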

Just one thought: if you have the timing analysis of the pipeline, then you
can probably have the following kind of timing analysis using just one bypass
register:

        insn-0: bypass <- r2 + r3;
        insn-1: [ r1 <- bypass; bypass <- bypass + r5; ]	<-- here
        insn-2: [ r4 <- bypass; bypass <- r2 + bypass; ]	<-- here
        insn-3: r6 <- bypass

The instructions marked "here" are only possible if the pipeline allows such
a register read/write mechanism, i.e. reading the old bypass value and
writing a new one in the same cycle. The timing analysis of the pipeline
should make this clear.
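
As a rough check of the single-bypass idea, the following Python sketch
simulates the chain of adds with one bypass register. It assumes
read-before-write behaviour within a cycle (every read sees the previous
cycle's bypass value, the new result lands at the end of the cycle), which is
exactly the property that would have to be confirmed for the real pipeline.
The function name and the register values are made up for illustration.

```python
def run_single_bypass(regs, ops):
    """Simulate a chain of adds through one bypass register.

    Assumption: within a cycle, every read of the bypass sees the value
    from the previous cycle, and the new ALU result is written at the end
    of the cycle (read-before-write).
    ops is a list of (dest, src1, src2) register names.
    """
    regs = dict(regs)
    bypass = None         # value produced in the previous cycle
    pending_dest = None   # register the bypass value still has to reach
    for op in list(ops) + [None]:      # one extra cycle drains the last result
        old = bypass
        new, dest = None, None
        if op is not None:
            dest, a, b = op
            # An operand naming the previous destination is read from bypass.
            vals = [old if s == pending_dest else regs[s] for s in (a, b)]
            new = vals[0] + vals[1]
        if pending_dest is not None:
            regs[pending_dest] = old   # automatic write-back of the old value
        bypass, pending_dest = new, dest
    return regs


result = run_single_bypass(
    {"r2": 2, "r3": 3, "r5": 5},
    [("r1", "r2", "r3"), ("r4", "r1", "r5"), ("r6", "r2", "r4")],
)
print(result["r1"], result["r4"], result["r6"])  # prints 5 10 12
```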

~Nitin

* Re: ideas for modeling pipeline with bypass registers?
  2003-07-28 23:17 Greg McGary
  2003-07-29  1:29 ` tm_gccmail
@ 2003-07-29 21:12 ` tm_gccmail
  1 sibling, 0 replies; 9+ messages in thread
From: tm_gccmail @ 2003-07-29 21:12 UTC (permalink / raw)
  To: Greg McGary; +Cc: gcc

On 28 Jul 2003, Greg McGary wrote:

> I'm working on a GCC port to a pipelined machine that has 1 insn
> latency between ALU operation and numbered-register writeback
> (i.e.,  insn-0: ALU op
>         insn-1: ...
>         insn-2: ALU op result appears in destination register
> 
> In order to make ALU ops available with no latency, there's a special
> "bypass" register that can be used before the register-file writeback.
> This bypass value is available for only one cycle, after which it is
> clobbered by the next ALU op.
> 
> Questions:
> 
> Are there any other GCC ports to CPUs with a similar architectural
> feature?

I just thought of this, but in a way this is similar to the original MIPS
without pipeline interlocks.

On the original MIPS implementations, the instructions had a fixed latency
and the pipeline would never stall. So if you tried to fetch a result
before it was ready, the register would be garbage.

If I understand correctly, this is handled at the assembler level by
having the assembler emit NOPs if the compiler emits code where the
latency requirements are not met.

This is turned on/off with the ".set reorder" and ".set
noreorder" assembler directives.
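
A minimal Python sketch of that kind of NOP padding follows. It is only an
illustration of the idea described above, not how any real assembler
implements it; the instruction representation, the `insert_nops` name, and
the `delay` parameter are assumptions.

```python
NOP = None  # placeholder for a no-op slot


def insert_nops(insns, delay=1):
    """Pad a straight-line sequence so latency requirements are met.

    insns is a list of (dest, srcs) tuples. A result is assumed to be
    unusable by the `delay` instructions that immediately follow its
    producer, so NOPs are inserted until the producer is far enough back.
    """
    out = []
    for dest, srcs in insns:
        # Pad until no producer within the last `delay` slots feeds this insn.
        while delay > 0 and any(
            prev is not NOP and prev[0] in srcs for prev in out[-delay:]
        ):
            out.append(NOP)
        out.append((dest, srcs))
    return out


padded = insert_nops([("r1", ("r2", "r3")), ("r4", ("r1", "r5"))])
# One NOP separates the producer of r1 from its consumer.
print(len(padded))  # prints 3
```

When an independent instruction already sits between producer and consumer,
no padding is added, which is what a scheduler would try to arrange.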

Toshi

* Re: ideas for modeling pipeline with bypass registers?
  2003-07-29 17:53 Nitin  Gupta--SSW, Noida
@ 2003-07-29 18:26 ` Greg McGary
  0 siblings, 0 replies; 9+ messages in thread
From: Greg McGary @ 2003-07-29 18:26 UTC (permalink / raw)
  To: Nitin Gupta--SSW, Noida; +Cc: gcc, greg

"Nitin  Gupta--SSW, Noida" <nitingup@noida.hcltech.com> writes:

> > No.  It doesn't make sense to allocate it because it has a fixed,
> > limited, machine-defined use and isn't available for other purposes.

> How many such registers do you have in your target?

I don't see how it matters at the level of abstraction we are dealing
with, but there are three functional-units: adder, barrel-shifter,
logic, and each has a bypass register.

Greg

* RE: ideas for modeling pipeline with bypass registers?
@ 2003-07-29 17:53 Nitin  Gupta--SSW, Noida
  2003-07-29 18:26 ` Greg McGary
  0 siblings, 1 reply; 9+ messages in thread
From: Nitin  Gupta--SSW, Noida @ 2003-07-29 17:53 UTC (permalink / raw)
  To: Greg McGary; +Cc: gcc

> 
> > Can this bypass register be used in allocation?
> 
> No.  It doesn't make sense to allocate it because it has a fixed,
> limited, machine-defined use and isn't available for other purposes.
> 
> 
How many such registers do you have in your target?

* Re: ideas for modeling pipeline with bypass registers?
  2003-07-29 15:38 Nitin  Gupta--SSW, Noida
@ 2003-07-29 17:40 ` Greg McGary
  0 siblings, 0 replies; 9+ messages in thread
From: Greg McGary @ 2003-07-29 17:40 UTC (permalink / raw)
  To: Nitin Gupta--SSW, Noida; +Cc: gcc, greg

"Nitin  Gupta--SSW, Noida" <nitingup@noida.hcltech.com> writes:

> >     source code:
> > 
> >         r1 = r2 + r3;
> >         r4 = r1 + r5;
> >         r6 = r2 + r4;
> > 
> >     timing analysis:
> > 
> >         insn-0: bypass-even <- r2 + r3;
> >         insn-1: bypass-odd <- bypass-even + r5;
> >         insn-2: [ r1 <- bypass-even; bypass-even <- r2 + bypass-odd; ]
> >         insn-3: r4 <- bypass-odd 
> >         insn-4: r6 <- bypass-even

> Can this bypass register be used in allocation?

No.  It doesn't make sense to allocate it because it has a fixed,
limited, machine-defined use and isn't available for other purposes.

> Another point considering
> just your example only is that if this calculated value of r1 is only for
> getting value of r4 then the insn to store bypass register in r1 can
> actually be not needed.  similarly for r4, you can avoid insn-3.

Storing bypass into the register file is automatic, and unavoidable.
Don't confuse the timing analysis with assembler code.  These are not
instructions, but rather descriptions of events.  I should have
labeled them "cycle-N", rather than "insn-N".  Also, I wrote the
example with the assumption that the computation continued long past
this fragment and the destination registers would have lifetimes
beyond the uses shown here.

> Your approach will actually give good results when there are number of ALU
> operations simultaneously and splitting one ALU operation into a bypass
> register at rtl level can actually give insn scheduler chances of allowing
> some other mem load/store insns to be grouped with it.

I assume that by "Your [Greg's] approach", you mean explicitly
modeling bypass registers in RTL.  My gut tells me this is the way to
go, but I haven't worked through the details enough to understand the
problems.

Greg

* RE: ideas for modeling pipeline with bypass registers?
@ 2003-07-29 15:38 Nitin  Gupta--SSW, Noida
  2003-07-29 17:40 ` Greg McGary
  0 siblings, 1 reply; 9+ messages in thread
From: Nitin  Gupta--SSW, Noida @ 2003-07-29 15:38 UTC (permalink / raw)
  To: Greg McGary; +Cc: gcc

> 
> In order to make ALU ops available with no latency, there's a special
> "bypass" register that can be used before the register-file writeback.
> This bypass value is available for only one cycle, after which it is
> clobbered by the next ALU op.
> 
> Questions:
> 
> Are there any other GCC ports to CPUs with a similar architectural
> feature?
> 
> How might this be modeled in GCC?
> 
> One idea is to tell GCC there's no latency for register-file writeback
> from the ALU, then teach PRINT_OPERAND() to detect the case where it's
> outputting a register operand whose value isn't available yet, and
> print the bypass register name instead.  Seems very kludgy.
> 

This is the easiest implementation, but the pipeline will be badly affected.
I would advise against this approach.

> Another idea, which I like much better, but don't yet know if it's
> feasible (please offer your opinions) is to explicitly model the ALU
> operation first writing ALU result to the bypass register, then emit a
> following insn to copy the bypass reg to the originally named dest
> register.  The insn scheduler would make sure these operations are
> adequately separated in time.  There would be no assembler code
> generated for the copy of bypass register to dest register, because
> it's done implicitly by the machine.  A wrinkle is that I'd need two
> bypass registers, call them bypass-odd and bypass-even to handle the
> interleaving of insns that would otherwise clobber a single bypass
> register.  E.g.,
> 
>     source code:
> 
>         r1 = r2 + r3;
>         r4 = r1 + r5;
>         r6 = r2 + r4;
> 
>     timing analysis:
> 
>         insn-0: bypass-even <- r2 + r3;
>         insn-1: bypass-odd <- bypass-even + r5;
>         insn-2: [ r1 <- bypass-even; bypass-even <- r2 + bypass-odd; ]
>         insn-3: r4 <- bypass-odd 
>         insn-4: r6 <- bypass-even
> 
> Observe that the even-numbered insns can only set and write-back from
> the even bypass, and can only read-as-operand the odd bypass,
> similarly for odd insns with roles reversed.

Can this bypass register be used in allocation? Another point, considering
just your example, is that if the calculated value of r1 is only used to get
the value of r4, then the insn to store the bypass register in r1 is not
actually needed. Similarly for r4, you can avoid insn-3.
Your approach will actually give good results when there are a number of ALU
operations simultaneously, and splitting one ALU operation into a bypass
register at the RTL level can give the insn scheduler a chance to group some
other mem load/store insns with it.

~Nitin

* Re: ideas for modeling pipeline with bypass registers?
  2003-07-29  1:29 ` tm_gccmail
@ 2003-07-29  6:50   ` Greg McGary
  0 siblings, 0 replies; 9+ messages in thread
From: Greg McGary @ 2003-07-29  6:50 UTC (permalink / raw)
  To: tm_gccmail; +Cc: gcc, greg

<tm_gccmail@mail.kloo.net> writes:

> On 28 Jul 2003, Greg McGary wrote:
> 
> > I'm working on a GCC port to a pipelined machine that has 1 insn
> > latency between ALU operation and numbered-register writeback
> > (i.e.,  insn-0: ALU op
> >         insn-1: ...
> >         insn-2: ALU op result appears in destination register
> > 
> > In order to make ALU ops available with no latency, there's a special
> > "bypass" register that can be used before the register-file writeback.
> > This bypass value is available for only one cycle, after which it is
> > clobbered by the next ALU op.
> 
> I'm assuming this bypass register is an architecturally visible register?

Yes, it is visible.  It can be named as a source operand in most every
context that the ordinary register-file members can.

Greg

* Re: ideas for modeling pipeline with bypass registers?
  2003-07-28 23:17 Greg McGary
@ 2003-07-29  1:29 ` tm_gccmail
  2003-07-29  6:50   ` Greg McGary
  2003-07-29 21:12 ` tm_gccmail
  1 sibling, 1 reply; 9+ messages in thread
From: tm_gccmail @ 2003-07-29  1:29 UTC (permalink / raw)
  To: Greg McGary; +Cc: gcc

On 28 Jul 2003, Greg McGary wrote:

> I'm working on a GCC port to a pipelined machine that has 1 insn
> latency between ALU operation and numbered-register writeback
> (i.e.,  insn-0: ALU op
>         insn-1: ...
>         insn-2: ALU op result appears in destination register
> 
> In order to make ALU ops available with no latency, there's a special
> "bypass" register that can be used before the register-file writeback.
> This bypass value is available for only one cycle, after which it is
> clobbered by the next ALU op.

I'm assuming this bypass register is an architecturally visible register?

Toshi


* ideas for modeling pipeline with bypass registers?
@ 2003-07-28 23:17 Greg McGary
  2003-07-29  1:29 ` tm_gccmail
  2003-07-29 21:12 ` tm_gccmail
  0 siblings, 2 replies; 9+ messages in thread
From: Greg McGary @ 2003-07-28 23:17 UTC (permalink / raw)
  To: gcc; +Cc: greg

I'm working on a GCC port to a pipelined machine that has 1 insn
latency between ALU operation and numbered-register writeback
(i.e.,  insn-0: ALU op
        insn-1: ...
        insn-2: ALU op result appears in destination register)

In order to make ALU ops available with no latency, there's a special
"bypass" register that can be used before the register-file writeback.
This bypass value is available for only one cycle, after which it is
clobbered by the next ALU op.

Questions:

Are there any other GCC ports to CPUs with a similar architectural
feature?

How might this be modeled in GCC?

One idea is to tell GCC there's no latency for register-file writeback
from the ALU, then teach PRINT_OPERAND() to detect the case where it's
outputting a register operand whose value isn't available yet, and
print the bypass register name instead.  Seems very kludgy.
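
The operand-printing idea can be sketched in Python as below. This is only a
model of the substitution rule, not GCC code: the assembly syntax, the
`print_operands` name, and the single register called `bypass` are
assumptions, and every instruction is assumed to be an ALU op.

```python
def print_operands(insns, bypass_name="bypass"):
    """Render instructions, substituting the bypass for not-yet-written regs.

    insns is a list of (opcode, dest, srcs). A source register that was the
    destination of the immediately preceding ALU insn has not reached the
    register file yet, so the bypass register name is printed instead.
    """
    lines = []
    prev_dest = None
    for opcode, dest, srcs in insns:
        shown = [bypass_name if s == prev_dest else s for s in srcs]
        lines.append(f"{opcode} {dest}, " + ", ".join(shown))
        prev_dest = dest
    return lines


asm = print_operands([
    ("add", "r1", ("r2", "r3")),
    ("add", "r4", ("r1", "r5")),
    ("add", "r6", ("r2", "r4")),
])
for line in asm:
    print(line)
```

The substitution depends only on the final instruction order, which is why
it would have to run at output time, after scheduling.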

Another idea, which I like much better, but don't yet know if it's
feasible (please offer your opinions) is to explicitly model the ALU
operation first writing ALU result to the bypass register, then emit a
following insn to copy the bypass reg to the originally named dest
register.  The insn scheduler would make sure these operations are
adequately separated in time.  There would be no assembler code
generated for the copy of bypass register to dest register, because
it's done implicitly by the machine.  A wrinkle is that I'd need two
bypass registers, call them bypass-odd and bypass-even to handle the
interleaving of insns that would otherwise clobber a single bypass
register.  E.g.,

    source code:

        r1 = r2 + r3;
        r4 = r1 + r5;
        r6 = r2 + r4;

    timing analysis:

        insn-0: bypass-even <- r2 + r3;
        insn-1: bypass-odd <- bypass-even + r5;
        insn-2: [ r1 <- bypass-even; bypass-even <- r2 + bypass-odd; ]
        insn-3: r4 <- bypass-odd 
        insn-4: r6 <- bypass-even

Observe that the even-numbered insns can only set and write-back from
the even bypass, and can only read-as-operand the odd bypass,
similarly for odd insns with roles reversed.
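
The even/odd rule can be checked mechanically with a small Python sketch
that regenerates the timing analysis from the source statements. It handles
only back-to-back adds, assumes write-back happens two cycles after issue as
in the example, and labels the rows "cycle-N" since they are events rather
than instructions. The helper names `bypass_for` and `timing_analysis` are
made up for illustration.

```python
def bypass_for(i):
    """Name of the bypass register written by the op issued in cycle i."""
    return "bypass-even" if i % 2 == 0 else "bypass-odd"


def timing_analysis(ops):
    """Build per-cycle events for a list of (dest, src1, src2) adds.

    Assumptions: one op issues per cycle, a source produced by the
    previous op is read from that op's bypass, and the write-back to the
    named destination happens two cycles after issue.
    """
    events = {}
    for i, (dest, a, b) in enumerate(ops):
        prev_dest = ops[i - 1][0] if i > 0 else None
        srcs = [bypass_for(i - 1) if s == prev_dest else s for s in (a, b)]
        events.setdefault(i, []).append(
            f"{bypass_for(i)} <- {srcs[0]} + {srcs[1]}")
        # The write-back reads the bypass before it is overwritten,
        # so it is listed first within its cycle.
        events.setdefault(i + 2, []).insert(0, f"{dest} <- {bypass_for(i)}")
    lines = []
    for cycle in sorted(events):
        evs = events[cycle]
        if len(evs) == 1:
            body = evs[0]
        else:
            body = "[ " + " ".join(e + ";" for e in evs) + " ]"
        lines.append(f"cycle-{cycle}: {body}")
    return lines


table = timing_analysis(
    [("r1", "r2", "r3"), ("r4", "r1", "r5"), ("r6", "r2", "r4")])
for line in table:
    print(line)
```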

Is GCC capable of doing this somewhat naturally?

Comments?  Better ideas?

Greg

Thread overview: 9+ messages
2003-07-30 15:43 ideas for modeling pipeline with bypass registers? Nitin  Gupta--SSW, Noida
  -- strict thread matches above, loose matches on Subject: below --
2003-07-29 17:53 Nitin  Gupta--SSW, Noida
2003-07-29 18:26 ` Greg McGary
2003-07-29 15:38 Nitin  Gupta--SSW, Noida
2003-07-29 17:40 ` Greg McGary
2003-07-28 23:17 Greg McGary
2003-07-29  1:29 ` tm_gccmail
2003-07-29  6:50   ` Greg McGary
2003-07-29 21:12 ` tm_gccmail