public inbox for cgen@sourceware.org
* generalizing the delay rtx function
@ 2001-03-08 13:01 Frank Ch. Eigler
  2001-03-12 20:04 ` Ben Elliston
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2001-03-08 13:01 UTC (permalink / raw)
  To: cgen

Hi -

As you may be aware, the delay rtx function is used, despite its
work-in-progress designation, to model delayed branches in
constructs like
	(delay 1
	   (set pc (add pc 42)))
The DELAY-SLOT insn attribute is inferred from this for use by
simulator mainlines.  That's the extent of the effect of the delay
rtx.
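The sketch below shows what "inferred for use by simulator mainlines" can amount to. It is a toy C mainline: when a branch carries the inferred DELAY-SLOT attribute, the mainline executes the following insn before it transfers control. The toy ISA, struct insn, the delay_slot flag and run() are all hypothetical. This is an illustration, not cgen-generated code.

```c
/* Hypothetical toy ISA, not cgen output. */
enum opcode { OP_ADD, OP_BRANCH, OP_HALT };

struct insn {
    enum opcode op;
    int operand;      /* OP_ADD: addend; OP_BRANCH: pc offset in insns */
    int delay_slot;   /* stands in for the inferred DELAY-SLOT attribute */
};

/* Run prog from pc 0; return r0 at OP_HALT or when pc leaves the program. */
static int run(const struct insn *prog, int len)
{
    int pc = 0, r0 = 0;
    int pending = 0, pending_pc = 0;  /* delayed branch waiting on its slot */

    while (pc >= 0 && pc < len) {
        const struct insn *i = &prog[pc];
        int in_slot = pending;        /* this insn occupies a delay slot */
        int target = pending_pc;
        int next_pc = pc + 1;

        pending = 0;
        switch (i->op) {
        case OP_ADD:
            r0 += i->operand;
            break;
        case OP_HALT:
            return r0;
        case OP_BRANCH:
            if (i->delay_slot) {      /* like (delay 1 (set pc ...)) */
                pending = 1;
                pending_pc = pc + i->operand;
            } else {
                next_pc = pc + i->operand;
            }
            break;
        }
        if (in_slot)
            next_pc = target;         /* branch takes effect after the slot insn */
        pc = next_pc;
    }
    return r0;
}

/* 0: branch +3 with delay slot; 1: slot insn; 2: skipped; 3: target. */
static const struct insn example_prog[] = {
    { OP_BRANCH, 3, 1 },
    { OP_ADD, 10, 0 },
    { OP_ADD, 100, 0 },
    { OP_ADD, 1, 0 },
    { OP_HALT, 0, 0 },
};
```

In example_prog, the add of 10 sits in the delay slot and runs. The add of 100 is skipped. run(example_prog, 5) therefore returns 11. If the delay_slot flag is cleared, the branch takes effect at once and the result is 1.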

In order to model architectures with exposed pipelines (i.e., no
or limited pipeline interlocks), and related effects like delayed
loads, I'd like to take it beyond this, by coupling it to the
parallel-write mechanism.

As you might be aware, ports that have VLIW features tend to use
the "parallel-write" mechanism in their semantic blocks in order
to queue updates to registers/memory? until after all concurrently
executed instructions have been processed.  This lets multiple
reader instructions execute together with a writer instruction,
without detailed worry about the evaluation sequence.
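The parallel-write idea can be sketched in C as a single write queue per bundle. Each insn reads the current register file but only queues its results. A writeback step commits those results once the whole bundle is done. The names struct write_queue, queue_write, writeback and swap_bundle are hypothetical. They only approximate the role of cgen's parexec structure, not its actual layout.

```c
#include <stddef.h>

#define NUM_REGS   8
#define MAX_WRITES 16

/* Hypothetical queue of (register, value) writes made by one bundle. */
struct write_queue {
    size_t count;
    struct { int reg; int value; } w[MAX_WRITES];
};

/* Record a write without touching the register file yet.
   Overflow is silently dropped here; a real simulator would report it. */
static void queue_write(struct write_queue *q, int reg, int value)
{
    if (q->count < MAX_WRITES) {
        q->w[q->count].reg = reg;
        q->w[q->count].value = value;
        q->count++;
    }
}

/* Commit queued writes in the order they were queued, then empty the queue. */
static void writeback(struct write_queue *q, int regs[NUM_REGS])
{
    for (size_t k = 0; k < q->count; k++)
        regs[q->w[k].reg] = q->w[k].value;
    q->count = 0;
}

/* Bundle of two concurrent insns: "r1 = r2" and "r2 = r1". */
static void swap_bundle(int regs[NUM_REGS])
{
    struct write_queue q = { 0 };
    queue_write(&q, 1, regs[2]);  /* insn A reads the old r2 */
    queue_write(&q, 2, regs[1]);  /* insn B reads the old r1 */
    writeback(&q, regs);
}
```

Both insns read the values from before the bundle, so the bundle swaps r1 and r2. Executing the same two sets one after the other would leave both registers equal.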

Anyway, how about a scheme such as this:

- Provide a clear definition for the DELAY rtx:
  The numeric argument is the number of instruction cycles
  after the current one, at which the enclosed set expressions
  take effect.
- Restrict the use of the DELAY rtx to only contain SET expressions
  to hardware/memory registers.  Forbid other calculations.
- Possibly, force use of (DELAY 0 ....) to express VLIW concurrency,
  at least in new ports.
- Infer "parallel-write?" (or a new equivalent) from the presence of
  DELAY rtxs.
- Eliminate special treatment of PC by fitting delayed branches into
  this model.
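The restriction and inference rules above could be checked with a simple tree walk. The sketch below is in C and works on a made-up rtx representation. cgen's RTL is actually Scheme data, so struct rtx, enum rtx_code, scan and analyze are all hypothetical. The walk does three things:
- it rejects a DELAY that contains anything other than SET;
- it records whether any DELAY appears, which is the hint for inferring parallel-write?;
- it tracks the largest delay.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified rtx tree; cgen's real RTL is Scheme data. */
enum rtx_code { RTX_SET, RTX_DELAY, RTX_SEQUENCE, RTX_OTHER };

struct rtx {
    enum rtx_code code;
    int delay;                        /* RTX_DELAY only: insns after the current one */
    const struct rtx *const *kids;    /* NULL-terminated child list, or NULL */
};

struct delay_info {
    int ok;          /* 0 if a DELAY is malformed or holds a non-SET */
    int uses_delay;  /* 1 if any DELAY appears (hint for parallel-write?) */
    int max_delay;   /* largest delay seen, -1 if none */
};

static void scan(const struct rtx *x, struct delay_info *info)
{
    const struct rtx *const *k;

    if (x->code == RTX_DELAY) {
        info->uses_delay = 1;
        if (x->delay < 0)
            info->ok = 0;
        else if (x->delay > info->max_delay)
            info->max_delay = x->delay;
        /* Only SET children are allowed inside a DELAY. */
        for (k = x->kids; k && *k; k++)
            if ((*k)->code != RTX_SET)
                info->ok = 0;
        return;                       /* SET operands are not inspected here */
    }
    for (k = x->kids; k && *k; k++)
        scan(*k, info);
}

static struct delay_info analyze(const struct rtx *sem)
{
    struct delay_info info = { 1, 0, -1 };
    scan(sem, &info);
    return info;
}
```

The sketch does not check that SET destinations are hardware or memory registers. A generator that collected max_delay this way would know how many queued write sets (max_delay + 1) a simulator needs.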

Then, the generated simulator code would be changed, so that:

- Semantic functions, instead of taking a single parexec structure
  pointer (for write queueing), take an array of them.  Within
  (DELAY <N> RTX*) blocks, define OPRND to point to the appropriate
  elements in the parexec array.
- The insn evaluation loop would keep an array of parexec structs
  as a rotating buffer, always running the writeback code on the first
  one, then rotating the set, then passing it to the next insn.  cgen
  could compute the maximum index needed.

This way, code like
	(set reg1 1)
	(delay 0 (set reg1 3))
	(delay 1 (set reg2 5))
	(delay 2 (set reg1 6))
would each be well-defined and useful.
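The following C sketch applies the rotating-buffer scheme to this example. It rests on three assumptions:
- MAX_DELAY is known; the proposal suggests cgen could compute it.
- A (delay N ...) write is committed in the writeback that follows the Nth insn after the current one.
- A moving head index replaces physically rotating the array.

struct parexec as written here, delayed_set, end_of_insn and sem_example are hypothetical names. They are not cgen's generated API.

```c
#include <stddef.h>

#define NUM_REGS   4
#define MAX_DELAY  2                  /* assumed to be computed by cgen */
#define NUM_SLOTS  (MAX_DELAY + 1)
#define MAX_WRITES 8

/* Hypothetical per-slot write queue, standing in for a parexec struct. */
struct parexec {
    size_t count;
    struct { int reg; int value; } w[MAX_WRITES];
};

struct cpu {
    int regs[NUM_REGS];
    struct parexec ring[NUM_SLOTS];   /* rotating buffer of write queues */
    size_t head;                      /* slot written back after the current insn */
};

/* (delay N (set reg value)): queue the write N insns ahead of the current one. */
static void delayed_set(struct cpu *c, int n, int reg, int value)
{
    if (n < 0 || n > MAX_DELAY)
        return;                       /* out of range: ignored in this sketch */
    struct parexec *p = &c->ring[(c->head + (size_t)n) % NUM_SLOTS];
    if (p->count < MAX_WRITES) {
        p->w[p->count].reg = reg;
        p->w[p->count].value = value;
        p->count++;
    }
}

/* Called by the evaluation loop after every insn: commit the head slot,
   empty it, and rotate so it becomes the farthest-ahead slot. */
static void end_of_insn(struct cpu *c)
{
    struct parexec *p = &c->ring[c->head];
    for (size_t k = 0; k < p->count; k++)
        c->regs[p->w[k].reg] = p->w[k].value;
    p->count = 0;
    c->head = (c->head + 1) % NUM_SLOTS;
}

enum { REG1 = 1, REG2 = 2 };

/* Semantic body for the example sequence. */
static void sem_example(struct cpu *c)
{
    c->regs[REG1] = 1;                /* (set reg1 1): immediate */
    delayed_set(c, 0, REG1, 3);       /* (delay 0 (set reg1 3)) */
    delayed_set(c, 1, REG2, 5);       /* (delay 1 (set reg2 5)) */
    delayed_set(c, 2, REG1, 6);       /* (delay 2 (set reg1 6)) */
}
```

Running sem_example and then end_of_insn leaves reg1 = 3. One more (nop) insn makes reg2 = 5 visible, and a third makes reg1 = 6. An insn that reads reg2 during the very next cycle would still see the old value. That is the delayed-load behaviour the proposal is after.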

An alternate cgen syntax possibility is to introduce a
	(delayed-set N lvalue rvalue)
rtx.

Any advice?


- FChE


* Re: generalizing the delay rtx function
  2001-03-08 13:01 generalizing the delay rtx function Frank Ch. Eigler
@ 2001-03-12 20:04 ` Ben Elliston
  2001-03-12 20:33 ` Doug Evans
  2001-03-13 23:40 ` matthew green
  2 siblings, 0 replies; 7+ messages in thread
From: Ben Elliston @ 2001-03-12 20:04 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: cgen

>>>>> "Frank" == Frank Ch Eigler <fche@redhat.com> writes:

  Frank> This way, code like
  Frank> 	(set reg1 1)
  Frank> 	(delay 0 (set reg1 3))
  Frank> 	(delay 1 (set reg2 5))
  Frank> 	(delay 2 (set reg1 6))
  Frank> would each be well-defined and useful.

  Frank> An alternate cgen syntax possibility is to introduce a
  Frank> 	(delayed-set N lvalue rvalue)
  Frank> rtx.

  Frank> Any advice?

My preference is for the more generalised form of (delay ..).  It
looks very useful, I might add.

Ben


* generalizing the delay rtx function
  2001-03-08 13:01 generalizing the delay rtx function Frank Ch. Eigler
  2001-03-12 20:04 ` Ben Elliston
@ 2001-03-12 20:33 ` Doug Evans
  2001-03-13 23:40 ` matthew green
  2 siblings, 0 replies; 7+ messages in thread
From: Doug Evans @ 2001-03-12 20:33 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: cgen

Frank Ch. Eigler writes:
 > Hi -
 > 
 > As you may be aware, the delay rtx function is used, despite its
 > work-in-progress designation,

Despite?  How do you define "work-in-progress"?

 > to model delayed branches in
 > constructs like
 > 	(delay 1
 > 	   (set pc (add pc 42)))
 > The DELAY-SLOT insn attribute is inferred from this for use by
 > simulator mainlines.  That's the extent of the effect of the delay
 > rtx.
 > 
 > In order to model architectures with exposed pipelines (i.e., no
 > or limited pipeline interlocks), and related effects like delayed
 > loads, I'd like to take it beyond this, by coupling it to the
 > parallel-write mechanism.

You mean take the work-in-progress and finish(*1) the work?

[(*1) or make closer to being finished ...]

 > As you might be aware, ports that have VLIW features tend to use
 > the "parallel-write" mechanism in their semantic blocks in order
 > to queue updates to registers/memory? until after all concurrently
 > executed instructions have been processed.  This lets multiple
 > reader instructions execute together with a writer instruction,
 > without detailed worry about the evaluation sequence.
 > 
 > Anyway, how about a scheme such as this:
 > 
 > - Provide a clear definition for the DELAY rtx:
 >   The numeric argument is the number of instruction cycles
 >   after the current one, at which the enclosed set expressions
 >   take effect.
 > - Restrict the use of the DELAY rtx to only contain SET expressions
 >   to hardware/memory registers.  Forbid other calculations.

Ok.

 > - Possibly, force use of (DELAY 0 ....) to express VLIW concurrency,
 >   at least in new ports.

Ick.  Or rather, got an example?

 > - Infer "parallel-write?" (or a new equivalent) from the presence of
 >   DELAY rtxs.

Ditto.

 > - Eliminate special treatment of PC by fitting delayed branches into
 >   this model.

No current opinion.

 > Then, the generated simulator code would be changed, so that:
 > 
 > - Semantic functions, instead of taking a single parexec structure
 >   pointer (for write queueing), take an array of them.  Within
 >   (DELAY <N> RTX*) blocks, define OPRND to point to the appropriate
 >   elements in the parexec array.

Guess I'd have to see the implementation.

 > - The insn evaluation loop would keep an array of parexec structs
 >   as a rotating buffer, always running the writeback code on the first
 >   one, then rotating the set, then passing it to the next insn.  cgen
 >   could compute the maximum index needed.
 > 
 > This way, code like
 > 	(set reg1 1)
 > 	(delay 0 (set reg1 3))
 > 	(delay 1 (set reg2 5))
 > 	(delay 2 (set reg1 6))
 > would each be well-defined and useful.

Yep.

 > An alternate cgen syntax possibility is to introduce a
 > 	(delayed-set N lvalue rvalue)
 > rtx.
 > 
 > Any advice?

Eat healthy and exercise.


* re: generalizing the delay rtx function
  2001-03-08 13:01 generalizing the delay rtx function Frank Ch. Eigler
  2001-03-12 20:04 ` Ben Elliston
  2001-03-12 20:33 ` Doug Evans
@ 2001-03-13 23:40 ` matthew green
  2001-03-14  5:05   ` Frank Ch. Eigler
  2 siblings, 1 reply; 7+ messages in thread
From: matthew green @ 2001-03-13 23:40 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: cgen

   
   - Provide a clear definition for the DELAY rtx:
     The numeric argument is the number of instruction cycles
     after the current one, at which the enclosed set expressions
     take effect.

is this possible?  eg, (sparc) if i do:

	ba	foo
	 ld	[%l1 + 4], %o0

vs.

	ba	foo
	 tst	%o0

the load can take *much* longer than the tst?


other than that, i think this looks fine.



.mrg


* Re: generalizing the delay rtx function
  2001-03-13 23:40 ` matthew green
@ 2001-03-14  5:05   ` Frank Ch. Eigler
  2001-03-14 16:43     ` matthew green
  0 siblings, 1 reply; 7+ messages in thread
From: Frank Ch. Eigler @ 2001-03-14  5:05 UTC (permalink / raw)
  To: matthew green; +Cc: cgen

Hi -

On Wed, Mar 14, 2001 at 06:40:05PM +1100, matthew green wrote:
: [...]
: is this possible?  eg, (sparc) if i do:
: 
: 	ba	foo
: 	 ld	[%l1 + 4], %o0
: vs.
: 	ba	foo
: 	 tst	%o0
: 
: the load can take *much* longer than the tst?

I'm not sure I guess correctly at your point; is it that
these two code sequences require a different number of
clock cycles to run on a SPARC chip?  If so, yes, but
the delay is not a programmer-visible one, so the 
proposed extensions to the delay rtx would not be used
to model it.


- FChE


* re: generalizing the delay rtx function
  2001-03-14  5:05   ` Frank Ch. Eigler
@ 2001-03-14 16:43     ` matthew green
  2001-03-14 16:55       ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: matthew green @ 2001-03-14 16:43 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: cgen

   
   On Wed, Mar 14, 2001 at 06:40:05PM +1100, matthew green wrote:
   : [...]
   : is this possible?  eg, (sparc) if i do:
   :
   : 	ba	foo
   : 	 ld	[%l1 + 4], %o0
   : vs.
   : 	ba	foo
   : 	 tst	%o0
   :
   : the load can take *much* longer than the tst?
   
   I'm not sure I guess correctly at your point; is it that
   these two code sequences require a different number of
   clock cycles to run on a SPARC chip?  If so, yes, but
   the delay is not a programmer-visible one, so the
   proposed extensions to the delay rtx would not be used
   to model it.
   

you said the delay rtx would be changed to indicate the number
of "instruction cycles" before the effect is seen.  my example
above shows a case where this isn't going to be known.  if you
didn't mean "instruction cycles" but really "instructions" then
my point is meaningless.


.mrg.


* Re: generalizing the delay rtx function
  2001-03-14 16:43     ` matthew green
@ 2001-03-14 16:55       ` Frank Ch. Eigler
  0 siblings, 0 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2001-03-14 16:55 UTC (permalink / raw)
  To: matthew green; +Cc: cgen

Hi -

mrg wrote:
: [...]
: you said the delay rtx would be changed to indicate the number
: of "instruction cycles" before the effect is seen.  my example
: above shows a case where this isn't going to be known.  if you
: didn't mean "instruction cycles" but really "instructions" then
: my point is meaningless.

Ah - a terminology glitch.  "instruction cycle" == time taken for
an instruction; "instruction cycle" != "clock cycle".  You're right;
I should have just used "instruction" though, to avoid the ambiguity.

- FChE

