public inbox for cgen@sourceware.org
* "just in time" compiler/translator for the simulators.
@ 2001-09-15  6:33 Johan Rydberg
  2001-09-15 11:07 ` graydon
       [not found] ` <20010915140753.A532.cygnus.local.cgen@venge.net>
  0 siblings, 2 replies; 3+ messages in thread
From: Johan Rydberg @ 2001-09-15  6:33 UTC (permalink / raw)
  To: cgen

Hi, I would like your comments on something.

I'm thinking about implementing something that you might call
a `just in time' compiler/translator for the GNU simulators.  This
would (hopefully) increase performance a bit.

The idea is to translate the simulated insns into native insns
and run them on the host machine.  Insns that cannot be translated
will be simulated in `the old-fashioned way'.

The translation will be done in three steps.

  . translate a whole block of insns into intermediate code
  . perform register allocation and optimize the intermediate code
  . generate host code from the intermediate code

The intermediate code will be represented as `three address' stmts.
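
A rough sketch (in C) of how one such stmt could be represented;
the names and fields below are only illustrative, nothing here is
decided:

    /* One `three address' stmt: dst = src1 OP src2, or a load/store.
       Operands are either temporaries, target registers or immediates.  */
    enum tac_opcode { TAC_ADD, TAC_LOAD, TAC_STORE, TAC_MOVE /* ... */ };

    struct tac_operand {
      enum { TAC_TEMP, TAC_REG, TAC_IMM } kind;
      int value;                /* temp number, register number or immediate */
    };

    struct tac_stmt {
      enum tac_opcode op;
      struct tac_operand dst, src1, src2;
    };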

Well, a small example:

  The insn:

    lw  4(r1), r3

  Which is defined in CGEN RTL as:

    (set r3 (mem SI (add r1 4)))

  Would give an intermediate code tree that looks something like:

                  . assign
                 / \
                /   \
              mem    r3
              /
            add
            / \
           r1  4

  And would translate into these three-address stmts:

    T1 = r1 + 4
    T2 = *T1
    r3 = T2


  Which would generate host code something like:

    # begin insn
    call   nsim_insn_begin

    # read register 1 (hardware nr 0)
    movl   $0, 4(%esp)          # hardware number
    movl   $1, 8(%esp)          # register number
    call   nsim_hw_read_4

    # add "4" to the register value
    addl   $4, %eax

    # read memory
    movl   %eax, 4(%esp)        # memory address
    call   nsim_mem_read_4

    # store value in register
    movl   $0, 4(%esp)          # hardware number
    movl   $3, 8(%esp)          # register number
    movl   %eax, 12(%esp)       # value to write
    call   nsim_hw_write_4
    ...

  This could of course be optimized a bit.


Note: The translated code runs in an environment where a pointer
      to the current CPU structure has already been pushed on the
      stack, and space is reserved on the stack for arguments to
      the "nsim*" functions.

Comments? Suggestions?

-- 
Johan Rydberg

$ ON F$ERROR("LANGUAGE","ENGLISH","IN_MESSAGE").GT.F$ERROR("NORMAL") -
             THEN EXCUSE/OBJECT=ME


* Re: "just in time" compiler/translator for the simulators.
  2001-09-15  6:33 "just in time" compiler/translator for the simulators Johan Rydberg
@ 2001-09-15 11:07 ` graydon
       [not found] ` <20010915140753.A532.cygnus.local.cgen@venge.net>
  1 sibling, 0 replies; 3+ messages in thread
From: graydon @ 2001-09-15 11:07 UTC (permalink / raw)
  To: cgen

hi,

On Sat, Sep 15, 2001 at 03:33:18PM +0200, Johan Rydberg wrote:

> The idea is to translate the simulated insns into native insns
> and run them on the host machine.  Insns that cannot be translated
> will be simulated in `the old-fashioned way'.

I had a very similar conversation with fche a couple months ago, so I'll just
regurgitate what he said and tailor it a bit to the current proposal.

when the simulator is generated, we have a static description of the insn's
semantics, but it is at least partially abstract: it has "holes" into which the
actual flags, operand values, etc. will be placed, when a given instance of the
insn is decoded and extracted. if you're lucky, the chosen semantics won't
depend on the target CPU's dynamic state, so we'll assume that for now.
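
in C terms it's roughly something like this (a made-up sketch, not
cgen's actual generated code; get_reg/set_reg stand in for whatever
register-access helpers exist):

    /* made-up sketch: the static semantics of an add-immediate insn.
       the extracted fields are the "holes", filled in at decode time.  */
    struct addi_fields { int rd, rs, imm; };

    static void
    sem_addi (struct cpu *cpu, const struct addi_fields *f)
    {
      set_reg (cpu, f->rd, get_reg (cpu, f->rs) + f->imm);
    }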

once an insn is decoded and extracted, in our present simulators, a record is
kept in a hashtable indicating the decoded semantic function (a function
pointer) and the extracted operand values. the table is hashed on the pc value
of the insn, so if the insn is returned to (say in a loop) the same record is
fetched and fed into the semantic function for subsequent execution. if we're
being very ambitious we chain such records together into pseudo basic blocks,
jumping directly from semantics to semantics. 
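
in rough C (names made up; the real generated simulators differ in
detail), the record and the execution step look something like:

    /* made-up sketch of one decoded-insn record, hashed on the pc.  */
    struct sem_record {
      unsigned long pc;                         /* hash key */
      void (*sem_fn) (struct cpu *, struct sem_record *);
      unsigned long fields[4];                  /* extracted operand values */
      struct sem_record *next;                  /* chain in a pseudo basic block */
    };

    static void
    step (struct cpu *cpu)
    {
      /* sem_table_lookup decodes and extracts only on a miss.  */
      struct sem_record *rec = sem_table_lookup (cpu, get_pc (cpu));
      rec->sem_fn (cpu, rec);
    }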

now, keep in mind that semantic functions can be specialized arbitrarily.  for
instance, say we have one semantic function representing a three-operand "mul"
insn. we may specialize this to eight functions: one for immediate operands and
one for indirect operands, in each of 3 operand "holes" (2^3 = 8).  so we'd
have "mul_imm-imm-imm", "mul_imm-imm-ind", "mul_imm-ind-imm", etc. when
decoding and extracting, we could set the semantic function pointer to the
right variant within this space of 8 mul functions, and save ourselves from
ever having to execute any sort of operand-mode switching logic inside the
function.
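
in made-up C again (read_operand, write_operand and set_reg are
stand-ins for whatever helpers the simulator provides): the generic
function has to test the operand modes on every execution, while a
specialized variant is straight-line code:

    /* generic "mul": operand modes are tested at run time.  */
    static void
    sem_mul (struct cpu *cpu, struct sem_record *rec)
    {
      long a = read_operand (cpu, rec, 1);      /* imm or indirect, checked here */
      long b = read_operand (cpu, rec, 2);
      write_operand (cpu, rec, 0, a * b);
    }

    /* one specialized variant: both source operands known to be immediate,
       destination known to be a register; no mode switching left.  */
    static void
    sem_mul_imm_imm (struct cpu *cpu, struct sem_record *rec)
    {
      set_reg (cpu, rec->fields[0], rec->fields[1] * rec->fields[2]);
    }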

but that's just one specialization; we could in fact specialize semantic
functions into "small immediate" vs. "large immediate", into "power-of-two" vs.
"general integer", even all the way down to the individual bit-pattern level.
i.e., in a 16-bit insn word machine, we could generate 2^16 semantic functions,
one for each possible opcode _and operand_. obviously this becomes a little
unwieldy on large insn-word machines, not to mention inefficient on
sparsely-coded insn sets. but the thing to keep in mind is that the
specialization itself can be performed statically, during simulator generation,
when we have a lot of time on our hands. gcc generates code "one function at a
time", so it will not run out of memory or anything processing an excessively
large set of semantic functions, and you're only ever going to load into memory
those functions which are demand-paged in by nature of being used. so it's not
too bad.

what you're proposing (jit simulation in general) is to delay the task of
specializing semantic functions until the moment of execution (or perhaps
slightly before, say during loading). this has the advantage that you only ever
generate the specialized variant when it occurs (avoiding 2^32 functions), so
you can probably specialize all the way down to the bit level, i.e. perform
a reasonably full "translation".

the disadvantage is that you're essentially taking on the burden of a compiler
backend. you need to do host insn selection, scheduling, register allocation,
dataflow optimization, and assembly for every host platform you want to work
with. the only credible tool I can imagine using for this "live" is MLRISC,
which means you're coding in SML; not a terrible burden, but something to keep
in mind.

another, slightly weirder approach is to scan your target program's insns and
emit fully-specialized semantic C for those insns alone, and feed them into
gcc, essentially pre-decoding and pre-extracting the entire set of functions
used by your program alone. then you could feed gperf the set of insn bit
patterns you encountered, and get a nice direct dispatch table into your
semantic functions.  this would be comparatively easier than jitting, as you'd
just be guiding the existing specialization concept by the set of insns which
actually occurs in your program, and leaving all the backend work to gcc.
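
concretely, the emitted C for one insn word that actually occurs in the
program might look like the following (a made-up example; 0x8c230004
would be the "lw" from the original message under a MIPS-like encoding,
and insn_lookup, set_reg, mem_read_4, get_pc and set_pc are stand-in
helper names), with a gperf-generated perfect hash mapping insn words
straight to these functions:

    /* fully specialized: everything about this one insn word is a
       compile-time constant (encoding is illustrative only).  */
    static void
    sem_0x8c230004 (struct cpu *cpu)          /* lw r3,4(r1) */
    {
      set_reg (cpu, 3, mem_read_4 (cpu, get_reg (cpu, 1) + 4));
    }

    static void
    execute_one (struct cpu *cpu)
    {
      unsigned long iword = mem_read_4 (cpu, get_pc (cpu));
      insn_lookup (iword)->sem_fn (cpu);      /* gperf-generated perfect-hash dispatch */
      set_pc (cpu, get_pc (cpu) + 4);
    }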

the downside would be that you'd need to re-do all this stuff for each target
program; similar to your jit proposal, you'd want to do it into a temporary file
at program-load time. loading a really big program could take a while. 

many mixtures of these strategies are of course possible. I wouldn't fully endorse
jitting carte-blanche, but it might be a good strategy in some settings.

-graydon


* Re: "just in time" compiler/translator for the simulators.
       [not found] ` <20010915140753.A532.cygnus.local.cgen@venge.net>
@ 2001-09-15 11:44   ` Frank Ch. Eigler
  0 siblings, 0 replies; 3+ messages in thread
From: Frank Ch. Eigler @ 2001-09-15 11:44 UTC (permalink / raw)
  To: cgen

graydon wrote:

: [...]  another, slightly weirder approach is to scan your target
: programs insns and emit fully-specialized semantic C for those insns
: alone, and feed them into gcc, essentially pre-decoding and
: pre-extracting the entire set of functions used by your program
: alone. [...]

I actually built a working toy implementation of this sort of thing a
few years ago.  It worked fairly well, demonstrating a number of
predictable downsides and a number of exciting possibilities.

I never completed the paper on it, but its last draft may be of
interest to you.  If nothing else, it discusses various simulation
techniques, and it's fairly readable.

        http://web.elastic.org/~fche/gxsim.ps

Get it while it's hot .. uhh ... stone cold.  Limited time offer.

- FChE

