From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cgen-return-1450-listarch-cgen=sources.redhat.com@sources.redhat.com>
Received: (qmail 32675 invoked by alias); 6 Aug 2003 18:13:03 -0000
Mailing-List: contact cgen-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:cgen-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/cgen/>
List-Post: <mailto:cgen@sources.redhat.com>
List-Help: <mailto:cgen-help@sources.redhat.com>, <http://sources.redhat.com/lists.html#faqs>
Sender: cgen-owner@sources.redhat.com
Received: (qmail 32632 invoked from network); 6 Aug 2003 18:13:01 -0000
Received: from unknown (HELO tiktok.the-meissners.org) (66.205.90.83)
  by sources.redhat.com with SMTP; 6 Aug 2003 18:13:01 -0000
Received: from tiktok.the-meissners.org (localhost [127.0.0.1])
	by tiktok.the-meissners.org (8.12.8/8.12.8) with ESMTP id h76ID0rn028895
	for <cgen@sources.redhat.com>; Wed, 6 Aug 2003 14:13:00 -0400
Received: (from meissner@localhost)
	by tiktok.the-meissners.org (8.12.8/8.12.8/Submit) id h76ICxgD028893
	for cgen@sources.redhat.com; Wed, 6 Aug 2003 14:12:59 -0400
Date: Wed, 06 Aug 2003 18:23:00 -0000
From: Michael Meissner <cgen-mail@the-meissners.org>
To: cgen@sources.redhat.com
Subject: Re: Types and other issues with cgen
Message-ID: <20030806181259.GA28859@tiktok.the-meissners.org>
Mail-Followup-To: Michael Meissner <cgen-mail@the-meissners.org>,
	cgen@sources.redhat.com
References: <20030806024506.GA12937@tiktok.the-meissners.org> <16177.14973.213564.499122@casey.transmeta.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <16177.14973.213564.499122@casey.transmeta.com>
User-Agent: Mutt/1.4.1i
X-SW-Source: 2003-q3/txt/msg00026.txt.bz2

On Wed, Aug 06, 2003 at 10:27:25AM -0700, Doug Evans wrote:
> Michael Meissner writes:
>  > I've been looking at the internal types used within cgen, and I wanted to get
>  > some comments before I start making wholesale changes.  Sorry for the length,
>  > but I thought it is important to talk about the issues (#1, #4, and #8 are
>  > minor issues).
>  > 
>  > 1) Cgen uses the PARAMS macro to selectively hide prototypes.  Given that both GCC
>  >    and BINUTILS now require a C90 compiler with prototypes, would patches that go
>  >    through and completely prototype things be accepted?
> 
> Yep.
> 
>  > 2) Cgen has a type mechanism (DI/SI/etc.) but it doesn't seem to be used in the
>  >    actual code for at least the assembler and disassembler
> 
> Using the modes in the assembler/disassembler isn't the right way to go.
> These modes are for semantic operation, not assembly/disassembly.
> Imagine some instruction with an immediate operand that is a fixed
> set of constants that is encoded with special magic numbers.
> Register indices are another example.
> There's a disconnect between representation in the instruction
> and use during semantic evaluation.
> 
>  > (I haven't gotten to sim/sid yet).
> 
> Modes are definitely used in simulation.
> 
>  >    All fields in the cgen_fields structure are signed long, no
>  >    matter what the type that I declare in the .cpu file is.  In part this seems
>  >    to be because extract_normal and friends take an address of the field to
>  >    fill, and return 0/1 for error and success.  Wouldn't a better approach be
>  >    to size & type the fields as the user specified, and make the extract
>  >    functions return the extracted value and return error/success via a
>  >    pointer.  I could see either separate extractor functions for each type, or
>  >    signed/unsigned extractor functions of the widest type, or just a single
>  >    extract function being used.
> 
> Either way one has to have multiple functions per type
> (unless of course one used a union or some such),
> regardless of whether the pass/fail indicator is the result
> or returned via a pointer to it.

Not necessarily: you could always have one function that returns the widest
type, and let the compiler do any narrowing or sign conversion.
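
A minimal sketch of that single-extractor idea follows.  The names
(cgen_int_t, extract_field) and the bit numbering (START counted from the
least significant bit) are assumptions made for illustration; this is not the
existing extract_normal interface.

```c
#include <stdint.h>

/* Hypothetical widest field type; bfd_signed_vma could play this role.  */
typedef int64_t cgen_int_t;

/* Extract LENGTH bits of WORD, starting START bits up from the least
   significant bit.  The value is returned in the widest type; *OK is set
   to 1 on success and 0 on error.  */
static cgen_int_t
extract_field (uint64_t word, unsigned start, unsigned length,
               int is_signed, int *ok)
{
  uint64_t mask, value;

  if (length == 0 || length > 64 || start > 64 - length)
    {
      *ok = 0;
      return 0;
    }
  *ok = 1;
  mask = length == 64 ? ~(uint64_t) 0 : (((uint64_t) 1 << length) - 1);
  value = (word >> start) & mask;
  /* Sign-extend into the upper bits when the field's top bit is set.  */
  if (is_signed && length < 64 && (value >> (length - 1)) != 0)
    value |= ~mask;
  /* Assumes the usual two's complement conversion to the signed type.  */
  return (cgen_int_t) value;
}
```

A caller whose field was declared narrower in the .cpu file would just assign
or cast the result, e.g. (short) extract_field (insn, 0, 16, 1, &ok), and
let the compiler narrow it.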

> Having multiple variants of the internal extract_normal routine
> is an increment in complication I haven't needed yet so I've been
> deferring it.
>
> Note that there are already functions that have multiple variants
> dependent on type.  See for example m32r_cgen_[gs]et_{int,vma}_operand
> in opcodes/m32r-ibld.c.
> These functions aren't currently used by any binutils program.
> They're services offered to programs outside of binutils.
> 
>  > 3) Signed long is another problem in that the machine I'm targeting is a 64-bit
>  >    machine, but I am doing development on an x86 machine.  If we keep to a
>  >    single type, it should be at least bfd_signed_vma which will be the
>  >    appropriate size to hold addresses in the target machine.  This will mean
>  >    having to rewrite the places that just call printf or the print functions,
>  >    but that is not too difficult.  Another possibility is to use a cgen
>  >    specific type (or two types for signed/unsigned) that is sized to be as
>  >    large as the largest type used in the .cpu file.  Ideally for 32-bit ports
>  >    on 32-bit hosts, you would not slow things down by using 64 bit types
>  >    blindly, but it would allow those of us developing for larger hosts to
>  >    use cgen.
> 
> For assembly/disassembly purposes the issue is what is the maximum
> size of a "word" in the instruction's representation?
> And for the sake of [V]LIW machines let's keep separate the notion
> of individual instructions inside one collection of instructions
> (or in Transmeta parlance: atoms and molecules (whoop dee doo)).

Yep.

> I'm assuming/hoping you can pack each instruction separately
> and then combine them at the end, and for now do the final packing
> (or initial unpacking for disassembly) outside of cgen.

Yes, the packing is fairly trivial: break each instruction into a 2-bit field
and a 41-bit field, combine the three 2-bit fields into one 5-bit field, and
the resulting 5-bit field followed by the three 41-bit fields makes a 128-bit
combined instruction.  In the instruction encoding, only the values 0, 1, and
2 are allowed in each 2-bit field.  Labels will force padding to the next
128-bit boundary.

The one 86-bit instruction is treated as two separate 43-bit instructions.
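
As a rough sketch of that packing, under several assumptions not stated
above: the 2-bit field is taken to be the top two bits of each 43-bit
instruction, the three 2-bit values are combined base 3 into the 5-bit field,
and the 128-bit result is held most-significant-first in a hypothetical
struct of two 64-bit halves.  The real encoding may differ on all three
points.

```c
#include <stdint.h>

#define PAYLOAD_BITS 41
#define PAYLOAD_MASK ((((uint64_t) 1) << PAYLOAD_BITS) - 1)

/* Hypothetical holder for the 128-bit combined instruction.  */
struct bundle128
{
  uint64_t hi;  /* bits 127..64 */
  uint64_t lo;  /* bits 63..0 */
};

/* Pack three 43-bit instructions into OUT.  Returns 1 on success, or 0 if
   an instruction has a 2-bit field of 3 or has bits set above bit 42.  */
static int
pack_bundle (const uint64_t insn[3], struct bundle128 *out)
{
  uint64_t payload[3];
  uint64_t tmpl = 0;
  int i;

  for (i = 0; i < 3; i++)
    {
      uint64_t unit = insn[i] >> PAYLOAD_BITS;

      if (unit > 2)
        return 0;
      tmpl = tmpl * 3 + unit;   /* base-3 combination: assumed */
      payload[i] = insn[i] & PAYLOAD_MASK;
    }

  /* Layout: 5-bit field in bits 127..123, then the payloads in bits
     122..82, 81..41 and 40..0.  The middle payload straddles the halves.  */
  out->hi = (tmpl << 59) | (payload[0] << 18) | (payload[1] >> 23);
  out->lo = (payload[1] << 41) | payload[2];
  return 1;
}
```

Three values per 2-bit field give 27 combinations, which is why they fit in
the 5-bit field; 5 + 3 * 41 accounts for all 128 bits.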

>  >    There are machines out there with 128 bit registers, such as the MIPS chip
>  >    that is at the heart of the Sony playstation, the SSE2 registers on the
>  >    Pentium IV, and the Altivec registers on the newer PowerPCs.  However, C
>  >    compilers often don't provide 128 bit types.  We might want to think
>  >    about how to handle these machines as well.  In terms of instruction size, I
>  >    do have a 86 bit instruction which pushes the problem also.  This may
>  >    require using gmp if needed.  Too bad, we aren't coding in C++, where we
>  >    could just define a class type to get the extra precision.
> 
> cgen based simulators (written in C) can already handle simulating
> architectures with 64 bit values on hosts where the compiler doesn't
> have long long (with C++ there's less of an issue).
> Dunno how often it is used, so no claim is made that there isn't bitrot
> or that it's complete, but it was tested way back when.
> Grep for HAVE_LONGLONG in sim/common/cgen-types.h.
> 
> Semantics modes are to some extent black boxes.
> As new modes become needed we can add them.
> A simulator on a host with a compiler that can't represent them
> can represent them as a struct and provide the necessary
> manipulators of that struct. (for c++ s/struct/class/ if you prefer)
> No claim is made that the addition will be a walk in the park,
> but that's the plan-of-record.
> 
>  > 4) As a nit, we use unsigned int for the hash type, and I suspect it might be
>  >    cleaner if we had a cgen specific type for holding hash values (ie,
>  >    cgen_hash_t).
> 
> Sure.  An increment in complication I was deferring.
> One might want to add to the name the context in which it is used.
> Cgen might want to use different kinds of hashes in different contexts.
> 
>  > 5) As an experiment, I compiled cgen with -Wconversion, and it showed a lot of
>  >    places where implicit signed<->unsigned conversions were going on.  A lot of
>  >    the places were using int to hold sizes like buffer lengths, and passing
>  >    sizeof(...) to the value, and size_t would be more useful.  Unfortunately it
>  >    also shows other places where having a single type for the fields (such as
>  >    long currently, or bfd_signed_vma/cgen_int_t possibly in the future).  One
>  >    of my thoughts is to have a union of an appropriate unsigned and signed
>  >    types of the same size, and use the appropriate element in the expansion.
> 
> Removing the warnings would certainly be a good idea, though this
> particular warning doesn't always have a high signal/noise ratio.

My first attempt at using a 64-bit type fails on the m32r, since there are
places I haven't caught yet where it stores a 32-bit unsigned value (which
happens to hold a negative value) into a larger 64-bit item, and the sign
doesn't extend correctly.  I'm assuming that as I go through the tedious task
of fixing all of the warnings, it will show where I'm losing precision.
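
The failure mode can be shown in isolation.  This is a small sketch with
made-up function names, not code taken from the m32r port:

```c
#include <stdint.h>

/* Widening an unsigned 32-bit value zero-extends it, so a bit pattern that
   was meant as a negative number turns into a large positive one.  */
static int64_t
widen_zero_extended (uint32_t field)
{
  return (int64_t) field;
}

/* Flipping the sign bit and subtracting its weight sign-extends the value
   without relying on implementation-defined narrowing conversions.  */
static int64_t
widen_sign_extended (uint32_t field)
{
  return (int64_t) (field ^ 0x80000000u) - (int64_t) 0x80000000u;
}
```

With the bit pattern 0xffffffff, the first function yields 4294967295 and
the second yields -1, which is the value the 32-bit code intended.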

>  > 6) Using bfd_put_bits and bfd_get_bits to convert the bits into proper endian
>  >    format only works for bit sizes of 8, 16, 32, and 64.  In all other places,
>  >    bfd aborts (my machine has mostly 43 bit instructions, and 1 86 bit
>  >    instruction before the encoding mentioned in #7).  It might be better to
>  >    open code this, rather than falling back to the bfd functions.
>  > 
>  >    Another idea is to always encode instructions expressed as a series of bytes
>  >    in big endian (or little endian) format, and then expect the final assembler
>  >    encoding to do the appropriate copying.  Otherwise, I see a lot of code that
>  >    checks the endianness to get the correct byte.
> 
> A final assembly pass to do the appropriate copying isn't necessarily
> a slam-dunk.
> 
> The asm/disasm side of cgen currently has two modes of representing
> instructions: as an "int" in host byte order, or as a string of bytes
> in target byte order.

Yes, I know, but in the code that handles the byte case rather than the
integer case, I see ?: operations that pick the correct endian orientation,
i.e. whether to fetch a byte from the beginning or the end.
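
For illustration only, the kind of selection meant here looks roughly like
the following.  The function names and the big_endian_p flag are
hypothetical, not the actual opcodes code, and the loader assumes the
instruction fits in 8 bytes (a 43-bit instruction would occupy 6).

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte of significance N (0 = least significant) from an
   instruction stored in BUF in target byte order.  */
static unsigned char
fetch_byte (const unsigned char *buf, size_t total_bytes, size_t n,
            int big_endian_p)
{
  return big_endian_p ? buf[total_bytes - 1 - n] : buf[n];
}

/* Open-coded load of TOTAL_BYTES bytes (at most 8) into a host integer,
   which works for sizes other than 8, 16, 32 and 64 bits.  */
static uint64_t
load_insn (const unsigned char *buf, size_t total_bytes, int big_endian_p)
{
  uint64_t value = 0;
  size_t n;

  /* Walk from the most significant byte down to the least.  */
  for (n = total_bytes; n-- > 0; )
    value = (value << 8) | fetch_byte (buf, total_bytes, n, big_endian_p);
  return value;
}
```

If instructions were always kept in one fixed byte order internally, the ?:
in fetch_byte would disappear and only the final copy would need to care.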

>  > 7) As I have mentioned in the past, my machine uses 3 43-bit instructions that
>  >    are encoded into a 128 bit super instruction.  Any ideas for the syntax for
>  >    specifying the encode/decode operations?
> 
> I'm not sure I understand.  In what context?

Basically, where would be the proper place to add this (define-isa seems the
logical candidate, though define-mach/define-cpu are other possibilities)?  I'm
thinking of something like the handlers option in define-operand.

>  > 8) The @arch@_cgen_hw_table uses (PTR) in initializing the asm_data field.
>  >    This makes debugging harder.  Would it be possible to have 2 fields so that
>  >    each member is correctly typed, and you can print out pointers in the
>  >    debugger?
> 
> 2 fields?  How would it look?
> (it's certainly a useful thing to do, and I'd say go for it,
> but I'm not clear what the result would look like)

I'm going off of this comment in include/opcode/cgen.h:

	typedef struct
	{
	  char *name;
	  enum cgen_hw_type type;
	  /* There is currently no example where both index specs and value specs
	     are required, so for now both are clumped under "asm_data".  */
	  enum cgen_asm_type asm_type;
	  PTR asm_data;
	#ifndef CGEN_HW_NBOOL_ATTRS
	#define CGEN_HW_NBOOL_ATTRS 1
	#endif
	  CGEN_ATTR_TYPE (CGEN_HW_NBOOL_ATTRS) attrs;
	#define CGEN_HW_ATTRS(hw) (&(hw)->attrs)
	} CGEN_HW_ENTRY;

I assume we would just have two fields, one for holding index specs, and the
other for holding value specs.
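
A rough sketch of what the two-field layout might look like.  Every type and
field name below is a hypothetical stand-in chosen for illustration; none of
them is taken from include/opcode/cgen.h.

```c
#include <stddef.h>

/* Hypothetical spec types; the real ones would come from cgen.  */
struct hw_index_spec
{
  const char *const *names;  /* e.g. register names */
  unsigned count;
};

struct hw_value_spec
{
  long min;
  long max;
};

/* Trimmed-down entry with one typed pointer per kind of spec in place of
   the single untyped asm_data field.  */
typedef struct
{
  const char *name;
  const struct hw_index_spec *index_spec;  /* NULL if there is none */
  const struct hw_value_spec *value_spec;  /* NULL if there is none */
} HW_ENTRY_SKETCH;

static const char *const gr_names[] = { "r0", "r1", "r2", "r3" };
static const struct hw_index_spec gr_index = { gr_names, 4 };

static const HW_ENTRY_SKETCH hw_table_sketch[] =
{
  { "h-gr", &gr_index, NULL },
  { "h-pc", NULL, NULL },
};
```

Because each pointer carries its own type, printing
*hw_table_sketch[0].index_spec in the debugger shows the structure directly,
with no cast needed.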

> Note that things are currently not totally hopeless.
> One could print the value and then say "info sym <value>",
> and then print the variable gdb gives.
> 
>  > So, suggestions on how you would like me to extend cgen to handle the problems
>  > my machine exposes?
> 
> For assembly/disassembly, I need to think about it for a bit.
> I think what we need to do is be able to handle each insn
> individually and handle packing/unpacking outside of cgen (for now).
> That reduces the problem to handling how "words" are laid out
> in each individual insn(atom).  Since we're dealing with 43 bit
> entities (or 2*43 bits), I'm wondering if treating them as
> 64 bit entities for packing/unpacking will work.

It might, once we use a 64-bit type.  However, the long instruction is still a
problem, since it doesn't fit in an integral type.

> (how the 2*43 bits case would be handled would depend on the details
> I guess, maybe 2*64 bits or maybe 32+64 bits).
> 
> I'm guessing studying how to handle ia64 would suffice.

That was my first thought (the designer of my machine used the IA-64 as a
model, so they are similar in some superficial ways).  However, I've concluded
that the IA-64 port was never completed, even if you build it on a 64-bit
machine so that long is 64 bits.  Among other things, it has no support for
encoding the instructions, and so would fail in bfd_put_bits.

>  > My initial thoughts are to use a cgen specific type for the types.  The first
>  > round would use bfd_vma/bfd_signed_vma, but eventually size the type based on
>  > the maximum size used in the .cpu file.  I'm thinking of using the union with
>  > signed and unsigned fields, to deal with many of the conversion issues.
> 
> If after reading the above you still think this is the way to go,
> let's discuss it further.

-- 
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org