From: Michael Meissner
To: cgen@sources.redhat.com
Date: Wed, 06 Aug 2003 18:23:00 -0000
Subject: Re: Types and other issues with cgen
Message-ID: <20030806181259.GA28859@tiktok.the-meissners.org>
In-Reply-To: <16177.14973.213564.499122@casey.transmeta.com>
References: <20030806024506.GA12937@tiktok.the-meissners.org> <16177.14973.213564.499122@casey.transmeta.com>

On Wed, Aug 06, 2003 at 10:27:25AM -0700, Doug Evans wrote:
> Michael Meissner writes:
> > I've been looking at the internal types used within cgen, and I wanted
> > to get some comments before I start making wholesale changes. Sorry for
> > the length, but I thought it was important to talk about the issues
> > (#1, #4, and #8 are minor issues).
> >
> > 1) Cgen uses the PARAMS macro to selectively hide prototypes. Given that
> > both GCC and BINUTILS now require a C90 compiler with prototypes, would
> > patches that go through and completely prototype things be accepted?
>
> Yep.
>
> > 2) Cgen has a type mechanism (DI/SI/etc.) but it doesn't seem to be used
> > in the actual code for at least the assembler and disassembler
>
> Using the modes in the assembler/disassembler isn't the right way to go.
> These modes are for semantic operation, not assembly/disassembly.
> Imagine some instruction with an immediate operand that is a fixed
> set of constants that is encoded with special magic numbers.
> Register indices are another example.
> There's a disconnect between representation in the instruction
> and use during semantic evaluation.
>
> > (I haven't gotten to sim/sid yet).
>
> Modes are definitely used in simulation.
>
> > All fields in the cgen_fields structure are signed long, no matter
> > what type I declare in the .cpu file. In part this seems to be
> > because extract_normal and friends take the address of the field to
> > fill, and return 0/1 for error and success. Wouldn't a better approach
> > be to size & type the fields as the user specified, and make the
> > extract functions return the extracted value and return error/success
> > via a pointer? I could see either separate extractor functions for each
> > type, or signed/unsigned extractor functions of the widest type, or
> > just a single extract function being used.
>
> Either way one has to have multiple functions per type
> (unless of course one used a union or some such),
> regardless of whether the pass/fail indicator is the result
> or returned via a pointer to it.

Not necessarily. You could always have 1 function which returns the widest
type, and the compiler can do any narrowing/sign conversion.

> Having multiple variants of the internal extract_normal routine
> is an increment in complication I haven't needed yet so I've been
> deferring it.
>
> Note that there are already functions that have multiple variants
> dependent on type. See for example m32r_cgen_[gs]et_{int,vma}_operand
> in opcodes/m32r-ibld.c.
> These functions aren't currently used by any binutils program.
> They're services offered to programs outside of binutils.
>
> > 3) Signed long is another problem in that the machine I'm targeting is
> > a 64-bit machine, but I am doing development on an x86 machine. If we
> > keep to a single type, it should be at least bfd_signed_vma, which will
> > be the appropriate size to hold addresses in the target machine. This
> > will mean having to rewrite the places that just call printf or the
> > print functions, but that is not too difficult. Another possibility is
> > to use a cgen specific type (or two types for signed/unsigned) that is
> > sized to be as large as the largest type used in the .cpu file. Ideally
> > for 32-bit ports on 32-bit hosts, you would not slow things down by
> > using 64 bit types blindly, but it would allow those of us developing
> > for larger hosts to use cgen.
>
> For assembly/disassembly purposes the issue is what is the maximum
> size of a "word" in the instruction's representation?
> And for the sake of [V]LIW machines let's keep separate the notion
> of individual instructions inside one collection of instructions
> (or in Transmeta parlance: atoms and molecules (whoop dee doo)).

Yep.

> I'm assuming/hoping you can pack each instruction separately
> and then combine them at the end, and for now do the final packing
> (or initial unpacking for disassembly) outside of cgen.

Yes, the packing is fairly trivial: break each instruction into a 2-bit field
and a 41-bit field, and combine the 3 2-bit fields into 1 5-bit field. The
resultant 5-bit field, followed by the 3 41-bit fields, makes a 128-bit
combined instruction. In the instruction encoding, only the values 0, 1, and
2 are allowed. Labels will force padding to the next 128-bit boundary. The 1
86-bit instruction is treated as two separate 43-bit instructions.
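
A minimal sketch of the packing just described, in C. Several details are
assumptions, not something stated in this message: that the 2-bit field is
the top two bits of each 43-bit instruction, that "only the values 0, 1, and
2" refers to that 2-bit field, that the three 2-bit fields are combined as
base-3 digits (27 combinations, which fit in 5 bits), and the bit order
inside the 128-bit word. The names bundle128, pack_bundle, unpack_bundle and
bundle_selfcheck are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 41
#define SLOT_MASK ((UINT64_C(1) << SLOT_BITS) - 1)

/* Hypothetical 128-bit bundle, held as two 64-bit halves so that no
   128-bit integer type is needed on the host.  */
typedef struct { uint64_t hi, lo; } bundle128;

/* Pack three 43-bit instructions.  ASSUMED layout, most significant first:
   5-bit combined field, then slot 0, slot 1, slot 2 (41 bits each).
   Returns 0 on success, -1 if an input is out of range.  */
static int pack_bundle(const uint64_t insn[3], bundle128 *out)
{
    uint64_t slot[3];
    unsigned combined = 0;

    for (int i = 0; i < 3; i++) {
        uint64_t two = insn[i] >> SLOT_BITS;   /* ASSUMED: top 2 of 43 bits */
        if (two > 2)          /* rejects the value 3 and anything over 43 bits */
            return -1;
        combined = combined * 3 + (unsigned)two;  /* ASSUMED: base-3 digits */
        slot[i] = insn[i] & SLOT_MASK;
    }
    /* Bundle bits 0..40 = slot 2, 41..81 = slot 1, 82..122 = slot 0,
       123..127 = combined field.  Slot 1 straddles the two halves.  */
    out->lo = slot[2] | (slot[1] << 41);
    out->hi = (slot[1] >> 23) | (slot[0] << 18) | ((uint64_t)combined << 59);
    return 0;
}

/* Inverse of pack_bundle.  Returns -1 if the 5-bit field is not a valid
   base-3 combination (27..31).  */
static int unpack_bundle(const bundle128 *in, uint64_t insn[3])
{
    unsigned combined = (unsigned)(in->hi >> 59);
    uint64_t slot[3];

    if (combined >= 27)
        return -1;
    slot[0] = (in->hi >> 18) & SLOT_MASK;
    slot[1] = ((in->hi << 23) | (in->lo >> 41)) & SLOT_MASK;
    slot[2] = in->lo & SLOT_MASK;
    for (int i = 2; i >= 0; i--) {
        insn[i] = ((uint64_t)(combined % 3) << SLOT_BITS) | slot[i];
        combined /= 3;
    }
    return 0;
}

/* Round-trip check: packing then unpacking returns the inputs.  */
static void bundle_selfcheck(void)
{
    const uint64_t in[3] = { (UINT64_C(1) << 41) | 5, 7, (UINT64_C(2) << 41) | 9 };
    uint64_t out[3];
    bundle128 b;

    assert(pack_bundle(in, &b) == 0);
    assert(unpack_bundle(&b, out) == 0);
    assert(out[0] == in[0] && out[1] == in[1] && out[2] == in[2]);
}
```

Keeping the bundle as two 64-bit halves avoids depending on a host 128-bit
type; padding to the next 128-bit boundary at labels is not modelled here.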

> > There are machines out there with 128 bit registers, such as the MIPS
> > chip that is at the heart of the Sony PlayStation, the SSE2 registers
> > on the Pentium IV, and the AltiVec registers on the newer PowerPCs.
> > However, C compilers often don't provide 128 bit types. We might want
> > to think about how to handle these machines as well. In terms of
> > instruction size, I do have an 86 bit instruction, which pushes the
> > problem also. This may require using gmp. Too bad we aren't coding in
> > C++, where we could just define a class type to get the extra
> > precision.
>
> cgen based simulators (written in C) can already handle simulating
> architectures with 64 bit values on hosts where the compiler doesn't
> have long long (with C++ there's less of an issue).
> Dunno how often it is used, so no claim is made that there isn't bitrot
> or that it's complete, but it was tested way back when.
> Grep for HAVE_LONGLONG in sim/common/cgen-types.h.
>
> Semantics modes are to some extent black boxes.
> As new modes become needed we can add them.
> A simulator on a host with a compiler that can't represent them
> can represent them as a struct and provide the necessary
> manipulators of that struct. (for c++ s/struct/class/ if you prefer)
> No claim is made that the addition will be a walk in the park,
> but that's the plan-of-record.
>
> > 4) As a nit, we use unsigned int for the hash type, and I suspect it
> > might be cleaner if we had a cgen specific type for holding hash values
> > (i.e., cgen_hash_t).
>
> Sure. An increment in complication I was deferring.
> One might want to add to the name the context in which it is used.
> Cgen might want to use different kinds of hashes in different contexts.
>
> > 5) As an experiment, I compiled cgen with -Wconversion, and it showed a
> > lot of places where implicit signed<->unsigned conversions were going on.
> > A lot of the places were using int to hold sizes like buffer lengths,
> > and passing sizeof(...) as the value, where size_t would be more useful.
> > Unfortunately it also shows other places where having a single type for
> > the fields (such as long currently, or bfd_signed_vma/cgen_int_t
> > possibly in the future) is a problem. One of my thoughts is to have a
> > union of appropriate unsigned and signed types of the same size, and
> > use the appropriate element in the expansion.
>
> Removing the warnings would certainly be a good idea, though this
> particular warning doesn't always have a high signal/noise ratio.

My first attempt at using a 64-bit type fails on the m32r since there are
places I haven't caught yet where it stores a 32-bit unsigned value (which
happens to hold a negative value) into a larger 64-bit item, and the sign
doesn't extend correctly. I'm assuming that as I go through the tedious task
of fixing all of the warnings, it will show where I'm losing precision.

> > 6) Using bfd_put_bits and bfd_get_bits to convert the bits into proper
> > endian format only works for bit sizes of 8, 16, 32, and 64. For all
> > other sizes, bfd aborts (my machine has mostly 43 bit instructions, and
> > 1 86 bit instruction before the encoding mentioned in #7). It might be
> > better to open code this, rather than falling back to the bfd functions.
> >
> > Another idea is to always encode instructions expressed as a series of
> > bytes in big endian (or little endian) format, and then expect the
> > final assembler encoding to do the appropriate copying. Otherwise, I
> > see a lot of code that checks the endianness to get the correct byte.
>
> A final assembly pass to do the appropriate copying isn't necessarily
> a slam-dunk.
>
> The asm/disasm side of cgen currently has two modes of representing
> instructions: as an "int" in host byte order, or as a string of bytes
> in target byte order.

Yes, I know, but in the code that handles the bytes rather than the integer
case, I see ?: operations to get the correct endian orientation so you know
whether to fetch a byte from the beginning or the end.

> > 7) As I have mentioned in the past, my machine uses 3 43-bit
> > instructions that are encoded into a 128 bit super instruction. Any
> > ideas for the syntax for specifying the encode/decode operations?
>
> I'm not sure I understand. In what context?

Basically, where would be the proper place to add this? (define-isa seems the
logical candidate, though define-mach/define-cpu are other possibilities.) I'm
thinking of something like the handlers option in define-operand.

> > 8) The @arch@_cgen_hw_table uses (PTR) in initializing the asm_data
> > field. This makes debugging harder. Would it be possible to have 2
> > fields so that each member is correctly typed, and you can print out
> > pointers in the debugger?
>
> 2 fields? How would it look?
> (it's certainly a useful thing to do, and I'd say go for it,
> but I'm not clear what the result would look like)

I'm going off of this comment in include/opcode/cgen.h:

typedef struct
{
  char *name;
  enum cgen_hw_type type;
  /* There is currently no example where both index specs and value specs
     are required, so for now both are clumped under "asm_data".  */
  enum cgen_asm_type asm_type;
  PTR asm_data;
#ifndef CGEN_HW_NBOOL_ATTRS
#define CGEN_HW_NBOOL_ATTRS 1
#endif
  CGEN_ATTR_TYPE (CGEN_HW_NBOOL_ATTRS) attrs;
#define CGEN_HW_ATTRS(hw) (&(hw)->attrs)
} CGEN_HW_ENTRY;

I assume we would just have two fields, one for holding index specs, and the
other for holding value specs.

> Note that things are currently not totally hopeless.
> One could print the value and then say "info sym ",
> and then print the variable gdb gives.
>
> > So, suggestions on how you would like me to extend cgen to handle the
> > problems my machine exposes?
>
> For assembly/disassembly, I need to think about it for a bit.
> I think what we need to do is be able to handle each insn
> individually and handle packing/unpacking outside of cgen (for now).
> That reduces the problem to handling how "words" are laid out
> in each individual insn(atom). Since we're dealing with 43 bit
> entities (or 2*43 bits), I'm wondering if treating them as
> 64 bit entities for packing/unpacking will work.

It might, once we use a 64-bit type. However, the long instruction is still a
problem since it doesn't fit in an integral type.

> (how the 2*43 bits case would be handled would depend on the details
> I guess, maybe 2*64 bits or maybe 32+64 bits).
>
> I'm guessing studying how to handle ia64 would suffice.

That was my first thought (the designer of my machine used the IA-64 as a
model, so they are similar in some superficial ways). However, I've concluded
that the IA-64 port was never completed, even if you build it on a 64-bit
machine so that long is 64 bits. Among other things, it has no support for
encoding the instructions, and so would fail in bfd_put_bits.

> > My initial thoughts are to use a cgen specific type for the types. The
> > first round would use bfd_vma/bfd_signed_vma, but eventually size the
> > type based on the maximum size used in the .cpu file. I'm thinking of
> > using the union with signed and unsigned fields, to deal with many of
> > the conversion issues.
>
> If after reading the above you still think this is the way to go,
> let's discuss it further.

-- 
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org
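
A rough sketch, in C, of the union idea from the last quoted paragraph and of
the sign-extension problem that shows up when a 32-bit unsigned value is
stored into a 64-bit item. Everything here is illustrative: cgen_value,
sign_extend, widen_unsigned, widen_signed and widen_selfcheck are hypothetical
names, and a fixed 64-bit width is assumed where the proposal would use
bfd_vma/bfd_signed_vma or a type sized from the .cpu file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field type: signed and unsigned views of the same size.  */
typedef union {
    int64_t  s;
    uint64_t u;
} cgen_value;

/* Sign-extend the low BITS bits of RAW (1 <= BITS <= 64).  The final
   conversion to int64_t assumes a two's complement host.  */
static int64_t sign_extend(uint64_t raw, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t mask = (bits == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << bits) - 1);

    raw &= mask;
    return (int64_t)((raw ^ sign) - sign);
}

/* The pitfall: a plain assignment from an unsigned 32-bit field
   zero-extends, so a bit pattern meaning -4 becomes 4294967292.  */
static cgen_value widen_unsigned(uint32_t field)
{
    cgen_value v;
    v.u = field;
    return v;
}

/* The fix for fields declared signed: extend from the field's own width.  */
static cgen_value widen_signed(uint32_t field)
{
    cgen_value v;
    v.s = sign_extend(field, 32);
    return v;
}

/* Self-check showing the two results for the same bit pattern.  */
static void widen_selfcheck(void)
{
    assert(widen_unsigned(UINT32_C(0xFFFFFFFC)).s == INT64_C(4294967292));
    assert(widen_signed(UINT32_C(0xFFFFFFFC)).s == -4);
}
```

The same sign_extend helper works for odd widths such as 43 bits, since the
width is a parameter and not tied to a host type.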