From: Roland McGrath
To: Jakub Jelinek
Cc: Richard Henderson, Jason Merrill, Cary Coutant, gcc-patches@gcc.gnu.org, Jan Kratochvil, Tom Tromey, Mark Wielaard
Subject: Re: [RFC PATCH] Typed DWARF stack
Date: Fri, 25 Mar 2011 16:48:00 -0000
In-Reply-To: Jakub Jelinek's message of Friday, 25 March 2011 12:32:37 +0100 <20110325113237.GY18914@tyan-ft48-01.lab.bos.redhat.com>

It's been a while since I read Cary's proposal, and I am no longer likely to do much work of my own in this area. So I'll just respond at the high level.

I like very much the essential notion of the stack being of typed entities with no specification of how the consumer actually implements it.
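To make the typed-stack notion concrete, here is a minimal Python sketch of a toy consumer whose stack holds typed entries. It is purely illustrative: the operation names are plain strings rather than real DW_OP encodings, registers are a dict, and the four entry kinds (plain value, memory location, register location, and an "imaginary pointer" for implicit pointers) anticipate the suggestions that follow rather than any specification. In this model an implicit pointer takes no offset operand, since a following plus_uconst adjusts it; stack_value turns an address location into a value; and as_value models a value-only context such as DW_AT_frame_base.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Value:
    """A plain integer value (not a location)."""
    v: int


@dataclass(frozen=True)
class MemLoc:
    """A location in memory: what traditional address-computing ops push."""
    addr: int


@dataclass(frozen=True)
class RegLoc:
    """A location in a register: what DW_OP_reg* would push in this model."""
    regno: int


@dataclass(frozen=True)
class ImaginaryPtr:
    """An implicit pointer: the target DIE plus a byte offset into it."""
    die: int
    offset: int = 0


Entry = Union[Value, MemLoc, RegLoc, ImaginaryPtr]


def plus_uconst(e: Entry, k: int) -> Entry:
    # Adding an integer makes sense for values, addresses and imaginary
    # pointers; on a register location it is a type error.
    if isinstance(e, Value):
        return Value(e.v + k)
    if isinstance(e, MemLoc):
        return MemLoc(e.addr + k)
    if isinstance(e, ImaginaryPtr):
        return ImaginaryPtr(e.die, e.offset + k)
    raise TypeError(f"plus_uconst is not defined on {type(e).__name__}")


def evaluate(ops: List[Tuple], regs: Dict[int, int]) -> Entry:
    """Run a toy expression and return the final top-of-stack entry."""
    stack: List[Entry] = []
    for name, *args in ops:
        if name == "addr":                # push a memory location
            stack.append(MemLoc(args[0]))
        elif name == "breg":              # register contents + offset, as an address
            stack.append(MemLoc(regs[args[0]] + args[1]))
        elif name == "reg":               # push a register location
            stack.append(RegLoc(args[0]))
        elif name == "implicit_pointer":  # single operand: the target DIE
            stack.append(ImaginaryPtr(args[0]))
        elif name == "plus_uconst":
            stack.append(plus_uconst(stack.pop(), args[0]))
        elif name == "stack_value":       # address location -> that address as a value
            top = stack.pop()
            if not isinstance(top, MemLoc):
                raise TypeError("stack_value expects an address location")
            stack.append(Value(top.addr))
        else:
            raise ValueError(f"unknown operation {name!r}")
    return stack[-1]


def is_location(top: Entry) -> bool:
    """A location left on top of the stack denotes a mutable object."""
    return isinstance(top, (MemLoc, RegLoc))


def as_value(top: Entry, regs: Dict[int, int]) -> int:
    """Value-only context (e.g. DW_AT_frame_base): an address converts to an
    integer by identity, and a register location is fetched."""
    if isinstance(top, Value):
        return top.v
    if isinstance(top, MemLoc):
        return top.addr
    if isinstance(top, RegLoc):
        return regs[top.regno]
    raise TypeError(f"{type(top).__name__} has no plain integer value")
```

For instance, evaluating [("implicit_pointer", 42), ("plus_uconst", 8)] yields ImaginaryPtr(die=42, offset=8), while as_value applied to the result of [("reg", 7)] fetches the contents of register 7.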
In particular, the typed-stack notion also brings implicit pointers into the fold as a new type of stack element, i.e. an "imaginary pointer". Obviously only the operations that amount to adding or subtracting an integer actually make sense on such a stack element. This makes the second operand of DW_OP(GNU_)implicit_pointer superfluous (and hence makes the common zero-offset case a byte smaller), since nonzero cases can just be followed by an appropriate DW_OP_plus_uconst or suchlike. To make clear that this should be supported, the specification wording would have to be somewhat more abstract than suggesting that a union of target bit-pattern types of various sizes would suffice. But that does not seem much of a stretch to me, and IMHO it's most appropriate that DWARF not say very much about consumer implementation.

This is more of a stretch, but IMHO it would also make sense to exploit this "typed" concept to roll in the distinction between values and locations. This notion is not very well-formed, but perhaps worth investigating. That is, one could consider DW_OP_reg* a stack operation that pushes a register location. Implicitly, traditional operations push a memory location (an address). But DW_OP_stack_value "pops" the address location and "pushes" the address literal as a value. If the TOS at the end of an expression is a location, then it's a mutable location; otherwise it's a value. When an expression is used in a value-only context (DW_AT_frame_base, etc.), a TOS address is always (identity-)converted to an integer value, and a TOS register is always fetched to get its value. This way of thinking is natural to me, and it makes the DW_AT_frame_base specification a natural and straightforward instance of a general rule rather than a one-off special case. But perhaps it is too convoluted for other people.

As to encoding, I have a fancy idea that I discussed off the cuff at the Summit with Jakub and Richard, and still quite like, though I haven't fleshed it out any more than I did then.
I hope to inspire someone else to actually want to implement it. It's rather more ambitious than the things that Jakub will just add while the rest of us sleep, so I wouldn't suggest that such DW_OP_GNU_* extensions be delayed for this plan. But perhaps it can become coherent enough, and get done enough, to seriously propose it for DWARF 5.

The basic notion is extreme extensibility for DWARF operations, done in a way that could in practice yield even smaller encodings than we get today, while supporting an arbitrarily larger vocabulary of operations. It's modelled on the DIE encoding, using abbrevs.

Add a section ".debug_opabbrev". A chunk of this section would be pointed to by a CU's DW_AT_op_list attribute. Or it could be a CU header field, but there's not much reason to go that route, except perhaps if such a CU would always be version 5 anyway, so that old consumers would know they can't interpret the expressions therein. (It seems very unlikely that any .debug_frame/.eh_frame DW_CFA_*expression would ever need to go beyond the DWARF 4 operation vocabulary. But if one does, that header could likewise be extended with a .{debug,eh}_opabbrev pointer, though that's pretty clearly beyond the pale for introspective in-process .eh_frame decoding.) The presence of this attribute signals that this opabbrev table controls the interpretation of all expressions found in that CU (i.e. in its DW_FORM_exprloc attributes and in any .debug_loc lists it points to).

The opabbrev table is much like the DIE abbrev table. It maps an opcode to an operation and a list of operand encodings. Each operand is indicated by a DW_FORM_* encoding, and the list is terminated with zero, as with DIE abbrevs. For generality, opcodes would be ULEB128, and that seems fine, since any one CU using more than 127 opcodes seems like an outlier.
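The decoding side of such a table can be sketched in a few lines of Python. The byte layout of the table here is a guess made up for illustration: each entry is a ULEB128 opcode, an operation index, then DW_FORM_* codes ending in 0, and opcode 0 ends the table, much as in .debug_abbrev. How operations should really be named is left open, so OPERATIONS is a hypothetical stand-in. The DW_FORM_* numbers are the real DWARF codes, but only a handful of forms are handled.

```python
from typing import Dict, List, Tuple

# Real DWARF form codes for the few operand encodings this sketch handles.
DW_FORM_data2, DW_FORM_data4 = 0x05, 0x06
DW_FORM_data1, DW_FORM_sdata, DW_FORM_udata = 0x0B, 0x0D, 0x0F

# Hypothetical stand-in for however an opabbrev ends up naming its operation.
OPERATIONS = ["<none>", "constant", "breg", "plus_uconst", "stack_value"]

Abbrevs = Dict[int, Tuple[str, List[int]]]


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, new position) for an unsigned LEB128 number."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, new position) for a signed LEB128 number."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            if byte & 0x40:            # sign bit of the final 7-bit group
                result -= 1 << shift
            return result, pos


def parse_opabbrev_table(buf: bytes) -> Abbrevs:
    """Parse the hypothetical layout: opcode, operation id, forms..., 0."""
    table: Abbrevs = {}
    pos = 0
    while True:
        opcode, pos = read_uleb128(buf, pos)
        if opcode == 0:                # opcode 0 terminates the table
            return table
        op_id, pos = read_uleb128(buf, pos)
        forms: List[int] = []
        while True:
            form, pos = read_uleb128(buf, pos)
            if form == 0:              # zero terminates the operand list
                break
            forms.append(form)
        table[opcode] = (OPERATIONS[op_id], forms)


def read_operand(form: int, buf: bytes, pos: int) -> Tuple[int, int]:
    fixed = {DW_FORM_data1: 1, DW_FORM_data2: 2, DW_FORM_data4: 4}
    if form in fixed:
        n = fixed[form]
        return int.from_bytes(buf[pos:pos + n], "little"), pos + n
    if form == DW_FORM_udata:
        return read_uleb128(buf, pos)
    if form == DW_FORM_sdata:
        return read_sleb128(buf, pos)
    raise ValueError(f"form {form:#x} not handled in this sketch")


def decode_expr(block: bytes, table: Abbrevs) -> List[Tuple[str, List[int]]]:
    """Decode an exprloc block whose ULEB128 opcodes are defined by `table`."""
    ops, pos = [], 0
    while pos < len(block):
        opcode, pos = read_uleb128(block, pos)
        name, forms = table[opcode]
        operands = []
        for form in forms:
            value, pos = read_operand(form, block, pos)
            operands.append(value)
        ops.append((name, operands))
    return ops
```

Under ULEB128, opcodes 1 through 127 take one byte, and opcode 128 already takes two (0x80 0x01).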
But if real-world CUs might not too infrequently use 128-255 different opcodes, then a table header byte could give the encoding of opcodes too, so it might be DW_FORM_data1 instead, to optimize the packing of such a case.

I'm somewhat undecided on how best to encode operations in an opabbrev table. My first instinct is towards easy extensibility, by encoding each as a pair of DW_FORM_strp, being "family" and "operation". (Perhaps for compactness a table would contain several runs of subtables, each subtable using a single family string, with only the operation string in each subtable element that defines a particular opcode.) Then family is either unspecified and conventional (perhaps instead called "vendor", but I prefer "family" so as to indicate side standards on common families), or perhaps a domain name, reverse domain name, or URL (where a pointer to the full specification of the family's operation names might live). The owner of a given family name defines the set of operation names valid therein, their permissible operand list lengths and encodings, and their meanings.

As with DIE abbrevs, the two key features of this scheme are compactness of exprloc blocks given wise encoding choices, and foreign extensions that are always structurally comprehensible to all consumers even when they don't know how to interpret them semantically.

There are some other things I like about it too. There's no longer any need for separate operations distinguished only by their operand encodings. For example, you need just one operation "constant", and that can be used in some opabbrevs with a DW_FORM_data* operand, in others with a DW_FORM_block* operand, etc. This makes the family specification of operations IMHO more natural and focussed on semantics, though it has to be very clear about the semantics of extending operands to word size and so forth.
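As a hypothetical sketch of the structural-comprehensibility point, the Python below keeps opabbrevs as (family, operation, forms) records. The family names, the in-memory representation, the single-byte opcodes, and the "<opaque>" marker are all assumptions for illustration. A consumer that only understands the "dwarf" family can still step over an operation from an unknown family, because its operand forms say how many bytes to skip; and a producer can choose, among several opabbrevs for the one "constant" operation, the smallest encoding that fits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class OpAbbrev:
    family: str               # hypothetical family naming, e.g. "dwarf" or "org.example"
    operation: str            # operation name defined by that family's owner
    forms: Tuple[str, ...]    # operand encodings (DW_FORM_* names), in order


# Byte sizes of the fixed-size forms; DW_FORM_block1 is a 1-byte length plus data.
FIXED = {"DW_FORM_data1": 1, "DW_FORM_data2": 2, "DW_FORM_data4": 4}


def read_operand(form: str, buf: bytes, pos: int) -> Tuple[object, int]:
    if form in FIXED:
        n = FIXED[form]
        return int.from_bytes(buf[pos:pos + n], "little"), pos + n
    if form == "DW_FORM_block1":
        n = buf[pos]
        return bytes(buf[pos + 1:pos + 1 + n]), pos + 1 + n
    raise ValueError(f"unsupported form {form}")


def walk(block: bytes, table: Dict[int, OpAbbrev], known: Set[str]) -> List[Tuple]:
    """Decode every op (single-byte opcodes for brevity). Ops from families
    this consumer doesn't know are kept as opaque entries rather than
    aborting, since their forms still determine where the next op starts."""
    out: List[Tuple] = []
    pos = 0
    while pos < len(block):
        abbrev = table[block[pos]]
        pos += 1
        operands = []
        for form in abbrev.forms:
            value, pos = read_operand(form, block, pos)
            operands.append(value)
        if abbrev.family in known:
            out.append((abbrev.operation, operands))
        else:
            out.append(("<opaque>", abbrev.family, abbrev.operation))
    return out


def pick_constant_opcode(value: int, table: Dict[int, OpAbbrev],
                         family: str = "dwarf") -> Optional[int]:
    """Producer side: among this family's opabbrevs for "constant" with one
    fixed-size operand, pick the smallest one that can hold `value`."""
    best: Optional[Tuple[int, int]] = None
    for opcode, ab in table.items():
        if (ab.family, ab.operation) != (family, "constant"):
            continue
        if len(ab.forms) != 1 or ab.forms[0] not in FIXED:
            continue
        size = FIXED[ab.forms[0]]
        if 0 <= value < 1 << (8 * size) and (best is None or size < best[0]):
            best = (size, opcode)
    return None if best is None else best[1]
```

Here the one "constant" operation appears under several opcodes that differ only in operand encoding, which is the point made above about no longer needing separate operations per encoding.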
Similarly, "breg" can be one operation that sometimes has one operand (register number) and sometimes has two (register number and offset), to use the optimal encoding without the extra zero byte now common on a DW_OP_breg*. I haven't considered a special encoding for regN/bregN-style ranges of opcodes. Something like that could make sense to keep the opabbrev table size down, but this probably doesn't matter compared to keeping the actual exprloc blocks small. It could just be a normal opabbrev making one opcode be operation "reg2" and another "reg17", or an extension like DW_FORM_direct that says an operand value appears directly in the abbrev and has no encoding in the op itself (which I've been considering for attribute values in DIE abbrevs too).

This also opens the possibility of defining a different operation family like "ieee754", whose "add", "mul", et al. operations are explicitly defined with unambiguous semantics. IMHO this is far better than the current plan of overloading the single DW_OP_add et al. to have implicit semantics based on the types of the top two stack elements.

In a first implementation, a compiler could just use a single canned opabbrev table. (It could emit that in COMDAT, or even just as an undefined symbol provided inside a .debug_opabbrev section in some libgcc.a object.) In fact, such a table could be trivially written out retroactively to describe all the DWARF 4 opcodes and extant GNU extensions, with DW_FORM_data1 encoding for opcodes and appropriate operand lists. Post-compile DWARF processors (linkers, compressors, rewriters) could tack that onto today's object files to make them fully navigable by non-GNU consumers without breaking compatibility with existing tools that grok the GNU extension formats today. A fancy implementation would choose its opabbrevs carefully based on what its CUs really use, so as to pack the smallest possible actual exprloc blocks.
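To illustrate the canned-table and breg points together, here is a hypothetical Python sketch that re-describes a handful of real DWARF 4 opcodes (a small subset, not the full set) as opabbrevs with data1 opcodes. It uses the hypothetical DW_FORM_direct extension mentioned above for the lit/reg/breg ranges, with the direct value stored in the abbrev as ("DW_FORM_direct", n). The "dwarf" family name and the assumption of 8-byte addresses are also choices made only for this example. The size function compares DW_OP_breg6 with a zero offset (2 bytes) against a custom "breg" opabbrev that carries the register directly and omits the offset (1 byte).

```python
from typing import Dict, List, Tuple, Union

# A form is a DW_FORM_* name, or ("DW_FORM_direct", value): the hypothetical
# extension where the operand value lives in the abbrev, not in the op bytes.
Form = Union[str, Tuple[str, int]]
Abbrev = Tuple[str, str, Tuple[Form, ...]]   # (family, operation, forms)


def canned_dwarf4_subset() -> Dict[int, Abbrev]:
    """A few DWARF 4 opcodes re-described as opabbrevs (opcode = data1)."""
    t: Dict[int, Abbrev] = {
        0x03: ("dwarf", "addr", ("DW_FORM_addr",)),             # DW_OP_addr
        0x08: ("dwarf", "constant", ("DW_FORM_data1",)),        # DW_OP_const1u
        0x0A: ("dwarf", "constant", ("DW_FORM_data2",)),        # DW_OP_const2u
        0x10: ("dwarf", "constant", ("DW_FORM_udata",)),        # DW_OP_constu
        0x23: ("dwarf", "plus_uconst", ("DW_FORM_udata",)),     # DW_OP_plus_uconst
        0x90: ("dwarf", "reg", ("DW_FORM_udata",)),             # DW_OP_regx
        0x92: ("dwarf", "breg", ("DW_FORM_udata", "DW_FORM_sdata")),  # DW_OP_bregx
        0x9F: ("dwarf", "stack_value", ()),                     # DW_OP_stack_value
    }
    for n in range(32):
        t[0x30 + n] = ("dwarf", "constant", (("DW_FORM_direct", n),))   # DW_OP_lit<n>
        t[0x50 + n] = ("dwarf", "reg", (("DW_FORM_direct", n),))        # DW_OP_reg<n>
        t[0x70 + n] = ("dwarf", "breg",
                       (("DW_FORM_direct", n), "DW_FORM_sdata"))        # DW_OP_breg<n>
    return t


def uleb_len(v: int) -> int:
    """Bytes needed to encode a non-negative v as ULEB128."""
    n = 1
    while v >= 0x80:
        v >>= 7
        n += 1
    return n


def sleb_len(v: int) -> int:
    """Bytes needed to encode v as SLEB128."""
    n = 1
    while not -64 <= v < 64:
        v >>= 7                    # arithmetic shift keeps the sign
        n += 1
    return n


def encoded_size(abbrev: Abbrev, operands: List[int], addr_size: int = 8) -> int:
    """Bytes for one op: a 1-byte opcode plus its non-direct operands.
    `operands` lists only the operands that are actually encoded in the op."""
    size = 1
    it = iter(operands)
    for form in abbrev[2]:
        if isinstance(form, tuple):        # DW_FORM_direct: costs nothing in the op
            continue
        v = next(it)
        if form == "DW_FORM_data1":
            size += 1
        elif form == "DW_FORM_data2":
            size += 2
        elif form == "DW_FORM_addr":
            size += addr_size
        elif form == "DW_FORM_udata":
            size += uleb_len(v)
        elif form == "DW_FORM_sdata":
            size += sleb_len(v)
        else:
            raise ValueError(f"unsupported form {form}")
    return size
```

A producer choosing its own opabbrevs could then add an entry like ("dwarf", "breg", (("DW_FORM_direct", 6),)) for a register used often with no offset, saving a byte on every such use.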
As I say, I'm not really working on this stuff any more, except maybe in (as yet wholly absent) spare time. But, food for thought.

Thanks,
Roland