From: Petr Machata
To: elfutils-devel@lists.fedorahosted.org
Subject: Plan for supporting .debug_macro
Date: Mon, 08 Sep 2014 19:00:19 +0200

Hi there,

I have a bit of rough-ish code implementing support for .debug_macro as
well as the old-style .debug_macinfo.  It's sitting on the branch
pmachata/macro.  Before I get around to splitting this into proper
commit units, finishing everything up as it should be, and writing test
cases, I'd like to outline my overall plan and some of the rationale
behind it.

My original plan was for dwarf_getmacros to transparently support the
new format, but that can't work.  If it did, existing users would get
new-style macros and wrongly compare their opcodes to DW_MACINFO_*
constants.  So we definitely need at least one new interface.  Maybe we
could get away with hiding the old one in a compat-versioned ABI and
having the new one do its thing.

But there's another consideration: transparent includes.  There's an
opcode that references another .debug_macro section.  libdw being the
rather low-level library that it is, I think the user should actually be
able to see that opcode, and that means there should be at least two
entry points: one for "gimme macros for this DIE" and one for "gimme
macros for this offset".

My current approach has these interfaces for iteration:

/* Iterate through the macro section referenced by CUDIE and call
   CALLBACK for each macro information entry.  Keeps iterating while
   CALLBACK returns DWARF_CB_OK.  If the callback returns
   DWARF_CB_ABORT, it stops iterating and returns a token, which can
   later be passed to dwarf_getmacros_next to restart the iteration at
   the point where it stopped.  Returns -1 for errors.  */
extern ptrdiff_t dwarf_getmacros_die (Dwarf_Die *cudie,
                                      int (*callback) (Dwarf_Macro *, void *),
                                      void *arg)
  __nonnull_attribute__ (2);

/* This is similar in operation to dwarf_getmacros_die, but always
   iterates through .debug_macro, and selects the section to iterate
   through by offset instead of by CU.  This is used for handling
   DW_MACRO_GNU_transparent_include's or similar opcodes.  The returned
   token can again be passed to dwarf_getmacros_next.  */
extern ptrdiff_t dwarf_getmacros_addr (Dwarf *dbg, Dwarf_Off offset,
                                       int (*callback) (Dwarf_Macro *, void *),
                                       void *arg)
  __nonnull_attribute__ (3);

/* Continue iterating through a macro section.  Use the token returned
   from dwarf_getmacros_die, dwarf_getmacros_addr or a previous
   invocation of dwarf_getmacros_next to continue the iteration.
   Returns -1 for errors.  */
extern ptrdiff_t dwarf_getmacros_next (Dwarf *dbg,
                                       int (*callback) (Dwarf_Macro *, void *),
                                       void *arg, ptrdiff_t offset)
  __nonnull_attribute__ (2);

... but if people think this split into _die/_addr and _next is a bad
idea, I can just fold the token into the _die and _addr calls as an
extra parameter.

My intention is that the current dwarf_getmacros interface would still
exist, so that existing code keeps building and behaving correctly,
though I don't suppose we have very many clients.  Under the covers,
however, it would use the new-style interfaces.  If you squint enough,
you can pretend that the old-style format is like the new-style format,
except with an implicit header that introduces a default mapping
between opcodes and parameter descriptions, and indeed that's how it's
done.

The new format is very generous as far as possible extensions go.  Each
section can have a table describing the parameter formats of individual
opcodes.  There's no real upper limit on the number of parameters.  The
allowed formats are also very general: signed and unsigned values,
direct and indirect strings, blocks, flags, and .debug_macro references.
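To make the token protocol of the _die/_addr/_next trio concrete, here is a toy model in plain C.  It does not use libdw: toy_macro, toy_unit, toy_getmacros and toy_getmacros_next are hypothetical stand-ins, CB_OK/CB_ABORT merely mirror the DWARF_CB_OK/DWARF_CB_ABORT convention, and it assumes (the proposal above doesn't say) that 0 means "unit exhausted" and that the token is simply the index of the next entry.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of libdw's callback convention (DWARF_CB_OK / DWARF_CB_ABORT).  */
enum { CB_OK = 0, CB_ABORT = 1 };

/* Hypothetical stand-ins for Dwarf_Macro and for one macro unit.  */
struct toy_macro { unsigned int opcode; };
struct toy_unit { const struct toy_macro *entries; size_t n; };

typedef int toy_callback (const struct toy_macro *, void *);

/* Visit entries from index START on.  Returns 0 once the unit is
   exhausted, or a positive token (the index of the next entry) if the
   callback asks to stop.  The token is never 0, because at least one
   entry has been visited by the time the callback can abort.  */
static ptrdiff_t
toy_iterate (const struct toy_unit *u, size_t start,
             toy_callback *callback, void *arg)
{
  assert (u != NULL && callback != NULL);
  for (size_t i = start; i < u->n; ++i)
    if (callback (&u->entries[i], arg) != CB_OK)
      return (ptrdiff_t) (i + 1);
  return 0;
}

/* Analogue of dwarf_getmacros_die / dwarf_getmacros_addr: start from
   the beginning of the unit.  */
static ptrdiff_t
toy_getmacros (const struct toy_unit *u, toy_callback *callback, void *arg)
{
  return toy_iterate (u, 0, callback, arg);
}

/* Analogue of dwarf_getmacros_next: resume where a previous call
   stopped.  Returns -1 for a token that cannot have come from us.  */
static ptrdiff_t
toy_getmacros_next (const struct toy_unit *u, toy_callback *callback,
                    void *arg, ptrdiff_t token)
{
  if (token <= 0 || (size_t) token > u->n)
    return -1;
  return toy_iterate (u, (size_t) token, callback, arg);
}

/* Example callback: counts entries and stops after every LIMIT-th one.  */
struct counter { size_t seen, limit; };

static int
count_cb (const struct toy_macro *m, void *arg)
{
  struct counter *c = arg;
  (void) m;
  return ++c->seen % c->limit == 0 ? CB_ABORT : CB_OK;
}
```

In this model the token only has to encode a resume position, which is why folding it into the _die/_addr calls as an extra argument (with some reserved "start" value) would work just as well as a separate _next call.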
It very much seems to me that we just can't get away with the current
approach based on dwarf_macro_paramN.  I propose to piggy-back on the
existing dwarf_formX calls and present the macro arguments as
attributes:

/* Get the number of parameters of MACRO and store it in *PARAMCNTP.  */
extern int dwarf_macro_getparamcnt (Dwarf_Macro *macro, size_t *paramcntp);

/* Get the IDX-th parameter of MACRO and store it in *ATTRIBUTE.
   Returns 0 on success or -1 for errors.

   After a successful call, you can query ATTRIBUTE by dwarf_whatform
   to determine which of the dwarf_formX calls to make to get the
   actual value out of ATTRIBUTE.  Note that calling dwarf_whatattr is
   not meaningful for pseudo-attributes formed this way.  */
extern int dwarf_macro_param (Dwarf_Macro *macro, size_t idx,
                              Dwarf_Attribute *attribute);

Making this work is admittedly a bit hacky.  Many dwarf_formX functions
need a meaningful attribute code (we put DW_AT_GNU_macros there, so
that DW_FORM_sec_offset is decoded properly), and many also access CU
fields and, through them, the Dwarf handle, so we need to hook a fake
CU in there to placate these uses.

Other than this bit of ugliness, the interface seems rather clean and
understandable to me.  Clients already know how to decode attribute
values, and might be able to hook the macro parameter decoding into the
same framework easily enough.

Of particular interest is this bit of the spec:

    The macinfo entry types defined in this standard may, but might
    not, be described in the table, other macinfo entry types used in
    the section should be described there.

So it is in fact legitimate to re-describe existing opcodes.
Presumably this is a provision so that clients keep working with
as-yet unsupported opcodes.  My implementation allows the table to
actually change the way an opcode is described.  I think that is
necessary: if the table claims that opcode X has three DW_FORM_data1
parameters, then that's what we had better decode, otherwise the rest
of the section will come out as garbage.
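The following toy decoder illustrates why a re-description has to win: the decoder can only find the next entry if it knows the forms of every parameter of the current one.  It is a sketch, not libdw code: opcode_desc, default_table, skip_form and count_entries are hypothetical names, only three forms are handled (the FORM_* values are the DW_FORM_* codes from dwarf.h), default_table plays the role of the implicit .debug_macinfo header, and it assumes a section's own table is consulted before the defaults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DW_FORM_* values as in dwarf.h; only these three are modelled.  */
enum { FORM_string = 0x08, FORM_data1 = 0x0b, FORM_udata = 0x0f };

/* Hypothetical description of one opcode: its parameter forms.  */
struct opcode_desc { uint8_t opcode; size_t nforms; uint8_t forms[4]; };

/* Implicit "header" for old-style .debug_macinfo (DW_MACINFO_* opcodes).  */
static const struct opcode_desc default_table[] =
{
  { 0x01, 2, { FORM_udata, FORM_string } },  /* define: line, text */
  { 0x02, 2, { FORM_udata, FORM_string } },  /* undef: line, text */
  { 0x03, 2, { FORM_udata, FORM_udata } },   /* start_file: line, file */
  { 0x04, 0, { 0 } },                        /* end_file */
  { 0xff, 2, { FORM_udata, FORM_string } },  /* vendor_ext: constant, text */
};

static const struct opcode_desc *
find_desc (const struct opcode_desc *tab, size_t n, uint8_t op)
{
  assert (tab != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (tab[i].opcode == op)
      return &tab[i];
  return NULL;
}

/* Advance *P past one parameter of FORM; -1 on truncation or unknown form.  */
static int
skip_form (const uint8_t **p, const uint8_t *end, uint8_t form)
{
  switch (form)
    {
    case FORM_data1:
      if (*p >= end)
        return -1;
      ++*p;
      return 0;
    case FORM_udata:          /* ULEB128: stop at the byte without bit 7.  */
      while (*p < end)
        if ((*(*p)++ & 0x80) == 0)
          return 0;
      return -1;
    case FORM_string:         /* Inline NUL-terminated string.  */
      {
        const uint8_t *nul = memchr (*p, 0, (size_t) (end - *p));
        if (nul == NULL)
          return -1;
        *p = nul + 1;
        return 0;
      }
    default:
      return -1;
    }
}

/* Count the entries of one macro unit.  Descriptions in OVERRIDES (the
   section's own table) take precedence over the defaults.  Returns -1
   if an opcode is undescribed or the data is truncated.  */
static long
count_entries (const uint8_t *buf, size_t len,
               const struct opcode_desc *overrides, size_t noverrides)
{
  const uint8_t *p = buf, *end = buf + len;
  long count = 0;
  while (p < end)
    {
      uint8_t op = *p++;
      if (op == 0)
        return count;         /* Terminating opcode of the unit.  */
      const struct opcode_desc *d = find_desc (overrides, noverrides, op);
      if (d == NULL)
        d = find_desc (default_table,
                       sizeof default_table / sizeof default_table[0], op);
      if (d == NULL)
        return -1;            /* No way to know how many bytes to skip.  */
      for (size_t i = 0; i < d->nforms; ++i)
        if (skip_form (&p, end, d->forms[i]) != 0)
          return -1;
      ++count;
    }
  return -1;                  /* Missing terminating 0 opcode.  */
}
```

For example, the bytes { 0x04, 0x07, 0x00 } decode as one entry if the section's table says opcode 0x04 carries a DW_FORM_data1 parameter, but with the default description the 0x07 is misread as an opcode and decoding fails.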
The corresponding code in readelf.c will need to be updated as well, I
believe.  That hasn't been done yet.

dwarf_macro_param1 and dwarf_macro_param2 keep working where they used
to work before.  They piggy-back on dwarf_macro_getparamcnt and
dwarf_macro_param, calling dwarf_formudata and dwarf_formstring as
necessary.

The question remains whether to inline transparent includes or not.
We do need both the _die and the _addr entry points, because apart from
the standardized opcodes, there could be extensions with include-like
semantics--after all, DW_FORM_sec_offset is a legal form for these
macros.  But should the _die/_addr calls do the inlining themselves, or
should there be another entry point, say dwarf_getmacros_die_integrate,
that does this?  I lean towards the latter, but haven't written
dwarf_getmacros_die_integrate yet.

The final two interfaces not yet mentioned are these:

/* Store in *VERSIONP the Dwarf version of this macro opcode.  The
   versions are 0 for macro elements coming from DW_AT_macro_info, and
   4 for macro elements coming from DW_AT_GNU_macros.  It is expected
   that 5 and further will be for macro elements coming from
   standardized DW_AT_macros.  */
extern int dwarf_macro_version (Dwarf_Macro *macro, unsigned int *versionp)
  __nonnull_attribute__ (2);

/* Set *OFFP to the .debug_line offset associated with the Dwarf macro
   section that this MACRO comes from.  Returns -1 for errors or 0 for
   success.

   The offset is considered to be 0xffffffffffffffff (or (Dwarf_Off) -1)
   if the .debug_line offset was not set in the section header, which
   will always be the case for .debug_macinfo macro sections, and may be
   the case for .debug_macro sections as well.  This condition is not
   considered an error.  */
extern int dwarf_macro_line_offset (Dwarf_Macro *macro, Dwarf_Off *offp)
  __nonnull_attribute__ (2);

The former is for distinguishing macros coming from the various
standard levels, the latter for accessing the optional line_offset
header field.
Both of these are bound to individual macros rather than to the
iteration entry points.  The reason is that when inside a callback, I
need to decide whether what I'm looking at is a DW_MACRO_GNU_* or a
DW_MACINFO_* opcode (or whatever).  So I need the version right there
and then; after the iteration entry point returns, it's too late.
Similarly with the line table offset: if a macro entry comes from a
transparently-included section, I need the macro entry itself to tell
me where to look for the mapping from file numbers to names; there's no
other way around that.

Hopefully this makes sense.  As I said, it's all on the branch.  If
there are any comments, I'll work on integrating them into my branch.
Otherwise I'll just dump the patch set on you soonish and we'll see how
that goes.

Thanks,
Petr