From mboxrd@z Thu Jan  1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============4640067985154178846=="
MIME-Version: 1.0
From: Petr Machata <pmachata@redhat.com>
To: elfutils-devel@lists.fedorahosted.org
Subject: Plan for supporting .debug_macro
Date: Mon, 08 Sep 2014 19:00:19 +0200
Message-ID: <m2a96a59p8.fsf@redhat.com>

--===============4640067985154178846==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Hi there,

I have a bit of rough-ish code implementing support for .debug_macro as
well as the old-style .debug_macinfo.  It's sitting on the branch
pmachata/macro.  Before I get around to splitting this into proper
commit units, finishing everything up as it should be, and writing test
cases, I'd like to outline my overall plan and some rationale behind it.

My original plan was that dwarf_getmacros just transparently supports
the new format, but that can't work.  If it did, the extant users would
get new-style macros and wrongly compare their opcodes to DW_MACINFO_*
constants.  So we definitely need at least one new interface, but maybe
we could get away with hiding the old one in compat-versioned ABI and
have the new one do its thing.

But another point is transparent includes.  There's an opcode that
references another .debug_macro section.  libdw being the rather
low-level library that it is, I think the user should actually be able
to see the opcode, and that means there should be at least two entry
points--one for "gimme macros for this DIE" and one for "gimme macros
for this offset".  My current approach has these interfaces for
iteration:

/* Iterate through the macro section referenced by CUDIE and call
   CALLBACK for each macro information entry.  Keeps iterating while
   CALLBACK returns DWARF_CB_OK.  If the callback returns
   DWARF_CB_ABORT, it stops iterating and returns a token, which can
   later be passed to dwarf_getmacros_next to restart the iteration at
   the point where it stopped.  Returns -1 for errors.  */
extern ptrdiff_t dwarf_getmacros_die (Dwarf_Die *cudie,
				      int (*callback) (Dwarf_Macro *, void *),
				      void *arg)
     __nonnull_attribute__ (2);

/* This is similar in operation to dwarf_getmacros_die, but always
   iterates through .debug_macro, and selects the section to iterate
   through by offset instead of by CU.  This is used for handling
   DW_MACRO_GNU_transparent_include or similar opcodes.  The
   returned token can again be passed to dwarf_getmacros_next.  */
extern ptrdiff_t dwarf_getmacros_addr (Dwarf *dbg, Dwarf_Off offset,
				       int (*callback) (Dwarf_Macro *, void *),
				       void *arg)
  __nonnull_attribute__ (3);

/* Continue in iteration through a macro section.  Use the token
   returned from dwarf_getmacros_die, dwarf_getmacros_addr or a
   previous invocation of dwarf_getmacros_next to continue the
   iteration.  Returns -1 for errors.  */
extern ptrdiff_t dwarf_getmacros_next (Dwarf *dbg,
				       int (*callback) (Dwarf_Macro *, void *),
				       void *arg, ptrdiff_t offset)
  __nonnull_attribute__ (2);

... but if people think this split to _die/_addr and _next is a bad
idea, I can just inline the token into _die and _addr calls.
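
To illustrate the token-based resumption described above, here is a
minimal sketch that walks a plain array instead of a macro section.
The names iterate_from, struct entry, CB_OK and CB_ABORT are
hypothetical stand-ins, not libdw API, and the token is assumed to be
simply the index of the next entry to visit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DWARF_CB_OK / DWARF_CB_ABORT.  */
enum { CB_OK = 0, CB_ABORT = 1 };

/* Hypothetical stand-in for Dwarf_Macro.  */
struct entry { unsigned int opcode; };

typedef int (*entry_cb) (const struct entry *, void *);

/* Walk ENTRIES starting at index TOKEN.  Returns 0 when the end was
   reached, a positive token to resume from when CALLBACK asked to
   stop, or -1 for a bad token.  */
static ptrdiff_t
iterate_from (const struct entry *entries, size_t n, ptrdiff_t token,
              entry_cb callback, void *arg)
{
  assert (callback != NULL);
  if (token < 0 || (size_t) token > n)
    return -1;
  for (size_t i = (size_t) token; i < n; ++i)
    if (callback (&entries[i], arg) != CB_OK)
      return (ptrdiff_t) (i + 1);
  return 0;
}

struct counter { size_t seen; size_t batch; };

/* Count entries, asking to stop after every BATCH of them.  */
static int
count_in_batches (const struct entry *e, void *arg)
{
  struct counter *c = arg;
  (void) e;
  ++c->seen;
  return c->seen % c->batch == 0 ? CB_ABORT : CB_OK;
}

/* Drive the iteration to completion, resuming after each stop.  */
static size_t
count_all (const struct entry *entries, size_t n, size_t batch)
{
  struct counter c = { 0, batch };
  ptrdiff_t token = 0;
  do
    token = iterate_from (entries, n, token, count_in_batches, &c);
  while (token > 0);
  return c.seen;
}
```

The caller never interprets the token; it only hands it back, which is
the property that lets a _next-style call resume regardless of which
entry point started the walk.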

My intention is that the current dwarf_getmacros interface would still
exist, so that existing code keeps building and behaving correctly,
though I don't suppose we have very many clients.  But under the covers
it uses all the new-style interfaces.  If you squint enough, you can
pretend that the old-style format is like the new-style format, except
with an implicit header that introduces a default mapping between
opcodes and parameter descriptions, and indeed that's how it's done.
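
As a rough sketch of what such an implicit header could look like, the
table below maps each DW_MACINFO_* opcode to the forms of its operands.
The opcode and form values are the ones from the DWARF specification;
struct op_proto and macinfo_proto are hypothetical names for
illustration, not what the branch actually uses:

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined by the DWARF specification.  */
enum { DW_MACINFO_define = 0x01, DW_MACINFO_undef = 0x02,
       DW_MACINFO_start_file = 0x03, DW_MACINFO_end_file = 0x04,
       DW_MACINFO_vendor_ext = 0xff };
enum { DW_FORM_string = 0x08, DW_FORM_udata = 0x0f };

/* Hypothetical description of one opcode's operands.  */
struct op_proto
{
  unsigned char known;		/* Nonzero if the opcode is described.  */
  unsigned char nforms;		/* Number of operands.  */
  unsigned char forms[2];	/* DW_FORM_* of each operand.  */
};

/* The implicit table for .debug_macinfo: line number plus string for
   define/undef, line number plus file index for start_file, nothing
   for end_file, constant plus string for vendor_ext.  */
static const struct op_proto macinfo_table[256] =
  {
    [DW_MACINFO_define] = { 1, 2, { DW_FORM_udata, DW_FORM_string } },
    [DW_MACINFO_undef] = { 1, 2, { DW_FORM_udata, DW_FORM_string } },
    [DW_MACINFO_start_file] = { 1, 2, { DW_FORM_udata, DW_FORM_udata } },
    [DW_MACINFO_end_file] = { 1, 0, { 0, 0 } },
    [DW_MACINFO_vendor_ext] = { 1, 2, { DW_FORM_udata, DW_FORM_string } },
  };

/* Look up OPCODE; returns NULL if the table does not describe it.  */
static const struct op_proto *
macinfo_proto (unsigned int opcode)
{
  if (opcode > 0xff || !macinfo_table[opcode].known)
    return NULL;
  return &macinfo_table[opcode];
}
```

With a table of this shape, the same decoder can serve both formats; a
.debug_macro section would simply start from a different default table
and apply whatever its header adds.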

The new format is very generous as far as possible extensions go.  Each
section can have a table with description of format of parameters of
individual opcodes.  There's no real upper limit placed on number of
parameters.  The allowed formats are also very general--signed and
unsigned values, direct and indirect strings, blocks, flags, and
.debug_macro references.  It very much seems to me that we just can't
get away with the current approach based on dwarf_macro_paramN.  I
propose to piggy-back on the existing dwarf_formX calls and present the
macro arguments as attributes:

/* Get number of parameters of MACRO and store it to *PARAMCNTP.  */
extern int dwarf_macro_getparamcnt (Dwarf_Macro *macro, size_t *paramcntp);

/* Get IDX-th parameter of MACRO, and store it to *ATTRIBUTE.
   Returns 0 on success or -1 for errors.

   After a successful call, you can query ATTRIBUTE by dwarf_whatform
   to determine which of the dwarf_formX calls to make to get actual
   value out of ATTRIBUTE.  Note that calling dwarf_whatattr is not
   meaningful for pseudo-attributes formed this way.  */
extern int dwarf_macro_param (Dwarf_Macro *macro, size_t idx,
			      Dwarf_Attribute *attribute);

Making this work is admittedly a bit hacky.  Many dwarf_formX functions
need a reasonable attribute opcode (we put DW_AT_GNU_macros there, so
that DW_FORM_sec_offset is decoded properly), and many also access CU
fields and from there Dwarf, so we need to hook a fake CU in there to
placate these uses.  Other than this bit of ugliness, the interface
seems rather clean and understandable to me.  Clients already know how
to decode attribute values, and might be able to hook the macro
parameter decoding into the same framework easily enough.

Of particular interest is this bit of spec:

  The macinfo entry types defined in this standard may, but might not,
  be described in the table, other macinfo entry types used in the
  section should be described there.

So it is in fact legitimate to re-describe existing opcodes.  Presumably
this is a provision for clients to keep working with as yet unsupported
opcodes.  My implementation allows the table to actually change the way
an opcode is described.  I think that is necessary.  If the table claims
that opcode X has three DW_FORM_data1 parameters, then that's what we
had better decode, otherwise the rest of the section will come out as
garbage.  The corresponding code in readelf.c will need to be updated as
well, I believe.  That's not been done yet.
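
To make the point about garbage concrete, here is a small sketch of
skipping over one entry's operands according to whatever the table
says.  It handles only three forms, and struct op_proto and skip_params
are hypothetical names, not code from the branch; the DW_FORM_* values
are those from the DWARF specification:

```c
#include <assert.h>
#include <stddef.h>

enum { DW_FORM_string = 0x08, DW_FORM_data1 = 0x0b, DW_FORM_udata = 0x0f };

/* Hypothetical description of one opcode's operands.  */
struct op_proto { size_t nforms; const unsigned char *forms; };

/* Return a pointer just past the operands that PROTO describes,
   starting at P, or NULL if the data is truncated or a form is not
   handled by this sketch.  */
static const unsigned char *
skip_params (const unsigned char *p, const unsigned char *end,
             const struct op_proto *proto)
{
  assert (p != NULL && p <= end);
  for (size_t i = 0; i < proto->nforms; ++i)
    switch (proto->forms[i])
      {
      case DW_FORM_data1:
        if (p == end)
          return NULL;
        ++p;
        break;

      case DW_FORM_udata:
        /* ULEB128: a set high bit means another byte follows.  */
        for (;;)
          {
            if (p == end)
              return NULL;
            if ((*p++ & 0x80) == 0)
              break;
          }
        break;

      case DW_FORM_string:
        /* Inline NUL-terminated string.  */
        for (;;)
          {
            if (p == end)
              return NULL;
            if (*p++ == '\0')
              break;
          }
        break;

      default:
        return NULL;
      }
  return p;
}
```

The same bytes yield a different end position depending on the
prototype, so a decoder that ignored a re-described opcode would start
reading the next entry at the wrong place.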

dwarf_macro_param1 and dwarf_macro_param2 keep working where they used
to work before.  They piggy-back on dwarf_macro_getparamcnt and
dwarf_macro_param, calling dwarf_formudata and dwarf_formstring as
necessary.

The question remains of whether to inline transparent includes or not.
We do need both the _die and the _addr entry points, because apart
from the standardized opcodes, there could be extensions with
include-like semantics--after all, DW_FORM_sec_offset is a legal form
for these macros.  But should the _die/_addr calls do the inlining
themselves, or should there be another entry point, say
dwarf_getmacros_die_integrate, that does this?  I lean towards the
latter, but haven't written that dwarf_getmacros_die_integrate yet.

The final two interfaces not yet mentioned are these:

/* Return Dwarf version of this macro opcode.  The versions are 0 for
   macro elements coming from DW_AT_macro_info, and 4 for macro
   elements coming from DW_AT_GNU_macros.  It is expected that 5 and
   further will be for macro elements coming from standardized
   DW_AT_macros.  */
extern int dwarf_macro_version (Dwarf_Macro *macro, unsigned int *versionp)
  __nonnull_attribute__ (2);

/* Set *OFFP to .debug_line offset associated with the Dwarf macro
   section that this MACRO comes from.  Returns -1 for errors or 0 for
   success.

   The offset is considered to be 0xffffffffffffffff (or (Dwarf_Off)
   -1) if .debug_line offset was not set in the section header, which
   will always be the case for .debug_macinfo macro sections, and may
   be the case for .debug_macro sections as well.  This condition is
   not considered an error.  */
extern int dwarf_macro_line_offset (Dwarf_Macro *macro, Dwarf_Off *offp)
  __nonnull_attribute__ (2);

The first is for distinguishing macros coming from various standard
levels, the second for accessing the optional line_offset header field.
Both of these are bound to macros instead of iteration entry points.
The reason for that is that when inside a callback, I need to decide
whether what I'm looking at is a DW_MACRO_GNU_* or DW_MACINFO_* (or
whatever).  So I need the version right there and then.  After the
iteration entry point returns is too late.  Similarly with the line
table offset--if a macro entry comes from a transparently-included
section, I need the macro entry to tell me where to look for the mapping
from file numbers to names.  There's no other way around that.
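
As an illustration of why the version has to be available per macro,
here is a sketch of how a callback might pick the opcode namespace.
The function macro_opcode_name is hypothetical; the version numbers (0
for .debug_macinfo, 4 for the GNU .debug_macro extension) follow the
convention proposed above, and the opcode values are the standard
DW_MACINFO_* and DW_MACRO_GNU_* ones:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a printable name for OPCODE as interpreted under VERSION, or
   NULL if this sketch does not know it.  */
static const char *
macro_opcode_name (unsigned int version, unsigned int opcode)
{
  static const char *const macinfo[] =
    {
      NULL, "DW_MACINFO_define", "DW_MACINFO_undef",
      "DW_MACINFO_start_file", "DW_MACINFO_end_file",
    };
  static const char *const gnu[] =
    {
      NULL, "DW_MACRO_GNU_define", "DW_MACRO_GNU_undef",
      "DW_MACRO_GNU_start_file", "DW_MACRO_GNU_end_file",
      "DW_MACRO_GNU_define_indirect", "DW_MACRO_GNU_undef_indirect",
      "DW_MACRO_GNU_transparent_include",
    };

  if (version == 0)
    {
      if (opcode == 0xff)
        return "DW_MACINFO_vendor_ext";
      return opcode < sizeof macinfo / sizeof macinfo[0]
	     ? macinfo[opcode] : NULL;
    }
  if (version == 4)
    return opcode < sizeof gnu / sizeof gnu[0] ? gnu[opcode] : NULL;
  return NULL;
}
```

Opcode 5, for example, is unknown under version 0 but means
DW_MACRO_GNU_define_indirect under version 4, so the raw number alone
is not enough to interpret an entry.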

Hopefully this makes sense.  As I said, it's all on the branch.  If
there are any comments, I'll work on integrating them to my branch.
Otherwise I'll just dump the patch set on you soonish and we'll see how
that goes.

Thanks,
Petr

--===============4640067985154178846==--