From: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
To: Mark Wielaard <mark@klomp.org>
Cc: elfutils-devel@sourceware.org
Subject: Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
Date: Sun, 3 Oct 2021 16:05:22 +1100
Message-ID: <CAJ7wOOu7Pmf6FoFmAYQX=tBqxxCdnx3+pjtx2=2-VQSzp9gc-w@mail.gmail.com>
In-Reply-To: <afebae258ba067f19c025661babc6c341efc49b5.camel@klomp.org>

On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked in to the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the language
> > uses zero or one based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them. This means that
> > calling dwarf_aggregate_size on the array type DIE does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't provide
> another way to provide the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CUs to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or, have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It also is
> somewhat tricky if you have alt files, you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution. Or maybe a variant
> of dwarf_srclang that takes any DIE (not just a CU DIE) and tries to
> figure out the best language to use. If it cannot figure out the
> language, it would fall back to some default value that can be used
> with dwarf_default_lower_bound to get a default lower bound (most
> likely zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create. But that of course makes
> the partial units bigger (and at least dwz creates them to make the
> full debuginfo smaller).
>
> Cheers,
>
> Mark
>

I guess we don't want to hide some really expensive traversal
operation inside a simple call to dwarf_aggregate_size, no...

What if we instead provide a way for the user to specify what language
a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)".
That would get saved with the (partial) CU, and dwarf_srclang could
retrieve this information (if DW_AT_language isn't set). Then, the
user could recursively traverse all CUs and call
dwarf_cu_report_language on each partial CU. And as a bonus, we could
even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
something (OK, the name needs a bit of workshopping).

That way, the expensive thing is in a separate call that's marked as
being very expensive (and cached, so it only needs to be done once).
Sound like a reasonable approach?
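
To make the proposal above concrete, here is a rough, self-contained
model of it in C. It deliberately does not use libdw: struct model_cu
and every model_* function are hypothetical stand-ins invented for
illustration, and dwarf_cu_report_language itself is only the name
suggested in this mail, not an existing libdw call. The sketch assumes
the import relationships between units are already known and form a
tree or DAG.

```c
#include <assert.h>
#include <stddef.h>

/* Language codes as assigned by the DWARF standard. */
enum { DW_LANG_Fortran90 = 0x0008, DW_LANG_C99 = 0x000c };

#define LANG_UNKNOWN (-1)
#define MAX_IMPORTS 4

/* Hypothetical stand-in for a full or partial unit; not a libdw type. */
struct model_cu {
  int at_language;        /* DW_AT_language value, LANG_UNKNOWN if absent */
  int reported_language;  /* value saved by model_cu_report_language */
  struct model_cu *imports[MAX_IMPORTS]; /* units this one imports */
  size_t n_imports;
};

/* Models the proposed dwarf_cu_report_language: remember a
   user-supplied language on the unit. */
static void model_cu_report_language(struct model_cu *cu, int lang) {
  assert(cu != NULL);
  cu->reported_language = lang;
}

/* Models dwarf_srclang with the proposed fallback: prefer the unit's
   own DW_AT_language, otherwise use the reported value. */
static int model_srclang(const struct model_cu *cu) {
  if (cu->at_language != LANG_UNKNOWN)
    return cu->at_language;
  return cu->reported_language;
}

/* Models the separate, expensive traversal: push the importing unit's
   language down to imported units that lack DW_AT_language. A unit that
   already has a reported language is skipped, so repeated calls are
   cheap and the first importer's language wins. */
static void model_propagate_language(struct model_cu *cu, int inherited) {
  int lang = cu->at_language != LANG_UNKNOWN ? cu->at_language : inherited;
  if (cu->at_language == LANG_UNKNOWN && lang != LANG_UNKNOWN) {
    if (cu->reported_language != LANG_UNKNOWN)
      return; /* cached from an earlier visit */
    model_cu_report_language(cu, lang);
  }
  for (size_t i = 0; i < cu->n_imports; i++)
    model_propagate_language(cu->imports[i], lang);
}
```

In this model a caller would run model_propagate_language once per full
CU, passing LANG_UNKNOWN as the inherited language. After that,
model_srclang on a partial unit returns the importing CU's language
without any further traversal.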

