* dwarf_aggregate_size doesn't work with arrays in partial CUs
From: KJ Tsanaktsidis @ 2021-09-25  7:21 UTC
  To: elfutils-devel

Hi folks,

I'm writing a program that uses ptrace to poke at internal OpenSSL
data structures in another process. I'm using libdw to parse the
DWARF data for the copy of OpenSSL actually linked into the target
process, so I can extract struct offsets, member sizes and the like
and poke at the right places.

I've run into an issue where dwarf_aggregate_size can't calculate the
size of an array, when the array is included in a partial CU
(DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
attribute, but not a DW_AT_lower_bound attribute, then
dwarf_aggregate_size will infer the lower bound based on the
DW_AT_language attribute of the enclosing CU (i.e. whether the language
uses zero- or one-based indexing).
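
To make the failure mode concrete, here is a minimal standalone sketch
of that inference. It is not libdw code: the function names and the
LANG_* constants are stand-ins I made up for illustration (the numeric
values follow the DWARF language codes), and only a handful of
languages are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for a few DW_LANG_* codes; LANG_NONE models a
   unit that carries no DW_AT_language at all. */
enum {
  LANG_NONE = 0,
  LANG_C89 = 0x0001,
  LANG_C = 0x0002,
  LANG_FORTRAN77 = 0x0007,
  LANG_FORTRAN90 = 0x0008,
  LANG_C99 = 0x000c
};

/* Store the language's default lower bound in *result and return 0,
   or return -1 when the language is unknown. */
int default_lower_bound(int lang, int64_t *result)
{
  assert(result != NULL);
  switch (lang) {
  case LANG_C89:
  case LANG_C:
  case LANG_C99:
    *result = 0;
    return 0;
  case LANG_FORTRAN77:
  case LANG_FORTRAN90:
    *result = 1;
    return 0;
  default:
    return -1;
  }
}

/* Element count of one array dimension: upper - lower + 1.  When the
   subrange has no explicit lower bound, it is inferred from the unit's
   language, and the whole computation fails if that is unknown. */
int array_element_count(int has_lower, int64_t lower, int64_t upper,
                        int lang, int64_t *count)
{
  assert(count != NULL);
  if (!has_lower && default_lower_bound(lang, &lower) != 0)
    return -1;
  *count = upper - lower + 1;
  return 0;
}
```

With an upper bound of 15 and no lower bound, this gives 16 elements
for C99, 15 for Fortran 90, and an error when the unit has no language,
which is the case I am hitting.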

However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
repositories have the DW_AT_language on the full compilation unit
entries, but not in the partial ones included in them. This means that
calling dwarf_aggregate_size on the array type DIE does not work.

The DWARF spec doesn't really seem to have anything to say on the
matter (all it says is "A full or partial compilation unit entry may
have the following attributes", but doesn't say what it logically
means if an attribute is present on the complete CU but not a partial
one).

I guess it doesn't really make sense for a single compilation unit to
contain multiple languages? So I wonder if dwarf_srclang (called by
dwarf_aggregate_size) should crawl through the list of CU's to see if
the DIE's CU is included in a CU that _does_ specify DW_AT_language
(recursively, I suppose). Then, we can infer that the partial CU's
language is the same as the enclosing one.
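
Roughly what I have in mind, as a standalone sketch rather than a
patch: units are modelled as plain structs holding indices of the units
they import, and every name here (struct unit, resolve_lang,
inferred_language) is hypothetical, not an existing libdw interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMPORTS 4
#define MAX_UNITS 16

/* Toy model of a (partial) unit: its language, or 0 when it has no
   DW_AT_language, plus the indices of the units it imports. */
struct unit {
  int lang;
  size_t nimports;
  size_t imports[MAX_IMPORTS];
};

/* Walk "imported by" edges upwards from target until a unit with a
   language is found.  visited[] guards against import cycles. */
int resolve_lang(const struct unit *units, size_t nunits, size_t target,
                 unsigned char *visited)
{
  if (units[target].lang != 0)
    return units[target].lang;
  if (visited[target])
    return 0;
  visited[target] = 1;

  /* Every lookup rescans all units to find importers of target. */
  for (size_t u = 0; u < nunits; u++)
    for (size_t i = 0; i < units[u].nimports; i++)
      if (units[u].imports[i] == target) {
        int lang = resolve_lang(units, nunits, u, visited);
        if (lang != 0)
          return lang;
      }
  return 0;
}

/* Language of the unit itself, or of the nearest unit that imports it
   (directly or indirectly); 0 if none can be found. */
int inferred_language(const struct unit *units, size_t nunits, size_t target)
{
  unsigned char visited[MAX_UNITS] = {0};
  assert(nunits <= MAX_UNITS && target < nunits);
  return resolve_lang(units, nunits, target, visited);
}
```

Note that each step has to scan every unit's import list, since a
partial unit has no pointer back to whoever imports it.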

If people reckon this is a good idea (or, have a better one!), I'm
happy to try and put together a patch.

KJ


* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: Mark Wielaard @ 2021-09-29 14:21 UTC
  To: KJ Tsanaktsidis, elfutils-devel

Hi KJ,

On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
wrote:
> I'm writing a program that uses ptrace to poke at internal OpenSSL
> data structures for another process. I'm using libdw to parse the
> DWARF data for the copy of OpenSSL actually linked in to the target
> process, so I can extract struct offsets, member sizes and the like
> and poke at the right places.
> 
> I've run into an issue where dwarf_aggregate_size can't calculate the
> size of an array, when the array is included in a partial CU
> (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> attribute, but not a DW_AT_lower_bound attribute, then
> dwarf_aggregate_size will infer the lower bound based on the
> DW_AT_language attribute of the enclosing CU (i.e. whether the language
> uses zero- or one-based indexing).
> 
> However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> repositories have the DW_AT_language on the full compilation unit
> entries, but not in the partial ones included in them. This means that
> calling dwarf_aggregate_size on the array type DIE does not work.

That is indeed a problem, since dwarf_aggregate_size doesn't provide
another way to provide the language to use for the
dwarf_default_lower_bound call. And the default is to return a
DWARF_E_UNKNOWN_LANGUAGE error.

Maybe we should change the default to assume the lower bound is zero?

> The DWARF spec doesn't really seem to have anything to say on the
> matter (all it says is "A full or partial compilation unit entry may
> have the following attributes", but doesn't say what it logically
> means if an attribute is present on the complete CU but not a partial
> one).

I think it is assumed that it inherits those attributes from the CU
from which the partial one was imported and/or from the CU of the DIE
that referenced the DIE in the partial unit. But I don't think it is
easy to track that with libdw currently.

> I guess it doesn't really make sense for a single compilation unit to
> contain multiple languages? So I wonder if dwarf_srclang (called by
> dwarf_aggregate_size) should crawl through the list of CU's to see if
> the DIE's CU is included in a CU that _does_ specify DW_AT_language
> (recursively, I suppose). Then, we can infer that the partial CU's
> language is the same as the enclosing one.
> 
> If people reckon this is a good idea (or, have a better one!), I'm
> happy to try and put together a patch.

I think that suggestion is sound, but really expensive. It is also
somewhat tricky if you have alt files: you'll have to track back to the
original Dwarf to see if it imports one of the partial units from the
alt file.

But I also don't have a good alternative idea. We could maybe have a
variant of dwarf_aggregate_size that takes a language default value,
but that doesn't seem like a very generic solution. Or maybe a variant
of dwarf_srclang that takes any DIE (not just a CU DIE) and tries to
figure out the best language to use, falling back to some default
value if it cannot? The result could then always be used with
dwarf_default_lower_bound to get a default (most likely zero).
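
As a very rough standalone sketch of that fallback idea (the names
pick_language and lower_bound_or_zero are hypothetical, nothing like
this exists in libdw today, and the language table is deliberately cut
down to two Fortran codes):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for DW_LANG_* codes; LANG_UNKNOWN models a
   missing DW_AT_language. */
enum {
  LANG_UNKNOWN = 0,
  LANG_FORTRAN77 = 0x0007,
  LANG_FORTRAN90 = 0x0008,
  LANG_C99 = 0x000c
};

/* Use the language found for the DIE's unit when there is one,
   otherwise the fallback supplied by the caller. */
int pick_language(int unit_lang, int fallback_lang)
{
  assert(unit_lang >= 0 && fallback_lang >= 0);
  return unit_lang != LANG_UNKNOWN ? unit_lang : fallback_lang;
}

/* Default lower bound that never fails: one for the one-based
   languages listed here, zero for everything else, including an
   unknown language. */
int64_t lower_bound_or_zero(int lang)
{
  switch (lang) {
  case LANG_FORTRAN77:
  case LANG_FORTRAN90:
    return 1;
  default:
    return 0;
  }
}
```

The point is only that the caller always gets a usable lower bound; how
the "best" language would really be determined is left open here.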

We could also ask producers (like dwz) to always include a
DW_AT_language for partial units they create. But that of course makes
the partial units bigger (and at least dwz creates them to make the
full debuginfo smaller).

Cheers,

Mark

* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: KJ Tsanaktsidis @ 2021-10-03  5:05 UTC
  To: Mark Wielaard; +Cc: elfutils-devel

I guess we don't want to hide some really expensive traversal
operation inside a simple call to dwarf_aggregate_size, no...

What if we instead provide a way for the user to specify what language
a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)".
That would get saved with the (partial) CU, and dwarf_srclang could
retrieve this information (if DW_AT_language isn't set). Then, the
user could recursively traverse all CUs and call
dwarf_cu_report_language on each partial CU. And as a bonus, we could
even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
something (OK, the name needs a bit of workshopping).

That way, the expensive thing is in a separate call that's marked as
being very expensive (and cached, so it only needs to be done once).
Sound like a reasonable approach?
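
A standalone sketch of the shape I'm imagining, using a toy unit model
instead of real libdw types. All names below (report_language,
unit_srclang, report_all_partial_units) are placeholders for the
hypothetical API, not existing functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMPORTS 4

/* Toy model of a unit: the language from DW_AT_language (0 if absent),
   a language reported by the user (0 if none), and imported units. */
struct unit {
  int attr_lang;
  int reported_lang;
  size_t nimports;
  size_t imports[MAX_IMPORTS];
};

/* Record a user-supplied language on a unit that has none of its own. */
void report_language(struct unit *u, int lang)
{
  assert(u != NULL);
  if (u->attr_lang == 0)
    u->reported_lang = lang;
}

/* Prefer DW_AT_language, then the reported language, else -1. */
int unit_srclang(const struct unit *u)
{
  assert(u != NULL);
  if (u->attr_lang != 0)
    return u->attr_lang;
  return u->reported_lang != 0 ? u->reported_lang : -1;
}

/* Push lang down to every unit reachable from units[from] that does
   not have a language yet.  Units that already have one are skipped,
   which also stops the recursion on import cycles. */
void propagate_language(struct unit *units, size_t from, int lang)
{
  for (size_t i = 0; i < units[from].nimports; i++) {
    size_t child = units[from].imports[i];
    if (unit_srclang(&units[child]) == -1) {
      report_language(&units[child], lang);
      propagate_language(units, child, lang);
    }
  }
}

/* The expensive one-off pass: start from each unit that has a
   DW_AT_language and label everything it imports. */
void report_all_partial_units(struct unit *units, size_t nunits)
{
  for (size_t u = 0; u < nunits; u++)
    if (units[u].attr_lang != 0)
      propagate_language(units, u, units[u].attr_lang);
}
```

After the one-off pass, unit_srclang is a cheap lookup. In this sketch
the first importer to reach a partial unit wins, which is only right if
all importers of a unit agree on the language.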


* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: Mark Wielaard @ 2021-11-10 13:40 UTC
  To: KJ Tsanaktsidis; +Cc: elfutils-devel

Hi KJ,

On Sun, 2021-10-03 at 16:05 +1100, KJ Tsanaktsidis via Elfutils-devel
wrote:
> I guess we don't want to hide some really expensive traversal
> operation inside a simple call to dwarf_aggregate_size, no...
> 
> What if we instead provide a way for the user to specify what
> language
> a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)".
> That would get saved with the (partial) CU, and dwarf_srclang could
> retrieve this information (if DW_AT_language isn't set). Then, the
> user could recursively traverse all CUs and call
> dwarf_cu_report_language on each partial CU. And as a bonus, we could
> even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
> something (OK, the name needs a bit of workshopping).
> 
> That way, the expensive thing is in a separate call that's marked as
> being very expensive (and cached, so it only needs to be done once).
> Sound like a reasonable approach?

Sorry for forgetting about this discussion. I do think the above makes
sense. I opened a bug to track this:
https://sourceware.org/bugzilla/show_bug.cgi?id=28578

Cheers,

Mark
