* dwarf_aggregate_size doesn't work with arrays in partial CUs
From: KJ Tsanaktsidis @ 2021-09-25 7:21 UTC
To: elfutils-devel

Hi folks,

I'm writing a program that uses ptrace to poke at internal OpenSSL
data structures for another process. I'm using libdw to parse the
DWARF data for the copy of OpenSSL actually linked in to the target
process, so I can extract struct offsets, member sizes and the like
and poke at the right places.

I've run into an issue where dwarf_aggregate_size can't calculate the
size of an array, when the array is included in a partial CU
(DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
attribute, but not a DW_AT_lower_bound attribute, then
dwarf_aggregate_size will infer the lower bound based on the
DW_AT_language attribute of the enclosing CU (i.e. whether the language
uses zero or one based indexing).

However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
repositories have the DW_AT_language on the full compilation unit
entries, but not in the partial ones included in them. This means that
calling dwarf_aggregate_size on the array type DIE does not work.

The DWARF spec doesn't really seem to have anything to say on the
matter (all it says is "A full or partial compilation unit entry may
have the following attributes", but doesn't say what it logically
means if an attribute is present on the complete CU but not a partial
one).

I guess it doesn't really make sense for a single compilation unit to
contain multiple languages? So I wonder if dwarf_srclang (called by
dwarf_aggregate_size) should crawl through the list of CUs to see if
the DIE's CU is included in a CU that _does_ specify DW_AT_language
(recursively, I suppose). Then, we can infer that the partial CU's
language is the same as the enclosing one.

If people reckon this is a good idea (or have a better one!), I'm
happy to try and put together a patch.

KJ
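
A minimal sketch of the lower-bound inference described in this message, assuming a simplified model rather than libdw's actual code. The names default_lower_bound, element_count and LANG_NONE are hypothetical; only the numeric language codes follow the DWARF DW_LANG_* assignments. It shows why a missing DW_AT_language makes the element count of a dimension uncomputable when DW_AT_lower_bound is also absent.

```c
#include <stdint.h>

/* A few language codes with the values the DWARF standard assigns
   to the corresponding DW_LANG_* constants. */
enum {
  LANG_C89 = 0x0001,
  LANG_C = 0x0002,
  LANG_ADA83 = 0x0003,
  LANG_C_PLUS_PLUS = 0x0004,
  LANG_FORTRAN77 = 0x0007,
  LANG_FORTRAN90 = 0x0008,
  LANG_PASCAL83 = 0x0009,
  LANG_C99 = 0x000c,
  LANG_NONE = -1 /* hypothetical marker: the unit has no DW_AT_language */
};

/* Hypothetical helper: store the default lower bound for LANG in
   *RESULT and return 0, or return -1 if the language is unknown. */
int default_lower_bound(int lang, int64_t *result)
{
  switch (lang) {
  case LANG_C89: case LANG_C: case LANG_C_PLUS_PLUS: case LANG_C99:
    *result = 0; /* zero-based indexing */
    return 0;
  case LANG_ADA83: case LANG_FORTRAN77: case LANG_FORTRAN90:
  case LANG_PASCAL83:
    *result = 1; /* one-based indexing */
    return 0;
  default:
    return -1;   /* no language, no default */
  }
}

/* Hypothetical helper: number of elements in one array dimension.
   HAS_LOWER says whether an explicit lower bound was present; if not,
   the bound is inferred from LANG. Returns 0 on success, -1 on failure. */
int element_count(int lang, int has_lower, int64_t lower, int64_t upper,
                  int64_t *count)
{
  if (!has_lower && default_lower_bound(lang, &lower) != 0)
    return -1;
  *count = upper - lower + 1;
  return 0;
}
```

With an upper bound of 15 and a C language code the count is 16; with LANG_NONE and no explicit lower bound the call fails, which mirrors the partial-unit case reported here.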

* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: Mark Wielaard @ 2021-09-29 14:21 UTC
To: KJ Tsanaktsidis, elfutils-devel

Hi KJ,

On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
wrote:
> I'm writing a program that uses ptrace to poke at internal OpenSSL
> data structures for another process. I'm using libdw to parse the
> DWARF data for the copy of OpenSSL actually linked in to the target
> process, so I can extract struct offsets, member sizes and the like
> and poke at the right places.
>
> I've run into an issue where dwarf_aggregate_size can't calculate the
> size of an array, when the array is included in a partial CU
> (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> attribute, but not a DW_AT_lower_bound attribute, then
> dwarf_aggregate_size will infer the lower bound based on the
> DW_AT_language attribute of the enclosing CU (i.e. whether the language
> uses zero or one based indexing).
>
> However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> repositories have the DW_AT_language on the full compilation unit
> entries, but not in the partial ones included in them. This means that
> calling dwarf_aggregate_size on the array type DIE does not work.

That is indeed a problem, since dwarf_aggregate_size doesn't provide
another way to supply the language to use for the
dwarf_default_lower_bound call. And the default is to return a
DWARF_E_UNKNOWN_LANGUAGE error.

Maybe we should change the default to assume the lower bound is zero?

> The DWARF spec doesn't really seem to have anything to say on the
> matter (all it says is "A full or partial compilation unit entry may
> have the following attributes", but doesn't say what it logically
> means if an attribute is present on the complete CU but not a partial
> one).

I think it is assumed that it inherits those attributes from the CU
from which the partial one was imported and/or from the CU of the DIE
that referenced the DIE in the partial unit. But I don't think it is
easy to track that with libdw currently.

> I guess it doesn't really make sense for a single compilation unit to
> contain multiple languages? So I wonder if dwarf_srclang (called by
> dwarf_aggregate_size) should crawl through the list of CUs to see if
> the DIE's CU is included in a CU that _does_ specify DW_AT_language
> (recursively, I suppose). Then, we can infer that the partial CU's
> language is the same as the enclosing one.
>
> If people reckon this is a good idea (or have a better one!), I'm
> happy to try and put together a patch.

I think that suggestion is sound, but really expensive. It is also
somewhat tricky if you have alt files: you'll have to track back to the
original Dwarf to see if it imports one of the partial units from the
alt file.

But I also don't have a good alternative idea. We could maybe have a
variant of dwarf_aggregate_size that takes a language default value,
but that doesn't seem like a very generic solution. Or maybe a variant
of dwarf_srclang that takes any DIE (not just a CU DIE) and tries to
figure out the best language to use, falling back to some default
value if it cannot, so that the result can always be used with
dwarf_default_lower_bound to get a default (most likely zero)?

We could also ask producers (like dwz) to always include a
DW_AT_language for partial units they create. But that of course makes
the partial units bigger (and at least dwz creates them to make the
full debuginfo smaller).

Cheers,

Mark
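
The "crawl through the importing CUs" idea discussed in this message can be sketched as follows. This is an illustrative model only, not libdw code: struct unit, inherited_lang, NO_LANG and the fixed-size limits are all hypothetical, and units are represented as plain array indices rather than DIEs reached through DW_TAG_imported_unit. It also ignores alt files, which is the tricky part noted above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_UNITS   64   /* hypothetical limits for this sketch */
#define MAX_IMPORTS 4
#define NO_LANG     (-1) /* hypothetical marker: no DW_AT_language */

struct unit {
  int lang;                    /* language code, or NO_LANG */
  size_t nimports;             /* number of valid entries in imports[] */
  size_t imports[MAX_IMPORTS]; /* indices of the units this unit imports */
};

/* Depth-first search upwards through the units that import TARGET,
   returning the first language found. VISITED guards against cycles. */
static int inherited_lang_rec(const struct unit *units, size_t n,
                              size_t target, unsigned char *visited)
{
  if (units[target].lang != NO_LANG)
    return units[target].lang;
  visited[target] = 1;
  for (size_t u = 0; u < n; u++) {
    if (visited[u])
      continue;
    for (size_t i = 0; i < units[u].nimports; i++) {
      if (units[u].imports[i] == target) {
        int lang = inherited_lang_rec(units, n, u, visited);
        if (lang != NO_LANG)
          return lang;
        break; /* this importer leads nowhere, try the next unit */
      }
    }
  }
  return NO_LANG;
}

/* Language of unit TARGET, inherited from an importing unit if needed. */
int inherited_lang(const struct unit *units, size_t n, size_t target)
{
  unsigned char visited[MAX_UNITS];
  if (n > MAX_UNITS || target >= n)
    return NO_LANG;
  memset(visited, 0, sizeof visited);
  return inherited_lang_rec(units, n, target, visited);
}
```

Each lookup scans every unit for importers, so the cost grows with the number of units times the import depth, which illustrates why doing this implicitly inside every size query would be expensive.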

* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: KJ Tsanaktsidis @ 2021-10-03 5:05 UTC
To: Mark Wielaard; +Cc: elfutils-devel

On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked in to the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the language
> > uses zero or one based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them. This means that
> > calling dwarf_aggregate_size on the array type DIE does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't provide
> another way to supply the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CUs to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It is also
> somewhat tricky if you have alt files: you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution. Or maybe a variant
> of dwarf_srclang that takes any DIE (not just a CU DIE) and tries to
> figure out the best language to use, falling back to some default
> value if it cannot, so that the result can always be used with
> dwarf_default_lower_bound to get a default (most likely zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create.
> But that of course makes the partial units bigger (and at least dwz
> creates them to make the full debuginfo smaller).
>
> Cheers,
>
> Mark
>

I guess we don't want to hide some really expensive traversal
operation inside a simple call to dwarf_aggregate_size, no...

What if we instead provide a way for the user to specify what language
a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)".
That would get saved with the (partial) CU, and dwarf_srclang could
retrieve this information (if DW_AT_language isn't set). Then, the
user could recursively traverse all CUs and call
dwarf_cu_report_language on each partial CU. And as a bonus, we could
even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
something (OK, the name needs a bit of workshopping).

That way, the expensive thing is in a separate call that's marked as
being very expensive (and cached, so it only needs to be done once).
Sound like a reasonable approach?
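
The caching behaviour proposed in this message might look roughly like the sketch below. Everything in it is hypothetical: struct cu stands in for libdw's internal CU state, and cu_report_language and cu_srclang only model the suggested dwarf_cu_report_language and the adjusted dwarf_srclang lookup; neither is an existing libdw function.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LANG (-1) /* hypothetical marker: no language known */

/* Hypothetical per-CU state: the language read from DW_AT_language,
   plus a slot for a language reported by the caller. */
struct cu {
  int attr_lang;     /* from DW_AT_language, or NO_LANG if absent */
  int reported_lang; /* set through cu_report_language, or NO_LANG */
};

/* Initialise a CU whose DW_AT_language is ATTR_LANG (or NO_LANG). */
void cu_init(struct cu *cu, int attr_lang)
{
  assert(cu != NULL);
  cu->attr_lang = attr_lang;
  cu->reported_lang = NO_LANG;
}

/* Model of the proposed call: remember LANG for this (partial) CU so
   that later lookups do not need to repeat an expensive traversal. */
void cu_report_language(struct cu *cu, int lang)
{
  assert(cu != NULL);
  cu->reported_lang = lang;
}

/* Model of the adjusted lookup: DW_AT_language wins when present;
   the reported language is used only as a fallback. */
int cu_srclang(const struct cu *cu)
{
  assert(cu != NULL);
  if (cu->attr_lang != NO_LANG)
    return cu->attr_lang;
  return cu->reported_lang;
}
```

The design choice here is that an explicit attribute is never overridden, so reporting a language can only fill a gap and a caller that walks the import graph once pays the traversal cost a single time.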

* Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
From: Mark Wielaard @ 2021-11-10 13:40 UTC
To: KJ Tsanaktsidis; +Cc: elfutils-devel

Hi KJ,

On Sun, 2021-10-03 at 16:05 +1100, KJ Tsanaktsidis via Elfutils-devel
wrote:
> I guess we don't want to hide some really expensive traversal
> operation inside a simple call to dwarf_aggregate_size, no...
>
> What if we instead provide a way for the user to specify what
> language a CU is? Like "dwarf_cu_report_language(Dwarf_Die *cu, int
> lang)". That would get saved with the (partial) CU, and dwarf_srclang
> could retrieve this information (if DW_AT_language isn't set). Then,
> the user could recursively traverse all CUs and call
> dwarf_cu_report_language on each partial CU. And as a bonus, we could
> even wrap that up in dwarf_cu_traverse_partial_cu_set_language or
> something (OK, the name needs a bit of workshopping).
>
> That way, the expensive thing is in a separate call that's marked as
> being very expensive (and cached, so it only needs to be done once).
> Sound like a reasonable approach?

Sorry for forgetting about this discussion. I do think the above makes
sense. I opened a bug to track this:
https://sourceware.org/bugzilla/show_bug.cgi?id=28578

Cheers,

Mark