Re: Detecting separate debuginfo

public inbox for elfutils@sourceware.org

* Re: Detecting separate debuginfo
@ 2014-04-04  9:35 Florian Weimer
From: Florian Weimer @ 2014-04-04  9:35 UTC
  To: elfutils-devel

On 03/30/2014 11:23 AM, Mark Wielaard wrote:
> On Fri, Mar 28, 2014 at 02:00:49PM +0100, Florian Weimer wrote:
>> I maintain a database which extracts symbol information from ELF objects
>> (among other things).  I would like to enrich that with DWARF producer
>> data, and perhaps additional DWARF information in the future.
>>
>> I'd really like to avoid importing the ELF symbol information twice,
>> once from the real object file, and once from the separate debuginfo.
>
> Note that in general the main ELF file contains only a subset of the ELF
> symbols, in .dynsym or in the compressed .gnu_debugdata section (which only
> contains function symbols); the separate .debug file contains the full
> .symtab. To get the full symbol table you can ignore the main ELF file if
> you know there should be a corresponding .debug file (because the main ELF
> file has a .gnu_debuglink section).

Thanks, I only load the .dynsym section now.  It's not present in 
separate debuginfo, so it keeps the data duplication to a minimum right now.

> You can use the build-id to check whether two files describe the same
> module. Use eu-unstrip -n -e <file> to see it and the separate .debug
> file, if there is one (that will show the file-based location, but at
> least you know whether it should exist).

Indeed, I was already keeping that.

>> Based on the previous discussion around program interpreter reporting in
>> readelf, there is no easy way to detect separate debuginfo to trigger
>> special processing for it (e.g., do not extract symbols, only
> DW_AT_producer data).
>
> It isn't easy to detect whether the program headers of an ELF file are
> valid, although Roland suggested a heuristic to detect if they are.

BTW, I verified the program interpreter heuristic in readelf and it
really works well.  It doesn't suppress any valid interpreters, and the
code no longer reports any garbage interpreters, either.  I checked this
on Fedora and CentOS 5 binaries.

> But it is easy to detect whether a file has debuginfo (just check for a
> .debug_info section, or just try opening the ELF file with libdw's
> dwarf_begin). If it doesn't, check for a .gnu_debuglink section to see
> whether it has separate debuginfo (and a separate full symbol table).

Ahh, good points.  I'm doing both right now, attempting dwarf_begin and 
looking at .gnu_debuglink.

I think these suggestions helped me to solve my immediate needs.  I 
started using libdw and see some unexpected results (mostly missing data 
for attributes).  I guess I'll be back soon with more questions.

-- 
Florian Weimer / Red Hat Product Security Team

* Re: Detecting separate debuginfo
@ 2014-03-30  9:23 Mark Wielaard
From: Mark Wielaard @ 2014-03-30  9:23 UTC
  To: elfutils-devel

On Fri, Mar 28, 2014 at 02:00:49PM +0100, Florian Weimer wrote:
> I maintain a database which extracts symbol information from ELF objects 
> (among other things).  I would like to enrich that with DWARF producer 
> data, and perhaps additional DWARF information in the future.
> 
> I'd really like to avoid importing the ELF symbol information twice, 
> once from the real object file, and once from the separate debuginfo.

Note that in general the main ELF file contains only a subset of the ELF
symbols, in .dynsym or in the compressed .gnu_debugdata section (which only
contains function symbols); the separate .debug file contains the full
.symtab. To get the full symbol table you can ignore the main ELF file if
you know there should be a corresponding .debug file (because the main ELF
file has a .gnu_debuglink section).

> The database performs content-based deduplication, which means I do not
> have path name information during extraction.  This means I cannot use
> file system paths to disambiguate the real thing and its debugging
> information.  Both files are loaded separately and not necessarily at
> the same time.  I don't want to change that if possible because this
> would result in a scalability issue eventually.  I don't want to assume
> that *all* debuginfo data has been separated, either.

You can use the build-id to check whether two files describe the same
module. Use eu-unstrip -n -e <file> to see it and the separate .debug
file, if there is one (that will show the file-based location, but at
least you know whether it should exist).
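
If you would rather read the build-id yourself than parse tool output, here
is a rough Python sketch. It assumes the build-id lives in a
.note.gnu.build-id section of type SHT_NOTE and that the file uses ordinary
(non-extended) section numbering. The function names elf_sections and
build_id are made up for this example and are not elfutils APIs.

```python
import struct

SHT_NOTE = 7          # section type used for ELF note sections
NT_GNU_BUILD_ID = 3   # note type of the GNU build-id


def elf_sections(data):
    """Map section name -> (sh_type, sh_offset, sh_size) for an ELF image in bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    if is64:
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x2E)
    if shnum == 0:
        return {}

    def header(index):
        base = shoff + index * shentsize
        name, sh_type = struct.unpack_from(endian + "II", data, base)
        fmt, at = ("QQ", 24) if is64 else ("II", 16)
        offset, size = struct.unpack_from(endian + fmt, data, base + at)
        return name, sh_type, offset, size

    # Section names are offsets into the section header string table.
    _, _, str_off, str_size = header(shstrndx)
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for i in range(shnum):
        name, sh_type, offset, size = header(i)
        text = strtab[name:strtab.index(b"\0", name)].decode("ascii", "replace")
        sections[text] = (sh_type, offset, size)
    return sections


def build_id(data):
    """Return the GNU build-id as a hex string, or None if there is none."""
    entry = elf_sections(data).get(".note.gnu.build-id")
    if entry is None or entry[0] != SHT_NOTE:
        return None
    _, pos, size = entry
    stop = pos + size
    endian = "<" if data[5] == 1 else ">"
    while pos + 12 <= stop:
        namesz, descsz, ntype = struct.unpack_from(endian + "III", data, pos)
        name_start = pos + 12
        desc_start = name_start + (namesz + 3) // 4 * 4   # name is padded to 4 bytes
        if ntype == NT_GNU_BUILD_ID and data[name_start:name_start + namesz] == b"GNU\0":
            return data[desc_start:desc_start + descsz].hex()
        pos = desc_start + (descsz + 3) // 4 * 4          # skip to the next note
    return None
```

Two blobs that return the same build_id() value can then be treated as the
main file and the .debug file of the same module, without needing any path
names.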

> Based on the previous discussion around program interpreter reporting in 
> readelf, there is no easy way to detect separate debuginfo to trigger 
> special processing for it (e.g., do not extract symbols, only 
> DW_AT_producer data).

It isn't easy to detect whether the program headers of an ELF file are
valid, although Roland suggested a heuristic to detect if they are.
But it is easy to detect whether a file has debuginfo (just check for a
.debug_info section, or just try opening the ELF file with libdw's
dwarf_begin). If it doesn't, check for a .gnu_debuglink section to see
whether it has separate debuginfo (and a separate full symbol table).
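
As a rough illustration of the section-name check, here is a Python sketch
that works on the file contents alone, so no path names are needed. It reads
the section header table directly rather than going through libdw, and it
assumes ordinary (non-extended) section numbering. The names elf_sections and
debuginfo_status, and the three result strings, are invented for this example.

```python
import struct


def elf_sections(data):
    """Map section name -> (sh_type, sh_offset, sh_size) for an ELF image in bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    if is64:
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x2E)
    if shnum == 0:
        return {}

    def header(index):
        base = shoff + index * shentsize
        name, sh_type = struct.unpack_from(endian + "II", data, base)
        fmt, at = ("QQ", 24) if is64 else ("II", 16)
        offset, size = struct.unpack_from(endian + fmt, data, base + at)
        return name, sh_type, offset, size

    # Section names are offsets into the section header string table.
    _, _, str_off, str_size = header(shstrndx)
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for i in range(shnum):
        name, sh_type, offset, size = header(i)
        text = strtab[name:strtab.index(b"\0", name)].decode("ascii", "replace")
        sections[text] = (sh_type, offset, size)
    return sections


def debuginfo_status(data):
    """Classify an ELF image by which debuginfo-related sections it has."""
    names = elf_sections(data)
    if ".debug_info" in names:
        return "has-debuginfo"          # DWARF is in this file
    if ".gnu_debuglink" in names:
        return "separate-debuginfo"     # DWARF and full .symtab live in a .debug file
    return "no-debuginfo"
```

Note that "has-debuginfo" covers both an unstripped binary and a separate
.debug file; this check alone does not tell those two apart.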

> One thing that would help me as well is a way to get the exact
> same set of exported symbols from the real file and its separate
> debuginfo.  Then I could deduplicate based on that, and processing both
> files would not matter anymore.  eu-readelf shows quite different output
> for the two files, so I'm not sure how to achieve that.

Yes, as explained above, if the main ELF file has separate debuginfo, its
full symbol table (.symtab) will be in the .debug file. So the main ELF
file will just contain the minimal .dynsym symbols needed at runtime. It
might also have some extra function symbols in the .gnu_debugdata section
(see https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html); you
can read that compressed section with eu-readelf --elf-section.

> I don't actually use eu-readelf output (but my extraction code is
> derived from it), and I'm open to suggestions to look at particular
> sections/headers to get matching output.  I'm mainly interested in
> public symbols and undefined symbols.  Internal symbols from debugging
> information could be ignored for the time being.

I think you just want the .dynsym symbols then. eu-readelf -s will show
.dynsym, .symtab, or both, depending on which exist. eu-nm -D will only
show the dynamic symbols. If the ELF file isn't stripped it will have both
tables. A separate .debug file will only have the full .symtab table (its
.dynsym section will be marked NOBITS).
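
The NOBITS marking can be checked from the section headers alone. The Python
sketch below lists which of the two symbol tables actually carry data in a
given file. It assumes ordinary (non-extended) section numbering, and the
names elf_sections and symbol_tables_with_data are invented for this example.

```python
import struct

SHT_NOBITS = 8   # section occupies no space in the file


def elf_sections(data):
    """Map section name -> (sh_type, sh_offset, sh_size) for an ELF image in bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    if is64:
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x2E)
    if shnum == 0:
        return {}

    def header(index):
        base = shoff + index * shentsize
        name, sh_type = struct.unpack_from(endian + "II", data, base)
        fmt, at = ("QQ", 24) if is64 else ("II", 16)
        offset, size = struct.unpack_from(endian + fmt, data, base + at)
        return name, sh_type, offset, size

    # Section names are offsets into the section header string table.
    _, _, str_off, str_size = header(shstrndx)
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for i in range(shnum):
        name, sh_type, offset, size = header(i)
        text = strtab[name:strtab.index(b"\0", name)].decode("ascii", "replace")
        sections[text] = (sh_type, offset, size)
    return sections


def symbol_tables_with_data(data):
    """Return the names of .dynsym/.symtab sections that hold real contents."""
    sections = elf_sections(data)
    return [name for name in (".dynsym", ".symtab")
            if name in sections and sections[name][0] != SHT_NOBITS]
```

A stripped main file would be expected to give [".dynsym"], a separate .debug
file [".symtab"], and an unstripped file both. Loading symbols only from the
tables listed here avoids importing the same table twice.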

Hope that helps,

Mark

* Detecting separate debuginfo
@ 2014-03-28 13:00 Florian Weimer
From: Florian Weimer @ 2014-03-28 13:00 UTC
  To: elfutils-devel

I maintain a database which extracts symbol information from ELF objects 
(among other things).  I would like to enrich that with DWARF producer 
data, and perhaps additional DWARF information in the future.

I'd really like to avoid importing the ELF symbol information twice, 
once from the real object file, and once from the separate debuginfo.

The database performs content-based deduplication, which means I do not
have path name information during extraction.  This means I cannot use
file system paths to disambiguate the real thing and its debugging 
information.  Both files are loaded separately and not necessarily at 
the same time.  I don't want to change that if possible because this 
would result in a scalability issue eventually.  I don't want to assume 
that *all* debuginfo data has been separated, either.

Based on the previous discussion around program interpreter reporting in 
readelf, there is no easy way to detect separate debuginfo to trigger 
special processing for it (e.g., do not extract symbols, only 
DW_AT_producer data).

One thing that would help me as well is a way to get the exact
same set of exported symbols from the real file and its separate
debuginfo.  Then I could deduplicate based on that, and processing both
files would not matter anymore.  eu-readelf shows quite different output 
for the two files, so I'm not sure how to achieve that.

I don't actually use eu-readelf output (but my extraction code is 
derived from it), and I'm open to suggestions to look at particular 
sections/headers to get matching output.  I'm mainly interested in 
public symbols and undefined symbols.  Internal symbols from debugging 
information could be ignored for the time being.

-- 
Florian Weimer / Red Hat Product Security Team
