public inbox for elfutils@sourceware.org
 help / color / mirror / Atom feed
From: Aaron Merey <amerey@redhat.com>
To: Mark Wielaard <mark@klomp.org>
Cc: elfutils-devel@sourceware.org
Subject: Re: [PATCH v2] dwarf_getaranges: Build aranges list from CUs instead of .debug_aranges
Date: Wed, 21 Feb 2024 22:19:10 -0500	[thread overview]
Message-ID: <CAJDtP-Ss_6b0PXP+ByB8dX5Kr=RotzZn5kGAvxW5r=KkC0rv2Q@mail.gmail.com> (raw)
In-Reply-To: <20240220222332.GB1666@gnu.wildebeest.org>

Hi Mark,

On Tue, Feb 20, 2024 at 5:23 PM Mark Wielaard <mark@klomp.org> wrote:
>
> > As for the number of aranges found, there is a difference for libxul.so:
> > 250435 with the patch compared to 254832 without.  So 4397 fewer aranges
> > are found when using the new CU iteration method.  I'll dig into this and
> > see if there is a problem or if it's just due to some redundancy in
> > libxul's .debug_aranges.  FWIW there was no change to the aranges counts
> > for the other modules searched during this eu-stack firefox corefile test.
>
> A quick way to see where the differences are is using
> eu-readelf --debug-dump=decodedaranges before/after your patch.
>
> This is opposite to what I expected. I had expected there to be more,
> instead of less ranges. The difference is less than 2%. But still
> interesting to know what/why.
>
> Were there any differences in the backtraces? If not, then those
> ranges might not actually have been mapping to code.

The backtraces were identical.  I took a closer look at this and the
difference is due to clang including DW_AT_location addresses in
.debug_aranges.  This patch just looks for DW_AT_{high,low}_pc and
DW_AT_ranges (via dwarf_ranges) when generating the aranges list,
so these DW_AT_location addresses aren't included.

According to David Blaikie [1], "GCC does not include globals in
debug_aranges, so they probably can't be relied upon unfortunately
(unfortunate that Clang pays the cost when it emits them, but consumers
can't rely on them)".

Since consumers already can't assume that .debug_aranges includes
DW_AT_location addresses, I think it's reasonable for us to not include
them when dynamically generating aranges.  Especially since there could
be a noticeable performance cost to doing so.

A separate issue I saw is that some entries in .debug_aranges with
a decoded start address of "000000000000000000" wouldn't always
have matching entries in .debug_ranges.  This resulted in some dynamic
aranges not matching their corresponding .debug_aranges entry.

For example, the following decoded arange is in libxul's .debug_aranges:

    start: 000000000000000000, length:    13, CU DIE offset: 95189496

In the .debug_info for this CU, there is a corresponding subprogram with
low_pc 0 and high_pc 13.

However in the .debug_ranges entry for this CU, there is no range starting
at address 0 with size of at least 13.  The closest range list entry is:

    range 1, 1
    +0x0000000000000001 <__ehdr_start+0x1>..
    +000000000000000000 <libxul.so>

When we dynamically generate aranges this results in the following
decoded arange:

    start: 0x0000000000000001, length:     0, CU DIE offset: 95189496

So the start address is off by 1 and the length doesn't match the
.debug_aranges entry.  In some similar cases there wouldn't even be
a corresponding "range 1, 1" entry in .debug_ranges.  I'm not yet sure
what's going on here.

>
> > > Might it be an idea to leave dwarf_getaranges as it is and introduce a
> > > new (internal) function to get "dynamic" ranges? It looks like what
> > > programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> > > and dwfl_module_addrdie. These are currently build on dwarf_getaranges,
> > > but could maybe use a new interface?
> >
> > IMO this depends on what users expect from dwarf_getaranges.  Do they
> > want the exact contents of .debug_aranges (whether or not it's complete)
> > or should dwarf_getaranges go beyond .debug_aranges to ensure the most
> > complete results?
> >
> > The comment for dwarf_getaranges in libdw.h simply reads "Return list
> > address ranges".  Since there's no mention of .debug_aranges specifically,
> > I think it's fair if dwarf_getaranges does whatever it can to ensure
> > comprehensive results.  In which case dwarf_getaranges should probably
> > dynamically generate aranges.
>
> You might be right that no user really cares. But as seen in the
> eu-readelf code, it might also be that people expected it to map to
> the ranges from .debug_aranges.
>
> So I would be happier if we just kept the dwarf_getaranges code as
> is. And just change the code in dwarf_addrdie and dwfl_module_addrdie.
>
> We could then also introduce a new public function, dwarf_getdieranges
> (?) that does the new thing. But it doesn't have to be public on the
> first try as long as dwarf_addrdie and dwfl_module_addrdie work. (We
> might want to change the interface of dwarf_getdieranges so it can be
> "lazy" for example.)

Ok this approach seems like the most flexible.  Users can have both
.debug_aranges and CU-based aranges plus we don't have to change the
semantics of dwarf_getaranges.  I'll submit a revised patch for this.

Aaron

[1] https://reviews.llvm.org/D123538#inline-1188522


  reply	other threads:[~2024-02-22  3:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-07  1:35 [PATCH 0/2] dwarf_getaranges: Build aranges list from CUs Aaron Merey
2023-12-07  1:35 ` [PATCH 1/2] libdw: Use INTUSE with dwarf_get_units Aaron Merey
2023-12-21 23:56   ` Mark Wielaard
2023-12-22 21:02     ` Aaron Merey
2023-12-07  1:35 ` [PATCH 2/2] dwarf_getaranges: Build aranges list from CUs instead of .debug_aranges Aaron Merey
2023-12-11 23:18   ` [PATCH v2] " Aaron Merey
2024-02-13 13:28     ` Mark Wielaard
2024-02-20  4:20       ` Aaron Merey
2024-02-20 22:23         ` Mark Wielaard
2024-02-22  3:19           ` Aaron Merey [this message]
2024-02-22 15:35             ` Frank Ch. Eigler
2024-02-22 17:27               ` Aaron Merey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJDtP-Ss_6b0PXP+ByB8dX5Kr=RotzZn5kGAvxW5r=KkC0rv2Q@mail.gmail.com' \
    --to=amerey@redhat.com \
    --cc=elfutils-devel@sourceware.org \
    --cc=mark@klomp.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).