public inbox for elfutils@sourceware.org
From: David Blaikie <dblaikie@gmail.com>
To: Mark Wielaard <mark@klomp.org>
Cc: Fangrui Song <maskray@google.com>,
	gdb@sourceware.org, elfutils-devel@sourceware.org,
	 binutils@sourceware.org, Alan Modra <amodra@gmail.com>
Subject: Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
Date: Fri, 19 Jun 2020 18:02:50 -0700
Message-ID: <CAENS6EtnT+fg6+9uZ9WgYE7Mx_YHOAHW2Uf7xBfLQuKxo86mhg@mail.gmail.com>
In-Reply-To: <69e4e7a60b23bff32d88b3edd2a718cf2f6e8cdc.camel@klomp.org>

On Fri, Jun 19, 2020 at 1:04 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Tue, 2020-06-09 at 13:24 -0700, Fangrui Song via Elfutils-devel wrote:
> > I want to revive the thread, but focus on whether a tombstone value
> > (-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
> > debug related tools in elfutils and other utilities I don't know about).
> >
> > Paul Robinson has proposed that DWARF v6 should reserve a tombstone
> > value  (the value a relocation referencing a discarded symbol in a
> > .debug_* section should be resolved to)
> > http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>
> I would appreciate having a clear "not valid" marker instead of getting
> a possibly bogus (but valid) address. -1 seems a reasonable value.
> Although I have seen (and written) code that simply assumes zero is
> that value.

Yep - and zero seemed like a good one - except in debug_ranges and
debug_loc, where it would produce a premature list termination
(bfd.ld gets around this by using 1 in debug_ranges), or on
architectures for which 0 is a valid address.

If you use the zero+addend approach that gold uses (and lld did
use/maybe still does, but is going to move away from) then you
/almost/ avoid the need to special-case debug_ranges and debug_loc,
until you hit a zero-length function (you can create zero-length
functions from code like "int f1() { }" or "void f2() {
__builtin_unreachable(); }") - then you get the early list termination
again.

Also, zero+addend might trip up in a case like: "void f1() { }
__attribute__((nodebug)) void f2() { } void f3() { }" - now f3's
starting address has a non-zero addend, so it's indistinguishable from
valid code at a very low address.
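
The failure modes above can be sketched with a toy model. This is only
an illustration, not how any linker or consumer is actually written: it
assumes a DWARF v4-style range list made of (begin, end) pairs ended by
a (0, 0) entry, and the names resolve, build_range_list and
read_range_list are hypothetical. It also assumes that a fixed
tombstone replaces the whole relocated value and ignores the addend.

```python
# Hypothetical model of a DWARF v4-style range list: a sequence of
# (begin, end) address pairs, ended by a (0, 0) end-of-list entry.
END_OF_LIST = (0, 0)


def resolve(symbol_addr, addend, tombstone=None):
    """Resolve one relocation the way a linker might.

    symbol_addr is None when the referenced code was discarded.
    tombstone=None models the zero+addend approach; otherwise the fixed
    tombstone is written and the addend is ignored (an assumption made
    for this sketch).
    """
    if symbol_addr is not None:
        return symbol_addr + addend
    return addend if tombstone is None else tombstone


def build_range_list(functions, tombstone=None):
    """functions: list of (address or None, size) pairs in CU order."""
    entries = []
    for addr, size in functions:
        begin = resolve(addr, 0, tombstone)
        end = resolve(addr, size, tombstone)
        entries.append((begin, end))
    entries.append(END_OF_LIST)
    return entries


def read_range_list(entries):
    """Consumer side: stop at the first (0, 0) pair."""
    ranges = []
    for pair in entries:
        if pair == END_OF_LIST:
            break
        ranges.append(pair)
    return ranges


# Live function, discarded zero-length function, live function.
zero_length = [(0x1000, 0x10), (None, 0), (0x2000, 0x20)]
# Same layout, but the discarded function has 4 bytes of code.
non_empty = [(0x1000, 0x10), (None, 4), (0x2000, 0x20)]

# zero+addend, zero-length function: the list ends early and the
# second live range is never seen.
truncated = read_range_list(build_range_list(zero_length))

# zero+addend, non-empty function: nothing is lost, but the dead range
# looks like real code at address 0.
low_addr = read_range_list(build_range_list(non_empty))

# Fixed tombstone of 1: all three entries survive.
with_one = read_range_list(build_range_list(zero_length, tombstone=1))

# A non-zero addend against a discarded symbol (the f3 case) yields a
# small address that looks valid.
f3_start = resolve(None, 0x20)
```

In this model the consumer can only recognise the dead entry in the
last case, by checking for the agreed tombstone value.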


> Would such an invalid address marker in a DW_AT_low_pc make the whole
> program scope under a DIE invalid? What about (addr, loc, rng) base
> addresses? Can they contain an invalid marker, does that make the whole
> table/range invalid?

That would be my intent, yes - any pointer derived from an invalid
address would be invalid. Take the f1/f2/f3 nodebug example above -
f3's starting address could be described by "invalid address + offset",
so it'd be important for that to be special-cased in pointer
arithmetic so the tombstone value propagates through arithmetic.

(Currently DWARF has no direct way of describing "address + offset" -
well, it sort of does: you could use an exprloc with an OP_addrx and
the arithmetic necessary to add to that, though I doubt many consumers
could handle an exprloc there. I would like to champion that, though,
to enable reuse of address pool entries to reduce the size of .o debug
info contributions when using Split DWARF - or just to reduce the
number of relocations/.o file size when using non-split DWARF.)
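
A minimal sketch of that propagation rule, under stated assumptions:
the tombstone is taken to be -1 (all ones) on a 64-bit target, and
add_offset and is_valid are hypothetical helper names, not part of any
existing consumer.

```python
ADDRESS_SIZE = 8                      # bytes; assumes a 64-bit target
MASK = (1 << (8 * ADDRESS_SIZE)) - 1
TOMBSTONE = MASK                      # assumed tombstone: -1 as unsigned


def add_offset(address, offset):
    """Add an offset to an address, keeping the tombstone sticky."""
    if address == TOMBSTONE:
        return TOMBSTONE
    return (address + offset) & MASK


def is_valid(address):
    """An address is usable only if it is not the tombstone."""
    return address != TOMBSTONE


# Without the special case, tombstone + offset wraps around to a small
# address that looks like valid code.
naive = (TOMBSTONE + 0x20) & MASK

# With the special case, f3's "invalid address + offset" stays invalid.
f3_start = add_offset(TOMBSTONE, 0x20)
live_start = add_offset(0x1000, 0x20)
```

Here naive comes out as 0x1f, which is exactly the kind of low address
that cannot be told apart from real code, while f3_start remains the
tombstone.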

> I must admit that as a DWARF consumer I am slightly worried that having
> a sanctioned "invalid marker" will cause DWARF producers to just not
> coordinate and simply assume they can always invalidate anything they
> emit.

At least in my experience (8 years or so working on LLVM's DWARF
emission) we've got a pretty strong incentive to reduce DWARF size
already - I don't think any producers are being particularly cavalier
about producing excess DWARF on the basis that it can be marked
invalid.

> Even if there could be a real solution by coordinating between
> compiler/linker who is responsible for producing the valid DWARF
> entries (especially when LTO is involved).

A lot of engineering work went into restructuring LLVM's debug info IR
representation for LTO to ensure LLVM doesn't produce DWARF for
functions deduplicated or dropped by LTO.

- Dave

>
> > Some comments about the proposal:
> >
> > > - deduplicating different functions with identical content; GNU
> > >   refers to this as ICF (Identical Code Folding);
> >
> > ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with
> > different DW_AT_name to have the same range.
>
> Cary Coutant wrote up a general Two-Level Line Number Table proposal to
> address the issue of having a single machine instruction correspond to
> more than one source statement:
> http://wiki.dwarfstd.org/index.php?title=TwoLevelLineTables
>
> Which seems useful in these kinds of situations. But I don't know the
> current status of the proposal.

This was motivated by a desire to be able to do symbolized stack
traces including inline stack frames with a smaller representation
than is currently possible in DWARF - it allows the line table itself
to describe inlining, to some degree, rather than relying on the DIE
tree (in part this was motivated by a desire to be able to symbolize
backtraces with inlining in-process when Split DWARF is used and the
.dwo/.dwp files are not available).

I don't think it extends to dealing with the case of deduplication
like this, nor does it address the possibility of two CUs having
overlapping instruction ranges. (It's semantically roughly equivalent
to the inlined_subroutines of a subprogram - not so much related to
two copies of a function being deduplicated & then being shared by
CUs.)
- Dave

