public inbox for elfutils@sourceware.org
* Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
@ 2021-09-27 14:00 Fangrui Song
  0 siblings, 0 replies; 4+ messages in thread
From: Fangrui Song @ 2021-09-27 14:00 UTC (permalink / raw)
  To: elfutils-devel


* Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
  2020-06-19 20:04                     ` Mark Wielaard
@ 2020-06-20  1:02                       ` David Blaikie
  0 siblings, 0 replies; 4+ messages in thread
From: David Blaikie @ 2020-06-20  1:02 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Fangrui Song, gdb, elfutils-devel, binutils, Alan Modra

On Fri, Jun 19, 2020 at 1:04 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Tue, 2020-06-09 at 13:24 -0700, Fangrui Song via Elfutils-devel wrote:
> > I want to revive the thread, but focus on whether a tombstone value
> > (-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
> > debug related tools in elfutils and other utilities I don't know about).
> >
> > Paul Robinson has proposed that DWARF v6 should reserve a tombstone
> > value  (the value a relocation referencing a discarded symbol in a
> > .debug_* section should be resolved to)
> > http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>
> I would appreciate having a clear "not valid" marker instead of getting
> a possibly bogus (but valid) address. -1 seems a reasonable value.
> Although I have seen (and written) code that simply assumes zero is
> that value.

Yep - and zero seemed like a good one - except in debug_ranges and
debug_loc, where it would produce a premature list termination
(bfd.ld gets around this by using 1 in debug_ranges), or on
architectures for which 0 is a valid address.

If you use the zero+addend approach that gold uses (and lld did
use/maybe still does, but is going to move away from) then you
/almost/ avoid the need to special case debug_ranges and debug_loc,
until you hit a zero-length function (you can create zero-length
functions from code like "int f1() { }" or "void f2() {
__builtin_unreachable(); }") - then you get the early list termination
again.

Also, zero+addend might trip up in a case like: "void f1() { }
__attribute__((nodebug)) void f2() { } void f3() { }" - now f3's
starting address has a non-zero addend, so it's indistinguishable from
valid code at a very low address.
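
To make the early-termination case concrete, here is a minimal sketch of
a consumer walking a DWARF v4 style .debug_ranges list, where a (0, 0)
pair marks the end of the list. The names range_pair, visible_ranges,
resolve_discarded and demo_zero_length_function are made up for
illustration and do not come from gold, lld or elfutils; the zero+addend
policy is modelled in the simplest possible way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (begin, end) pair of a DWARF v4 style .debug_ranges list. */
struct range_pair { uint64_t begin, end; };

/* Count the entries a consumer sees before the (0, 0) end-of-list pair.
   `n` bounds the walk so a missing terminator cannot overrun the array. */
size_t visible_ranges(const struct range_pair *list, size_t n)
{
    assert(list != NULL);
    size_t seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (list[i].begin == 0 && list[i].end == 0)
            break;                      /* end-of-list marker */
        seen++;
    }
    return seen;
}

/* Assumed linker policy: a relocation against a discarded symbol
   resolves to zero plus the relocation addend. */
uint64_t resolve_discarded(uint64_t addend)
{
    return 0 + addend;
}

/* Three functions, the first two discarded by the linker. The second is
   zero-length, so both of its addends are 0 and it looks like the
   terminator, hiding the live function that follows it. */
size_t demo_zero_length_function(void)
{
    const struct range_pair list[] = {
        { resolve_discarded(0), resolve_discarded(4) }, /* discarded, 4 bytes */
        { resolve_discarded(0), resolve_discarded(0) }, /* discarded, 0 bytes */
        { 0x1000, 0x1010 },                             /* live function */
        { 0, 0 },                                       /* real terminator */
    };
    return visible_ranges(list, sizeof list / sizeof list[0]);
}
```

With this input demo_zero_length_function() returns 1: the walk stops at
the zero-length entry and never reaches the live range at 0x1000.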


> Would such an invalid address marker in a DW_AT_low_pc make the whole
> program scope under a DIE invalid? What about (addr, loc, rng) base
> addresses? Can they contain an invalid marker, does that make the whole
> table/range invalid?

That would be my intent, yes - any pointer derived from an invalid
address would be invalid. Take the f1/f2/f3 nodebug example above:
f3's starting address could be described by "invalid address + offset".
Currently DWARF has no good way of describing this. It sort of does:
you could use an exprloc with an OP_addrx and the arithmetic necessary
to add to that, though I doubt many consumers could handle an exprloc
there. I would like to champion that form, since it enables reuse of
address pool entries to reduce the size of .o debug info contributions
when using Split DWARF (or just to reduce the number of
relocations/.o file size when using non-split DWARF). So it'd be
important for the tombstone to be special cased in pointer arithmetic,
so that the tombstone value propagates through arithmetic.
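
As a rough sketch of what "propagates through arithmetic" could mean for
a consumer, assuming the tombstone is the all-ones value for the target
address size (the -1 discussed in this thread; nothing here is
standardized, and tombstone_for and addr_add are hypothetical helper
names):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tombstone: all bits set for an address of `addr_size` bytes. */
uint64_t tombstone_for(unsigned addr_size)
{
    assert(addr_size >= 1 && addr_size <= 8);
    if (addr_size == 8)
        return UINT64_MAX;
    return (UINT64_C(1) << (8 * addr_size)) - 1;
}

/* Add an offset to a base address. If the base is the tombstone the
   result stays the tombstone, instead of wrapping around to a small,
   plausible-looking address. */
uint64_t addr_add(uint64_t base, uint64_t offset, unsigned addr_size)
{
    uint64_t dead = tombstone_for(addr_size);
    if (base == dead)
        return dead;                /* invalid + offset is still invalid */
    return (base + offset) & dead;  /* `dead` doubles as the address mask */
}
```

Without the special case, adding 0x10 to a 64-bit -1 wraps to 0xf, which
is exactly the "valid code at a very low address" ambiguity from the
f1/f2/f3 example.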

> I must admit that as a DWARF consumer I am slightly worried that having
> a sanctioned "invalid marker" will cause DWARF producers to just not
> coordinate and simply assume they can always invalidate anything they
> emit.

At least in my experience (8 years or so working on LLVM's DWARF
emission) we've got a pretty strong incentive to reduce DWARF size
already - I don't think any producers are being particularly cavalier
about producing excess DWARF on the basis that it can be marked
invalid.

> Even if there could be a real solution by coordinating between
> compiler/linker who is responsible for producing the valid DWARF
> entries (especially when LTO is involved).

A lot of engineering work went into restructuring LLVM's debug info IR
representation for LTO to ensure LLVM doesn't produce DWARF for
functions deduplicated or dropped by LTO.

- Dave

>
> > Some comments about the proposal:
> >
> > > - deduplicating different functions with identical content; GNU
> > > refers
> > >   to this as ICF (Identical Code Folding);
> >
> > ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with
> > different DW_AT_name to have the same range.
>
> Cary Coutant wrote up a general Two-Level Line Number Table proposal to
> address the issue of having a single machine instruction correspond to
> more than one source statement:
> http://wiki.dwarfstd.org/index.php?title=TwoLevelLineTables
>
> Which seems useful in these kinds of situations. But I don't know the
> current status of the proposal.

This was motivated by a desire to be able to do symbolized stack
traces including inline stack frames with a smaller representation
than is currently possible in DWARF - it allows the line table itself
to describe inlining, to some degree, rather than relying on the DIE
tree (in part this was motivated by a desire to be able to symbolize
backtraces with inlining in-process when Split DWARF is used and the
.dwo/.dwp files are not available).

I don't think it extends to dealing with the case of deduplication
like this - nor addresses the possibility of two CUs having
overlapping instruction ranges. (it's semantically roughly equivalent
to the inlined_subroutines of a subprogram - not so much related to
two copies of a function being deduplicated & then being shared by
CUs)
- Dave


* Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
  2020-06-09 20:24                   ` Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc) Fangrui Song
@ 2020-06-19 20:04                     ` Mark Wielaard
  2020-06-20  1:02                       ` David Blaikie
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Wielaard @ 2020-06-19 20:04 UTC (permalink / raw)
  To: Fangrui Song, gdb, elfutils-devel, binutils; +Cc: David Blaikie, Alan Modra

Hi,

On Tue, 2020-06-09 at 13:24 -0700, Fangrui Song via Elfutils-devel wrote:
> I want to revive the thread, but focus on whether a tombstone value
> (-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
> debug related tools in elfutils and other utilities I don't know about).
> 
> Paul Robinson has proposed that DWARF v6 should reserve a tombstone
> value  (the value a relocation referencing a discarded symbol in a
> .debug_* section should be resolved to)
> http://www.dwarfstd.org/ShowIssue.php?issue=200609.1

I would appreciate having a clear "not valid" marker instead of getting
a possibly bogus (but valid) address. -1 seems a reasonable value.
Although I have seen (and written) code that simply assumes zero is
that value.

Would such an invalid address marker in a DW_AT_low_pc make the whole
program scope under a DIE invalid? What about (addr, loc, rng) base
addresses? Can they contain an invalid marker, does that make the whole
table/range invalid?

I must admit that as a DWARF consumer I am slightly worried that having
a sanctioned "invalid marker" will cause DWARF producers to just not
coordinate and simply assume they can always invalidate anything they
emit. Even if there could be a real solution by coordinating between
compiler/linker who is responsible for producing the valid DWARF
entries (especially when LTO is involved).

> Some comments about the proposal:
> 
> > - deduplicating different functions with identical content; GNU
> > refers
> >   to this as ICF (Identical Code Folding);
> 
> ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with
> different DW_AT_name to have the same range.

Cary Coutant wrote up a general Two-Level Line Number Table proposal to
address the issue of having a single machine instruction correspond to
more than one source statement:
http://wiki.dwarfstd.org/index.php?title=TwoLevelLineTables

Which seems useful in these kinds of situations. But I don't know the
current status of the proposal.

Cheers,

Mark


* Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
  2020-06-03 21:50                 ` David Blaikie
@ 2020-06-09 20:24                   ` Fangrui Song
  2020-06-19 20:04                     ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Fangrui Song @ 2020-06-09 20:24 UTC (permalink / raw)
  To: gdb, elfutils-devel, binutils; +Cc: Alan Modra, Mark Wielaard, David Blaikie

I want to revive the thread, but focus on whether a tombstone value
(-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
debug related tools in elfutils and other utilities I don't know about).

Paul Robinson has proposed that DWARF v6 should reserve a tombstone
value  (the value a relocation referencing a discarded symbol in a
.debug_* section should be resolved to)
http://www.dwarfstd.org/ShowIssue.php?issue=200609.1

Some comments about the proposal:

> - deduplicating different functions with identical content; GNU refers
>   to this as ICF (Identical Code Folding);

ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with different DW_AT_name to have the same range.

> - functions with no callers; sometimes called dead-stripping or
>   garbage collection.

--gc-sections can lead to tombstone values. A referenced symbol may be
discarded because its containing section is garbage collected.

> - functions emitted in COMDAT sections, typically C++ template
>   instantiations or inline functions from a header file;

This can cause either tombstone values (STB_LOCAL) or duplicate DIEs (non-STB_LOCAL).


On 2020-06-03, David Blaikie wrote:
>On Tue, Jun 2, 2020 at 8:10 PM Alan Modra <amodra@gmail.com> wrote:
>>
>> On Tue, Jun 02, 2020 at 11:06:10AM -0700, David Blaikie via Binutils wrote:
>> > On Tue, Jun 2, 2020 at 9:50 AM Mark Wielaard <mark@klomp.org> wrote:
>> > > where I
>> > > would argue the compiler simply needs to make sure that if it generates
>> > > code in separate sections it also should create the DWARF separate
>> > > section (groups).
>> >
>> > I don't think that's practical - the overhead, I believe, is too high.
>> > Headers for each section contribution (ELF headers but DWARF headers
>> > moreso - having a separate .debug_addr, .debug_line, etc section for
>> > each function would be very expensive) would make for very large
>> > object files.
>>
>> With a little linker magic I don't see the necessity of duplicating
>> the DWARF headers.  Taking .debug_line as an example, a compiler could
>> emit the header, opcode, directory and file tables to a .debug_line
>> section with line statements for function foo emitted to
>> .debug_line.foo and for bar to .debug_line.bar, trusting that the
>> linker will combine these sections in order to create an output
>> .debug_line section.  If foo code is excluded then .debug_line.foo
>> info will also be dropped if section groups are used.
>
>I don't think this would apply to debug_addr - where the entries are
>referenced from elsewhere via index, or debug_rnglist where the
>rnglist header (or the debug_info directly) contains offsets into this
>section, so taking chunks out would break those offsets. (or to the
>file/directory name part of debug_line - where you might want to
>remove file/line entries that were eliminated as dead code - but
>that'd throw off the indexes)
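
The index problem in the quoted text can be illustrated with a small
sketch. It models an address pool (in the spirit of .debug_addr) as a
plain array that other sections refer to by index, and compares physically
removing an entry with overwriting it in place. TOMBSTONE is assumed to be
-1 here, and pool_remove and pool_tombstone are hypothetical names, not
functions from any linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOMBSTONE UINT64_MAX    /* assumed marker for a discarded address */

/* Drop entry `idx` by compacting the pool. Returns the new entry count.
   Every index greater than `idx` held elsewhere now refers to the wrong
   entry, or to one past the end. */
size_t pool_remove(uint64_t *pool, size_t n, size_t idx)
{
    assert(pool != NULL && idx < n);
    for (size_t i = idx; i + 1 < n; i++)
        pool[i] = pool[i + 1];
    return n - 1;
}

/* Mark entry `idx` as dead in place. The pool keeps its size, so all
   other indexes held elsewhere remain valid. */
void pool_tombstone(uint64_t *pool, size_t n, size_t idx)
{
    assert(pool != NULL && idx < n);
    pool[idx] = TOMBSTONE;
}
```

Starting from the pool {0x1000, 0x2000, 0x3000}, removing index 1 leaves
index 1 holding 0x3000 and index 2 out of range, while tombstoning index 1
leaves index 2 still resolving to 0x3000.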


end of thread, other threads:[~2021-09-27 15:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
2021-09-27 14:00 Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc) Fangrui Song
  -- strict thread matches above, loose matches on Subject: below --
2020-05-31 20:10 Range lists, zero-length functions, linker gc Mark Wielaard
2020-05-31 20:49 ` David Blaikie
2020-05-31 22:29   ` Mark Wielaard
2020-05-31 22:36     ` David Blaikie
2020-06-01  9:31       ` Mark Wielaard
2020-06-01 20:18         ` David Blaikie
2020-06-02 16:50           ` Mark Wielaard
2020-06-02 18:06             ` David Blaikie
2020-06-03  3:10               ` Alan Modra
2020-06-03 21:50                 ` David Blaikie
2020-06-09 20:24                   ` Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc) Fangrui Song
2020-06-19 20:04                     ` Mark Wielaard
2020-06-20  1:02                       ` David Blaikie
