Re: Range lists, zero-length functions, linker gc

From: Mark Wielaard <mark@klomp.org>
To: David Blaikie <dblaikie@gmail.com>
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org,
	binutils@sourceware.org,  Fangrui Song <maskray@google.com>
Subject: Re: Range lists, zero-length functions, linker gc
Date: Thu, 25 Jun 2020 00:21:59 +0200
Message-ID: <2a72bb9f5f4d29f6733ee0c907a1043a97ef71d5.camel@klomp.org>
In-Reply-To: <CAENS6EsJk3FFbaV2xHeymNPZ1_r1933mJfqPdNp0rAO6vEezRA@mail.gmail.com>

Hi David,

On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel wrote:
> On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard <mark@klomp.org> wrote:
> > I think that is kind of the point of Early Debug. Only use DWARF (at
> > first) for address/range-less data like types and program scope
> > entries, but don't emit anything (in DWARF format) for things that
> > might need adjustments during link/LTO phase. The problem with using
> > DWARF with address (ranges) during early object creation is that the
> > linker isn't capable to rewrite the DWARF. You'll need a linker plugin
> > that calls back into the compiler to do the actual LTO and emit the
> > actual DWARF containing address/ranges (which can then link back to the
> > already emitted DWARF types/program scope/etc during the Early Debug
> > phase). I think the issue you are describing is actually that you do
> > use DWARF to describe function definitions (not just the declarations)
> > too early. If you aren't sure yet which addresses will be used DWARF
> > isn't really the appropriate (temporary) debug format.
> 
> Sorry, I think we keep talking around each other. Not sure if we can
> reach a good consensus or shared understanding on this topic.

I think the confusion comes from the fact that we seem to cycle through
a couple of different topics which are related, but not really
connected directly.

There is the topic of using "tombstones" in place of some pc or range
attributes/tables in the case of traditional linking of separate compile
units/objects. There we seem to agree that those are better than
silently producing bad data, but we disagree about whether there are
other ways to solve the issue (using comdat sections, for example, where
we might see the overhead/gains differently).
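
To make the "tombstone versus silently bad data" point concrete, here is
a minimal sketch of how a consumer might filter decoded range entries.
It is not taken from any linker or debugger. The AddressRange type, the
liveRanges and coveredBytes helpers, and in particular the tombstone
value (all-ones) are assumptions made for illustration only; this
message does not say which value a linker writes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A half-open address range [low, high), as a consumer might see it after
// the linker has applied relocations. Hypothetical type for this sketch.
struct AddressRange {
    std::uint64_t low;
    std::uint64_t high;
};

// ASSUMPTION: the linker marks references to discarded code with the
// maximum 64-bit address. The real value is a linker/spec decision.
constexpr std::uint64_t kTombstone = UINT64_MAX;

// Keep only the ranges that can describe code that is still in the output.
std::vector<AddressRange> liveRanges(const std::vector<AddressRange>& decoded) {
    std::vector<AddressRange> live;
    for (const AddressRange& r : decoded) {
        if (r.low == kTombstone) continue;  // refers to a discarded section
        if (r.low >= r.high) continue;      // empty (zero-length) or malformed
        live.push_back(r);
    }
    return live;
}

// Total number of bytes covered by ranges that were already filtered.
std::uint64_t coveredBytes(const std::vector<AddressRange>& live) {
    std::uint64_t total = 0;
    for (const AddressRange& r : live) {
        assert(r.low != kTombstone && r.low < r.high);  // caller filters first
        total += r.high - r.low;
    }
    return total;
}
```

Note that a discarded function whose addresses were resolved to zero
instead would show up as something like [0x0, 0x10). That passes both
checks above, so the consumer cannot tell it apart from real code.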

There is the topic of LTO, where part of the linker optimization is done
through a (compiler) plugin. There it isn't clear (to me at least)
whether some of the traditional ways of handling DWARF in object files
make sense. I would argue that GCC shows that for LTO you need something
like Early Debug, where early on you only produce the parts of the DWARF
that don't contain any addresses or ranges, since you don't know where
code/data will end up until after the actual LTO phase. Only then can
the address-bearing parts be produced.

Then there is the topic of Split DWARF, which I am not sure is
directly relevant to the above two topics. It is just a different
representation of the DWARF data, with an extra layer of indirection
used for addresses. In the case of the traditional model that means
you still hit the tombstones, just through an indirection table.
And for LTO it just makes some things more complicated: you cannot know
where the addresses end up till after the LTO phase, so you now have an
extra layer of indirection to fix up.
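
As a rough illustration of "hitting the tombstones through an
indirection table", the sketch below models an attribute that stores
only an index, with the address itself living in a separate table that
the linker relocates. AddressTable, resolve and resolveLive are
hypothetical names, and the all-ones tombstone value is again only an
assumption for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: same illustrative tombstone value as a linker might write
// into a table slot whose target section was discarded.
constexpr std::uint64_t kTombstone = UINT64_MAX;

// Stand-in for the relocated address table; the split-out DWARF refers to
// addresses only by their index in this table.
struct AddressTable {
    std::vector<std::uint64_t> entries;
};

// Resolve an address index. Returns false when the index is out of bounds
// or when the slot holds the tombstone, i.e. the code was discarded.
bool resolve(const AddressTable& table, std::size_t index, std::uint64_t& out) {
    if (index >= table.entries.size()) return false;  // malformed index
    const std::uint64_t addr = table.entries[index];
    if (addr == kTombstone) return false;             // discarded target
    out = addr;
    return true;
}

// Convenience wrapper for callers that already know the entry is live.
std::uint64_t resolveLive(const AddressTable& table, std::size_t index) {
    std::uint64_t addr = 0;
    const bool ok = resolve(table, index, addr);
    assert(ok);
    return addr;
}
```

The consumer still has to make the same check as in the direct case. It
simply happens one lookup later.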

> DWARF in unlinked object files has been a fairly well used temporary
> debug format for a long time - and the DWARF spec has done a lot to
> ensure it is compatible with ELF in both object files and linkers
> forever, basically? So I don't think it'd be suitable to say "DWARF
> isn't an appropriate intermediate debug format to use between
> compilers and linkers". In the sense that I don't think either the
> DWARF committee members, producers, or consumers would agree with this
> sentiment.

I absolutely agree with that statement for the traditional linker
model, where you build up DWARF data per compile unit. But for the LTO
model, where there is a feedback loop between compiler and linker, I
don't think (all of) DWARF is an appropriate intermediate debug format,
if only because the concept of "compile unit" gets really fuzzy. I
think in that model a lot of DWARF can still be useful as an
intermediate debug format to pass back and forth between compiler and
linker during the LTO phase. Just not the part that describes the
program scope and variable/data locations represented as (ranges of)
addresses (when produced early).

> > I understand the function sections case, but can you give actual
> > examples of an inline function or function template source code and how
> > a DWARF producer generates DWARF for that? Maybe some simple source
> > code we can put through gcc or clang to see how they (mis)handle it.
> > Not being a compiler architect I am not sure I understand why those
> > cannot be expressed correctly.
> 
> oh, sure! sorry.
> 
> a simple case of inline functions being deduplicated looks like this:
> 
> a.cpp:
> inline void f1() { }
> void f2() {
>   f1();
> }
> 
> b.cpp:
> inline void f1() { }
> void f2();
> int main() {
>   f1();
>   f2();
> }
> 
> This actually demonstrates a slightly different behavior of bfd and
> gold: When the comdats are the same size (I'm told that's the
> heuristic) and the local symbol names the DWARF uses to refer to the
> functions (f1 in this case) - then both DWARF descriptions are
> resolved to point to the same deduplicated copy of 'f1', eg:

Thanks for the concrete example. I'll study it.

Would you mind telling me which DWARF producer/compiler you used and
which command line flags you passed to the compiler and linker
invocations? I would like to replicate the produced DWARF but wasn't
able to get something that used ranges like in your examples. I also
wonder about the ODR violation: does your example depend on this being
C++, or does it produce the same issues when it is built as a C program?

Thanks,

Mark
