From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20200531185506.mp2idyczc4thye4h@google.com>
 <20200531201016.GJ44629@wildebeest.org>
 <CAENS6Esjx0HQpviW=ZrA4O3Bza7JDOpoqe3fLxqmLZ4TZsv-9w@mail.gmail.com>
 <20200531222937.GM44629@wildebeest.org>
 <CAENS6Es_DuMzzQi-RBzF_0vm2QCX1DbxGsQsTVDMdSz2f2h4oQ@mail.gmail.com>
 <20200601093103.GN44629@wildebeest.org>
 <CAENS6EsK+ef=GCWewQSCH4imi-_LN8z7gp3qwK5BoMr-mYY=4w@mail.gmail.com>
 <a691fb97d27be64248298287ff9a189ce0734731.camel@klomp.org>
 <CAENS6EsMs78YuYnkC458XOBNXJGNurUk4MCHiq-0nRiywN7YLg@mail.gmail.com>
 <5e22c0183325aae16a28e301c7a83cea479130a0.camel@klomp.org>
 <CAENS6EsJk3FFbaV2xHeymNPZ1_r1933mJfqPdNp0rAO6vEezRA@mail.gmail.com>
 <2a72bb9f5f4d29f6733ee0c907a1043a97ef71d5.camel@klomp.org>
In-Reply-To: <2a72bb9f5f4d29f6733ee0c907a1043a97ef71d5.camel@klomp.org>
From: David Blaikie <dblaikie@gmail.com>
Date: Thu, 25 Jun 2020 16:45:54 -0700
Message-ID: <CAENS6Eu5d+2RASTAGJUHV-pBSaVy8zbz4Q-KVBnrvKNWMiWK2A@mail.gmail.com>
Subject: Re: Range lists, zero-length functions, linker gc
To: Mark Wielaard <mark@klomp.org>
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org,
 Fangrui Song <maskray@google.com>
Content-Type: text/plain; charset="UTF-8"

On Wed, Jun 24, 2020 at 3:22 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi David,
>
> On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel wrote:
> > On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard <mark@klomp.org> wrote:
> > > I think that is kind of the point of Early Debug. Only use DWARF (at
> > > first) for address/range-less data like types and program scope
> > > entries, but don't emit anything (in DWARF format) for things that
> > > might need adjustments during link/LTO phase. The problem with using
> > > DWARF with address (ranges) during early object creation is that the
> > > linker isn't capable of rewriting the DWARF. You'll need a linker plugin
> > > that calls back into the compiler to do the actual LTO and emit the
> > > actual DWARF containing address/ranges (which can then link back to the
> > > already emitted DWARF types/program scope/etc during the Early Debug
> > > phase). I think the issue you are describing is actually that you do
> > > use DWARF to describe function definitions (not just the declarations)
> > > too early. If you aren't sure yet which addresses will be used DWARF
> > > isn't really the appropriate (temporary) debug format.
> >
> > Sorry, I think we keep talking around each other. Not sure if we can
> > reach a good consensus or shared understanding on this topic.
>
> I think the confusion comes from the fact that we seem to cycle through
> a couple of different topics which are related, but not really
> connected directly.
>
> There is the topic of using "tombstones" in place of some pc or range
> attributes/tables in the case of traditional linking separate compile
> units/objects. Where we seem to agree that those are better than
> silently producing bad data, but where we disagree whether there are
> other ways to solve the issue (using comdat section for example, where
> we might see the overhead/gains differently).
>
> There is the topic of LTO where part of the linker optimization is done
> through a (compiler) plugin. Where it isn't clear (to me at least) if
> some of the traditional way of handling DWARF in object files makes
> sense.

Oh - perhaps to clarify: I don't know of any implementation that
creates DWARF in intermediate object files in LTO.

> I would argue that GCC shows that for LTO you need something
> like Early Debug, where you only produce parts of the DWARF early that
> don't contain any addresses or ranges, since you don't know yet where
> code/data will end up till after the actual LTO phase, only after which
> it can be produced.

Yeah - I guess that's the point of the name "Early Debug" - it's
earlier than usual, rather than making the rest later than usual.

In LLVM's implementation the faux .o files in LTO contain no DWARF
whatsoever. Instead they carry a semantic representation, something
like DWARF, that is intended to be manipulated by compiler
optimizations and designed to drop unreferenced portions as
optimizations make changes. (If you inline and optimize away a
function call, that function may get dropped - then no DWARF is
emitted for it, same as if it were never called.)

Yeah, it'd be theoretically possible to create all the DWARF up-front,
use loclists and rnglists for /everything/ (because you wouldn't know
if a variable would have a single location or multiple until after
optimizations) and then fill in those loclists and rnglists
post-optimization. I don't know of any implementation that does that,
though - it'd make for very verbose DWARF, and I agree with you that
that wouldn't be great - I think the only point of conflict there is:
I don't think that's a concern that's actually manifesting in DWARF
producers today. Certainly not in LLVM & doesn't sound like it is in
GCC.

I think compiler performance gives enough incentive not to produce
loads of duplicate DWARF, and to have a fairly compact/optimizable
intermediate representation - a lot of work went into changing LLVM's
representation to be more amenable to LTO, to ensure things got
dropped and deduplicated as soon as possible.

> Then there is the topic of Split Dwarf, where I am not sure it is
> directly relevant to the above two topics. It is just a different
> representation of the DWARF data, with an extra layer of indirections
> used for addresses. Which in the case of the traditional model means
> that you still hit the tombstones, just through an indirection table.
> And for LTO it just makes some things more complicated because you have
> this extra address indirection table, but since you cannot know where
> the addresses end up till after the LTO phase you now have an extra
> layer of indirection to fix up.

I think Split DWARF matters for your first point, about you and me
perhaps weighing the object size cost differently (using comdats to
deduplicate/drop DWARF for dead or deduplicated functions). With Split
DWARF that approach is impossible - well, it's impossible if you're
going to use fragmented DWARF (eg: use comdats to stitch together a
single CU out of droppable parts). To drop the DWARF related to a dead
or deduplicated function when using Split DWARF you'd have to use a
whole separate unit (possibly a partial_unit) - which would add a lot
more size overhead. Perhaps enough that we'd both agree that's
prohibitive (especially since that cost would persist into the linked
binary - so it wouldn't be as much of a .o/linked executable tradeoff,
but an outright growth).

>
> > DWARF in unlinked object files has been a fairly well used temporary
> > debug format for a long time - and the DWARF spec has done a lot to
> > ensure it is compatible with ELF in both object files and linkers
> > forever, basically? So I don't think it'd be suitable to say "DWARF
> > isn't an appropriate intermediate debug format to use between
> > compilers and linkers". In the sense that I don't think either the
> > DWARF committee members, producers, or consumers would agree with this
> > sentiment.
>
> I absolutely agree with that statement for the traditional linker
> model, where you build up DWARF data per compile unit.

Ah, OK - then perhaps that's all we need to really agree on to move
forward with the discussion of a tombstone value, what value it is,
that it should be in the DWARF spec and all the implementations should
know and agree on it?

> But for the LTO
> model, where there is a feedback loop between compiler and linker, I
> don't think (all of) DWARF is an appropriate intermediate debug format.

Neither do I - though if we both agree there is a need for a tombstone
in the traditional linker model, then we do leave it open for very
inefficient LTO implementations to use that feature too. There are
lots of ways a DWARF producer could produce very inefficient DWARF,
though, & I don't think there's a great need to mandate against it in
general. (If we could avoid having the tombstone concept entirely -
sure - but if we've got to have it, I don't know that the LTO
conversation goes anywhere in terms of informing the design of the
tombstone feature.)

> If only because the concept of "compile unit" gets really fuzzy. I
> think in that model a lot of DWARF can still be used usefully as
> intermediate debug format to pass between compiler, linker, compiler,
> linker during the LTO phase. Just not the part that describes the
> program scope and variable/data locations represented as (ranges of)
> addresses (when produced early).
>
> > > I understand the function sections case, but can you give actual
> > > examples of an inline function or function template source code and how
> > > a DWARF producer generates DWARF for that? Maybe some simple source
> > > code we can put through gcc or clang to see how they (mis)handle it.
> > > Not being a compiler architect I am not sure I understand why those
> > > cannot be expressed correctly.
> >
> > oh, sure! sorry.
> >
> > a simple case of inline functions being deduplicated looks like this:
> >
> > a.cpp:
> > inline void f1() { }
> > void f2() {
> >   f1();
> > }
> >
> > b.cpp:
> > inline void f1() { }
> > void f2();
> > int main() {
> >   f1();
> >   f2();
> > }
> >
> > This actually demonstrates a slightly different behavior of bfd and
> > gold: When the comdats are the same size (I'm told that's the
> > heuristic) and the local symbol names the DWARF uses to refer to the
> > functions (f1 in this case) match - then both DWARF descriptions are
> > resolved to point to the same deduplicated copy of 'f1', eg:
>
> Thanks for the concrete example. I'll study it.
>
> Would you mind telling which DWARF producer/compiler you used and which
> command line flags you used to the compiler and linker invocations?

clang or gcc without any extra flags should suffice here

To get the summarized DWARF I showed above, I used this complete command line:

$ clang++ -g a.cpp b.cpp && \
    llvm-dwarfdump -v -debug-info a.out | \
    grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | \
    sed -e "s/............//"

(using clang and llvm-dwarfdump from LLVM trunk)

> I
> like to replicate the produced DWARF but wasn't able to get something
> that used ranges like in your examples. I also wonder about the ODR
> violation, does your example depend on this being C++ or does it
> produce the same issues when it was built as a C program?

I believe C has different "inline" semantics that I'm not as familiar
with - but I /believe/ the actual C standard inline semantics wouldn't
produce the kind of situation that C++ does. In C++ you define an
inline function in every translation unit where it's used, and the
compiler can choose to inline it or not. If it doesn't actually inline
it, the object file carries a deduplicable definition of the function,
and the linker picks one of those definitions from among the input
object files. In C, by contrast, the inline function definition, if
not inlined, is discarded by the compiler and the user must have
provided a non-inline definition in one file as usual - so there's no
duplication/deduplication.

You could use function-sections/gc-sections to observe the "ODR
violation" sort of situation where the addresses go to zero/tombstone
rather than the "two subprograms point to one function" behavior:

eg:

$ clang -g -ffunction-sections -Wl,-gc-sections a.c && \
    llvm-dwarfdump-tot -v -debug-info a.out | \
    grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | \
    sed -e "s/............//"
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x00000065] = "a.c")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000000001, 0x0000000000000001)
     [0x0000000000401110, 0x0000000000401118))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x00000094] = "f1")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000008)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x00000097] = "main")
  DW_TAG_base_type [4]
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x0000009c] = "int")

& I guess now we can show the full variety of tombstone behavior...

The above example was with bfd ld, which uses 1 as the tombstone in
debug_ranges and 0 as the tombstone elsewhere (such as in the low_pc
of the "f1" subprogram). This works unless zero or 1 (or other "small"
values) are part of the valid address range of the program, or you
have large functions (so the [0, 6) range becomes larger and starts
overlapping with the non-gc'd functions). In either case the
subprogram address ranges become ambiguous & you don't know which
function you're in.
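
To make the ambiguity concrete, here's a small Python sketch of a
naive consumer lookup. The helper name and the "start" function at
address 0 are hypothetical; the f1/main values are the ones from the
dump above:

```python
def functions_containing(pc, subprograms):
    """Return the names of all subprograms whose [low_pc, low_pc + size)
    range covers pc. Each subprogram is (name, low_pc, size)."""
    return [name for name, low_pc, size in subprograms
            if low_pc <= pc < low_pc + size]

# Values from the dump above: "f1" was gc'd and its low_pc left at 0.
subprograms = [("f1", 0x0, 0x6), ("main", 0x401110, 0x8)]

# Hypothetical image where a live function really does start at address 0.
low_image = [("f1", 0x0, 0x6), ("start", 0x0, 0x10)]

print(functions_containing(0x401112, subprograms))  # ['main']
print(functions_containing(0x2, low_image))         # ['f1', 'start']
```

With code living well above zero the dead "f1" never matches a real
pc, but once low addresses are valid the lookup returns both the dead
and the live function.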

Then we've got gold (add "-fuse-ld=gold" to the compilation command),
just snipping the relevant bit of the output:

  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000000000, 0x0000000000000006)
     [0x0000000000400510, 0x0000000000400518))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)

Here we can see gold's technique of using "0+addend" as the tombstone
value - which works, again, until your valid address range reaches
down that low or you have large functions. (Or you special case zero
as the tombstone - which then works until you have zero as a valid
code address, or you have empty functions (where range and loc lists
would get terminated prematurely), or you have a function that starts
at a non-zero addend...)
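
The "terminated prematurely" case can be sketched like this in Python.
It's a simplified model: the helper names are made up, the list is
given as already-decoded (begin, end) pairs, and the empty dead
function is a hypothetical input rather than something from the dumps
above:

```python
def resolve_dead(addend):
    # gold-style: a relocation against a discarded section becomes 0 + addend.
    return 0 + addend

def read_v4_ranges(pairs):
    """Decode a DWARFv4-style range list given as (begin, end) pairs.
    A (0, 0) pair is the end-of-list marker. Base address selection
    entries are ignored here to keep the sketch short."""
    ranges = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break  # end of list
        ranges.append((begin, end))
    return ranges

LIVE_MAIN = (0x400510, 0x400518)
END = (0, 0)

# Dead 6-byte function: addends 0 and 6, as in the gold output above.
dead_f1 = (resolve_dead(0), resolve_dead(6))
# Hypothetical dead *empty* function: both addends are 0.
dead_empty = (resolve_dead(0), resolve_dead(0))

print(read_v4_ranges([dead_f1, LIVE_MAIN, END]))     # both entries survive
print(read_v4_ranges([dead_empty, LIVE_MAIN, END]))  # [] - main's range is lost
```

The dead empty function resolves to (0, 0), which the reader can't
tell apart from the list terminator, so everything after it is
dropped.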

Then we've got lld's new behavior (which will hopefully be adopted by
the other linkers and the DWARF standard as a more robust solution):

  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0xfffffffffffffffe, 0xfffffffffffffffe)
       # this would be 0xffffffffffffffff in DWARFv5, but needs to be
       # 0xfffffffffffffffe in DWARFv4 to avoid creating unintended base
       # address selection entries in debug_loc and debug_ranges
     [0x0000000000201690, 0x0000000000201698))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0xffffffffffffffff)

Which probably works about as well as the other solutions if the
consumer isn't special casing things (& isn't being too fussy about
the fact that low_pc+(data4)high_pc might overflow... ) and also
allows the consumer to special case more intentionally without ruling
out zero as a valid address, etc.
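
As a rough Python sketch of what "special casing more intentionally"
could look like on the consumer side: the tombstone values are the
ones described above, but all the function names are hypothetical and
this isn't taken from any actual consumer:

```python
def max_address(addr_size=8):
    return (1 << (8 * addr_size)) - 1

def tombstone(addr_size=8, in_v4_list=False):
    """Tombstone as described above: all-ones, or all-ones minus one
    inside DWARFv4 debug_ranges/debug_loc."""
    return max_address(addr_size) - 1 if in_v4_list else max_address(addr_size)

def is_v4_base_address_selection(begin, addr_size=8):
    # In DWARFv4 range/location lists, a first value of all-ones marks a
    # base address selection entry rather than a real range.
    return begin == max_address(addr_size)

def subprogram_range(low_pc, high_pc_offset, addr_size=8):
    """Return (low, high) for a live subprogram, or None for a dead one."""
    if low_pc == tombstone(addr_size):
        return None
    return (low_pc, low_pc + high_pc_offset)

def wrapped_high_pc(low_pc, high_pc_offset, addr_size=8):
    # What fixed-width arithmetic gives a consumer that does not special case.
    return (low_pc + high_pc_offset) & max_address(addr_size)

print(subprogram_range(0xffffffffffffffff, 6))      # None
print(hex(wrapped_high_pc(0xffffffffffffffff, 6)))  # 0x5
print(is_v4_base_address_selection(tombstone(in_v4_list=True)))  # False
```

The wrapped high_pc is the overflow mentioned above, and the last
line shows why the DWARFv4 lists use the value one below all-ones: the
all-ones value itself would read as a base address selection entry.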