From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mark@klomp.org>
Received: from gnu.wildebeest.org (wildebeest.demon.nl [212.238.236.112])
 by sourceware.org (Postfix) with ESMTPS id 5BAB63840C23;
 Fri, 19 Jun 2020 12:00:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5BAB63840C23
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=klomp.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mark@klomp.org
Received: from tarox.wildebeest.org (tarox.wildebeest.org [172.31.17.39])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by gnu.wildebeest.org (Postfix) with ESMTPSA id 2D5483000C8B;
 Fri, 19 Jun 2020 14:00:28 +0200 (CEST)
Received: by tarox.wildebeest.org (Postfix, from userid 1000)
 id 0A4314708690; Fri, 19 Jun 2020 14:00:28 +0200 (CEST)
Message-ID: <5e22c0183325aae16a28e301c7a83cea479130a0.camel@klomp.org>
Subject: Re: Range lists, zero-length functions, linker gc
From: Mark Wielaard <mark@klomp.org>
To: David Blaikie <dblaikie@gmail.com>
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org,
 binutils@sourceware.org,  Fangrui Song <maskray@google.com>
Date: Fri, 19 Jun 2020 14:00:27 +0200
In-Reply-To: <CAENS6EsMs78YuYnkC458XOBNXJGNurUk4MCHiq-0nRiywN7YLg@mail.gmail.com>
References: <20200531185506.mp2idyczc4thye4h@google.com>
 <20200531201016.GJ44629@wildebeest.org>
 <CAENS6Esjx0HQpviW=ZrA4O3Bza7JDOpoqe3fLxqmLZ4TZsv-9w@mail.gmail.com>
 <20200531222937.GM44629@wildebeest.org>
 <CAENS6Es_DuMzzQi-RBzF_0vm2QCX1DbxGsQsTVDMdSz2f2h4oQ@mail.gmail.com>
 <20200601093103.GN44629@wildebeest.org>
 <CAENS6EsK+ef=GCWewQSCH4imi-_LN8z7gp3qwK5BoMr-mYY=4w@mail.gmail.com>
 <a691fb97d27be64248298287ff9a189ce0734731.camel@klomp.org>
 <CAENS6EsMs78YuYnkC458XOBNXJGNurUk4MCHiq-0nRiywN7YLg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Evolution 3.28.5 (3.28.5-8.el7) 
Mime-Version: 1.0
X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gdb@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb mailing list <gdb.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/gdb>,
 <mailto:gdb-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/gdb>,
 <mailto:gdb-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Jun 2020 12:00:30 -0000

Hi,

On Tue, 2020-06-02 at 11:06 -0700, David Blaikie via Elfutils-devel wrote:
> > I do think combining Split DWARF and LTO might not be the best
> > solution. When doing LTO you probably want something like GCC Early
> > Debug, which is like Split DWARF, but different, because the Early
> > Debug simply doesn't contain any address (ranges) yet (not even through
> > indirection like .debug_addr).
>
> I don't think Early Debug fits here - it seems like it was
> specifically for DWARF that doesn't refer to any code (eg: function
> declarations and type definitions). I don't see how it could be used
> for the actual address-referencing DWARF needed to describe function
> definitions.

I think that is kind of the point of Early Debug. Only use DWARF (at
first) for address/range-less data like types and program scope
entries, but don't emit anything (in DWARF format) for things that
might need adjustments during the link/LTO phase. The problem with using
DWARF with address (ranges) during early object creation is that the
linker isn't capable of rewriting the DWARF. You'll need a linker plugin
that calls back into the compiler to do the actual LTO and emit the
actual DWARF containing address/ranges (which can then link back to the
already emitted DWARF types/program scope/etc during the Early Debug
phase). I think the issue you are describing is actually that you do
use DWARF to describe function definitions (not just the declarations)
too early. If you aren't sure yet which addresses will be used, DWARF
isn't really the appropriate (temporary) debug format.

> > > > > & again the overhead of all those separate contributions, headers,
> > > > > etc, turns out to be not very desirable in any case.
> > > >
> > > > Yes, I agree with that. But as said earlier, maybe the compiler
> > > > shouldn't have generated the code/data in the first place?
> > >
> > > In the (especially) C++ compilation model, I don't believe that's
> > > possible - inline functions, templates, etc, require duplication -
> > > unless you have a more complicated build process that can gather the
> > > potential duplication, then fan back out again to compile, etc.
> > > ThinLTO does some of this - at a cost of a more complicated build
> > > system, etc.
> >
> > It might be useful for the original discussion to have a few more
> > concrete examples to show when you might have unused code that the
> > linker might want to discard, but where the compiler could only produce
> > DWARF in one big blob. Apart from the -ffunction-sections case,
>
> Function sections, inline functions, function templates are core examples.

I understand the function sections case, but can you give actual
examples of an inline function or function template source code and how
a DWARF producer generates DWARF for that? Maybe some simple source
code we can put through gcc or clang to see how they (mis)handle it.
Not being a compiler architect I am not sure I understand why those
cannot be expressed correctly.
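
Something as small as the following might already be enough as a
starting point. This is only a sketch: the file name and all function
names are hypothetical and not taken from any earlier message in this
thread.

```cpp
// example.cpp: hypothetical test input; all names are made up for illustration.

// An inline function. Each translation unit that uses it may emit its own
// out-of-line copy (on ELF typically in a COMDAT section group), and the
// linker keeps only one copy across the whole link.
inline int add_one(int x) { return x + 1; }

// A function template. Each implicit instantiation used below
// (twice<int>, twice<long>) is emitted per translation unit in the same way.
template <typename T>
T twice(T value) { return value + value; }

// An ordinary function that uses both, so that without optimization the
// compiler emits out-of-line definitions of add_one and twice<T>.
long use_all(int n) {
    return add_one(n) + twice(n) + twice(static_cast<long>(n));
}
```

Compiling this with "g++ -g -c example.cpp" (or clang++) and dumping the
object with "readelf --debug-dump=info example.o" should show how the
producer describes the add_one and twice<T> definitions. Linking two
objects that both contain them should then show what the DWARF looks
like after the linker has dropped the duplicate copies.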

> > where I
> > would argue the compiler simply needs to make sure that if it generates
> > code in separate sections it also should create the DWARF separate
> > section (groups).
>
> I don't think that's practical - the overhead, I believe, is too high.
> Headers for each section contribution (ELF headers but DWARF headers
> moreso - having a separate .debug_addr, .debug_line, etc section for
> each function would be very expensive) would make for very large
> object files.

I see your point, but then maybe this shouldn't be handled by the
linker itself. Instead, a linker plugin could let the compiler fix up
the DWARF (or generate it later).

Cheers,

Mark