Subject: Re: Range lists, zero-length functions, linker gc
From: Mark Wielaard
To: David Blaikie
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org,
    binutils@sourceware.org, Fangrui Song
Date: Thu, 25 Jun 2020 00:21:59 +0200
Message-ID: <2a72bb9f5f4d29f6733ee0c907a1043a97ef71d5.camel@klomp.org>
List-Id: Gdb mailing list

Hi David,

On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel
wrote:
> On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard wrote:
> > I think that is kind of the point of Early Debug. Only use DWARF (at
> > first) for address/range-less data like types and program scope
> > entries, but don't emit anything (in DWARF format) for things that
> > might need adjustments during link/LTO phase. The problem with using
> > DWARF with address (ranges) during early object creation is that the
> > linker isn't capable to rewrite the DWARF. You'll need a linker plugin
> > that calls back into the compiler to do the actual LTO and emit the
> > actual DWARF containing address/ranges (which can then link back to the
> > already emitted DWARF types/program scope/etc during the Early Debug
> > phase). I think the issue you are describing is actually that you do
> > use DWARF to describe function definitions (not just the declarations)
> > too early. If you aren't sure yet which addresses will be used DWARF
> > isn't really the appropriate (temporary) debug format.
>
> Sorry, I think we keep talking around each other. Not sure if we can
> reach a good consensus or shared understanding on this topic.

I think the confusion comes from the fact that we seem to cycle
through a couple of different topics which are related, but not
directly connected.

There is the topic of using "tombstones" in place of some pc or range
attributes/tables in the case of traditional linking of separate
compile units/objects. There we seem to agree that tombstones are
better than silently producing bad data, but we disagree on whether
there are other ways to solve the issue (using comdat sections for
example, where we might see the overhead/gains differently).

There is the topic of LTO, where part of the linker optimization is
done through a (compiler) plugin. There it isn't clear (to me at least)
whether some of the traditional ways of handling DWARF in object files
make sense.
I would argue that GCC shows that for LTO you need something like
Early Debug, where you only produce the parts of the DWARF early that
don't contain any addresses or ranges, since you don't know where
code/data will end up until after the actual LTO phase. Only then can
the parts that carry addresses be produced.

Then there is the topic of Split DWARF, which I am not sure is
directly relevant to the above two topics. It is just a different
representation of the DWARF data, with an extra layer of indirection
used for addresses. In the case of the traditional model that means
you still hit the tombstones, just through an indirection table. And
for LTO it just makes some things more complicated: since you cannot
know where the addresses end up until after the LTO phase, you now
have an extra address indirection table to fix up.

> DWARF in unlinked object files has been a fairly well used temporary
> debug format for a long time - and the DWARF spec has done a lot to
> ensure it is compatible with ELF in both object files and linkers
> forever, basically? So I don't think it'd be suitable to say "DWARF
> isn't an appropriate intermediate debug format to use between
> compilers and linkers". In the sense that I don't think either the
> DWARF committee members, producers, or consumers would agree with this
> sentiment.

I absolutely agree with that statement for the traditional linker
model, where you build up DWARF data per compile unit. But for the LTO
model, where there is a feedback loop between compiler and linker, I
don't think (all of) DWARF is an appropriate intermediate debug
format, if only because the concept of "compile unit" gets really
fuzzy. I think in that model a lot of DWARF can still be used usefully
as an intermediate debug format to pass between compiler, linker,
compiler, linker during the LTO phase.
Just not the part that describes the program scope and variable/data
locations represented as (ranges of) addresses (when produced early).

> > I understand the function sections case, but can you give actual
> > examples of an inline function or function template source code and how
> > a DWARF producer generates DWARF for that? Maybe some simple source
> > code we can put through gcc or clang to see how they (mis)handle it.
> > Not being a compiler architect I am not sure I understand why those
> > cannot be expressed correctly.
>
> oh, sure! sorry.
>
> a simple case of inline functions being deduplicated looks like this:
>
> a.cpp:
> inline void f1() { }
> void f2() {
>   f1();
> }
>
> b.cpp:
> inline void f1() { }
> void f2();
> int main() {
>   f1();
>   f2();
> }
>
> This actually demonstrates a slightly different behavior of bfd and
> gold: When the comdats are the same size (I'm told that's the
> heuristic) and the local symbol names the DWARF uses to refer to the
> functions (f1 in this case) - then both DWARF descriptions are
> resolved to point to the same deduplicated copy of 'f1', eg:

Thanks for the concrete example. I'll study it. Would you mind telling
me which DWARF producer/compiler you used and which command line flags
you passed to the compiler and linker invocations? I would like to
replicate the produced DWARF, but wasn't able to get something that
used ranges like in your examples.

I also wonder about the ODR violation: does your example depend on
this being C++, or does it produce the same issues when built as a C
program?

Thanks,

Mark
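
As a footnote to the tombstone topic: a minimal sketch of what
"better than silently producing bad data" could mean on the consumer
side. Nothing here is taken from gdb, elfutils or binutils. The
AddressRange type and the tombstone() and live_ranges() helpers are
hypothetical names, and the all-ones tombstone value is an assumption
made only for illustration, since which value to use is part of what
this thread is discussing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One [low, high) pair as a consumer might hold it after reading a
// range list from a linked binary. Hypothetical type for illustration.
struct AddressRange {
    std::uint64_t low;
    std::uint64_t high;
};

// ASSUMPTION: the linker marks discarded code by writing the largest
// representable address as the start of the range. The value actually
// used by a given linker may differ.
constexpr std::uint64_t tombstone(unsigned address_size) {
    return address_size == 8 ? ~std::uint64_t{0}
                             : std::uint64_t{0xffffffffu};
}

// Keep only the ranges that can describe code present in the output:
// drop tombstoned entries and zero-length (or inverted) entries.
std::vector<AddressRange> live_ranges(const std::vector<AddressRange>& ranges,
                                      unsigned address_size) {
    assert(address_size == 4 || address_size == 8);
    const std::uint64_t dead = tombstone(address_size);
    std::vector<AddressRange> out;
    for (const AddressRange& r : ranges) {
        if (r.low == dead) {
            continue;  // start address was tombstoned: code was discarded
        }
        if (r.low >= r.high) {
            continue;  // zero-length or inverted: covers no code
        }
        out.push_back(r);
    }
    return out;
}
```

With such a filter a consumer ignores the description of a discarded
f1 copy instead of attributing it to whatever code happens to live at
address zero.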