From: David Blaikie
Date: Thu, 25 Jun 2020 16:45:54 -0700
Subject: Re: Range lists, zero-length functions, linker gc
To: Mark Wielaard
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org, Fangrui Song

On Wed, Jun 24, 2020 at 3:22 PM Mark Wielaard wrote:
>
> Hi David,
>
> On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel wrote:
> > On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard wrote:
> > > I think that is kind of the point of Early Debug. Only use DWARF (at
> > > first) for address/range-less data like types and program scope
> > > entries, but don't emit anything (in DWARF format) for things that
> > > might need adjustments during link/LTO phase. The problem with using
> > > DWARF with address (ranges) during early object creation is that the
> > > linker isn't capable of rewriting the DWARF. You'll need a linker plugin
> > > that calls back into the compiler to do the actual LTO and emit the
> > > actual DWARF containing address/ranges (which can then link back to the
> > > already emitted DWARF types/program scope/etc during the Early Debug
> > > phase). I think the issue you are describing is actually that you do
> > > use DWARF to describe function definitions (not just the declarations)
> > > too early. If you aren't sure yet which addresses will be used, DWARF
> > > isn't really the appropriate (temporary) debug format.
> >
> > Sorry, I think we keep talking around each other. Not sure if we can
> > reach a good consensus or shared understanding on this topic.
>
> I think the confusion comes from the fact that we seem to cycle through
> a couple of different topics which are related, but not really
> connected directly.
>
> There is the topic of using "tombstones" in place of some pc or range
> attributes/tables in the case of traditional linking of separate compile
> units/objects. Where we seem to agree that those are better than
> silently producing bad data, but where we disagree whether there are
> other ways to solve the issue (using comdat sections for example, where
> we might see the overhead/gains differently).
>
> There is the topic of LTO where part of the linker optimization is done
> through a (compiler) plugin. Where it isn't clear (to me at least) if
> some of the traditional ways of handling DWARF in object files make
> sense.

Oh - perhaps to clarify: I don't know of any implementation that creates DWARF in intermediate object files in LTO.

> I would argue that GCC shows that for LTO you need something
> like Early Debug, where you only produce parts of the DWARF early that
> don't contain any addresses or ranges, since you don't know yet where
> code/data will end up till after the actual LTO phase, only after which
> it can be produced.

Yeah - I guess that's the point of the name "Early Debug" - it's earlier than usual, rather than making the rest later than usual. In LLVM's implementation the faux .o files in LTO contain no DWARF whatsoever - but a semantic representation something like DWARF, intended to be manipulated by compiler optimizations and designed to drop unreferenced portions as optimizations make changes.
(if you inline and optimize away a function call, that function may get dropped - then no DWARF is emitted for it, same as if it were never called)

Yeah, it'd be theoretically possible to create all the DWARF up-front, use loclists and rnglists for /everything/ (because you wouldn't know if a variable would have a single location or multiple until after optimizations) and then fill in those loclists and rnglists post-optimization. I don't know of any implementation that does that, though - it'd make for very verbose DWARF, and I agree with you that that wouldn't be great. I think the only point of conflict there is that I don't think that's a concern that's actually manifesting in DWARF producers today. Certainly not in LLVM & it doesn't sound like it is in GCC. I think there's enough incentive, for compiler performance, not to produce loads of duplicate DWARF and to have a fairly compact/optimizable intermediate representation - there was a lot of work that went into changing LLVM's representation to be more amenable to LTO, to ensure things got dropped and deduplicated as soon as possible.

> Then there is the topic of Split Dwarf, where I am not sure it is
> directly relevant to the above two topics. It is just a different
> representation of the DWARF data, with an extra layer of indirections
> used for addresses. Which in the case of the traditional model means
> that you still hit the tombstones, just through an indirection table.
> And for LTO it just makes some things more complicated because you have
> this extra address indirection table, but since you cannot know where
> the addresses end up till after the LTO phase you now have an extra
> layer of indirection to fix up.
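Mark's point that Split DWARF still hits the tombstones, only through an indirection table, can be sketched as follows. In DWARF v5 Split DWARF, address attributes in the .dwo use DW_FORM_addrx, which is an index into the skeleton unit's .debug_addr contribution. The .dwo carries no relocations, so the linker's fix-ups (including any tombstone) land in that table rather than in the .dwo DIEs. This is a minimal sketch, not any real consumer's code. It models the table as an already-relocated vector of 64-bit addresses with the header omitted. The names AddrTable, kTombstone and resolveAddrx are hypothetical. It also assumes the linker writes an all-ones tombstone, the value lld uses for DW_AT_low_pc.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one skeleton unit's .debug_addr contribution after
// linking: just the address entries that follow the header (64-bit target).
using AddrTable = std::vector<uint64_t>;

// Assumed tombstone value for this sketch: all ones.
constexpr uint64_t kTombstone = UINT64_MAX;

// Resolve a DW_FORM_addrx index taken from a .dwo DIE. Returns std::nullopt
// if the index is out of range, or if the entry holds the tombstone (i.e. the
// code it described was discarded by the linker).
std::optional<uint64_t> resolveAddrx(const AddrTable& table, uint64_t index) {
  if (index >= table.size()) {
    return std::nullopt;  // malformed index: no such table entry
  }
  const uint64_t addr = table[index];
  if (addr == kTombstone) {
    return std::nullopt;  // dead code: same tombstone, one level removed
  }
  return addr;
}
```

For example, with a table of {0x401110, kTombstone}, index 0 resolves to 0x401110 while index 1 yields no address. The .dwo DIEs referring to the two entries look identical; only the table says which one is dead.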

I think the point of Split DWARF, with respect to your first point about you and me perhaps having different tradeoffs about object size cost (using comdats to deduplicate/drop DWARF for dead or deduplicated functions), is that in the case of Split DWARF that approach is impossible - well, it's impossible if you're going to use fragmented DWARF (eg: using comdats to stitch together a single CU out of droppable parts). If you wanted to drop the DWARF related to a dead or deduplicated function when using Split DWARF, you'd have to use a whole separate unit (possibly a partial_unit) - which would add a lot more size overhead. Perhaps enough that we'd both agree it's prohibitive (especially since that cost would persist into the linked binary - so it wouldn't be a .o versus linked executable tradeoff so much as outright growth).

>
> > DWARF in unlinked object files has been a fairly well used temporary
> > debug format for a long time - and the DWARF spec has done a lot to
> > ensure it is compatible with ELF in both object files and linkers
> > forever, basically? So I don't think it'd be suitable to say "DWARF
> > isn't an appropriate intermediate debug format to use between
> > compilers and linkers". In the sense that I don't think either the
> > DWARF committee members, producers, or consumers would agree with this
> > sentiment.
>
> I absolutely agree with that statement for the traditional linker
> model, where you build up DWARF data per compile unit.

Ah, OK - then perhaps that's all we need to really agree on to move forward with the discussion of a tombstone value: what value it is, that it should be in the DWARF spec, and that all the implementations should know and agree on it?

> But for the LTO
> model, where there is a feedback loop between compiler and linker, I
> don't think (all of) DWARF is an appropriate intermediate debug format.

Neither do I - though if we both agree there is a need for a tombstone in the traditional linker model, then we do leave it open for very inefficient LTO implementations to use that feature too. But there are lots of ways a DWARF producer could produce very inefficient DWARF & I don't think there's a great need to mandate against it in general (if we could avoid having the tombstone concept entirely - sure - but if we've got to have it, I don't think the LTO conversation goes anywhere in terms of informing the design of the tombstone feature).

> If only because the concept of "compile unit" gets really fuzzy. I
> think in that model a lot of DWARF can still be used usefully as
> intermediate debug format to pass between compiler, linker, compiler,
> linker during the LTO phase. Just not the part that describes the
> program scope and variable/data locations represented as (ranges of)
> addresses (when produced early).
>
> > > I understand the function sections case, but can you give actual
> > > examples of an inline function or function template source code and how
> > > a DWARF producer generates DWARF for that? Maybe some simple source
> > > code we can put through gcc or clang to see how they (mis)handle it.
> > > Not being a compiler architect I am not sure I understand why those
> > > cannot be expressed correctly.
> >
> > oh, sure! sorry.
> >
> > a simple case of inline functions being deduplicated looks like this:
> >
> > a.cpp:
> > inline void f1() { }
> > void f2() {
> >   f1();
> > }
> >
> > b.cpp:
> > inline void f1() { }
> > void f2();
> > int main() {
> >   f1();
> >   f2();
> > }
> >
> > This actually demonstrates a slightly different behavior of bfd and
> > gold: When the comdats are the same size (I'm told that's the
> > heuristic) and the local symbol names the DWARF uses to refer to the
> > functions (f1 in this case) match, then both DWARF descriptions are
> > resolved to point to the same deduplicated copy of 'f1', eg:
>
> Thanks for the concrete example. I'll study it.
>
> Would you mind telling which DWARF producer/compiler you used and which
> command line flags you used to the compiler and linker invocations?

clang or gcc without any extra flags should suffice here. To get the summarized DWARF I showed above, I used this complete command line:

$ clang++ -g a.cpp b.cpp && llvm-dwarfdump -v -debug-info a.out | grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | sed -e "s/............//"

(using clang and llvm-dwarfdump from LLVM trunk)

> I'd
> like to replicate the produced DWARF but wasn't able to get something
> that used ranges like in your examples. I also wonder about the ODR
> violation, does your example depend on this being C++ or does it
> produce the same issues when it was built as a C program?

I believe C has different "inline" semantics that I'm not as familiar with - but I /believe/ the actual C standard inline semantics wouldn't produce the kind of situation that C++ does.
(in C++ you define an inline function in every translation unit it's used in - and the compiler can choose to inline it or not, and if it doesn't actually inline it then the object file carries a deduplicable definition of the function and the linker picks one of those definitions from among the input object files - whereas in C the inline function definition, if not inlined, is discarded by the compiler and the user must have provided a non-inline definition in one file as usual - so there's no duplication/deduplication)

You could use function-sections/gc-sections to observe the "ODR violation" sort of situation, where the addresses go to zero/tombstone rather than the "two subprograms point to one function" behavior, eg:

$ clang -g -ffunction-sections -Wl,-gc-sections a.c && llvm-dwarfdump-tot -v -debug-info a.out | grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | sed -e "s/............//"
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000065] = "a.c")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0x0000000000000001, 0x0000000000000001)
     [0x0000000000401110, 0x0000000000401118))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4] (0x00000006)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000094] = "f1")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4] (0x00000008)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000097] = "main")
  DW_TAG_base_type [4]
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000009c] = "int")

& I guess now we can show the full variety of tombstone behavior...

(The above example was with bfd ld, using 1 as the tombstone in debug_ranges and 0 as the tombstone elsewhere, such as in the low_pc of the "f1" subprogram.) This works unless zero or 1 (or other "small" values) are part of the program's valid address range, or you have large functions (so the [0, 6) range becomes larger and starts overlapping with the non-gc'd functions). In either case the subprogram address ranges become ambiguous & you don't know which function you're in.

Then we've got gold (add "-fuse-ld=gold" to the compilation command), just snipping the relevant bit of the output:

  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0x0000000000000000, 0x0000000000000006)
     [0x0000000000400510, 0x0000000000400518))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)

Here we can see gold's technique of using "0+addend" as the tombstone value - which works, again, until valid addresses extend down that low or you have large functions. (Or you special-case zero as the tombstone - which then works until zero is a valid code address, or you have empty functions (whose range and loc list entries become (0, 0) and terminate the lists prematurely), or you have a function that starts at a non-zero addend (so its tombstone comes out as the addend rather than zero)...)

Then we've got lld's new behavior (which will hopefully be adopted by the other linkers and the DWARF standard as a more robust solution):

  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0xfffffffffffffffe, 0xfffffffffffffffe)  # this would be 0xffffffffffffffff in DWARFv5, but needs to be 0xfffffffffffffffe in DWARFv4 to avoid creating unintended base address selection entries in debug_loc and debug_ranges
     [0x0000000000201690, 0x0000000000201698))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0xffffffffffffffff)

This probably works about as well as the other solutions if the consumer isn't special-casing things (& isn't being too fussy about the fact that low_pc + (data4) high_pc might overflow...), and it also allows the consumer to special-case the tombstone more intentionally without ruling out zero as a valid address, etc.
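To make the range-list side of this concrete, here is a minimal C++ sketch of how a consumer decodes a DWARF v4 .debug_ranges list. Under the v4 rules, a (0, 0) pair ends the list. A pair whose first value is the largest representable address (all ones on a 64-bit target) is a base address selection entry. Anything else is an offset pair relative to the current base. The sketch illustrates what would go wrong if bfd wrote 0 rather than 1 for a dead range, or if lld wrote all ones rather than 0xfffffffffffffffe in v4. It also covers the low_pc + (data4) high_pc overflow mentioned above. It assumes a 64-bit target and already-relocated values. RangePair, AddrRange, decodeRangesV4 and highPcOverflows are hypothetical names, not an existing library's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One relocated entry of a DWARF v4 .debug_ranges list (64-bit addresses).
struct RangePair {
  uint64_t begin;
  uint64_t end;
};

// A decoded half-open address range [low, high).
struct AddrRange {
  uint64_t low;
  uint64_t high;
};

// Decode a v4 range list. `base` starts out as the CU's base address
// (normally its DW_AT_low_pc, which is 0 in the dumps above).
std::vector<AddrRange> decodeRangesV4(const std::vector<RangePair>& pairs,
                                      uint64_t base) {
  std::vector<AddrRange> out;
  for (const RangePair& p : pairs) {
    if (p.begin == 0 && p.end == 0) {
      break;  // end-of-list entry: anything after it is never read
    }
    if (p.begin == UINT64_MAX) {
      base = p.end;  // base address selection entry
      continue;
    }
    // Ordinary entry: offsets from the current base (wraps modulo 2^64).
    out.push_back({base + p.begin, base + p.end});
  }
  return out;
}

// DW_AT_high_pc in a constant form (e.g. DW_FORM_data4) is an offset from
// DW_AT_low_pc; with an all-ones tombstone low_pc, low + offset would wrap.
bool highPcOverflows(uint64_t lowPc, uint64_t offset) {
  return offset > UINT64_MAX - lowPc;
}
```

Consider the bfd list from the dump above plus its terminating pair: {{1, 1}, {0x401110, 0x401118}, {0, 0}}. The decoder yields main's range. Had the dead entry been written as (0, 0), which is what gold's 0+addend produces for an empty function at offset 0, decoding would stop before reaching main. Likewise, {{UINT64_MAX, UINT64_MAX}, {0x201690, 0x201698}, {0, 0}} would silently rebase main's range to [0x20168f, 0x201697). lld's 0xfffffffffffffffe avoids that in v4.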