From: David Blaikie
Date: Fri, 19 Jun 2020 17:46:46 -0700
Subject: Re: Range lists, zero-length functions, linker gc
To: Mark Wielaard
Cc: gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org, Fangrui Song

On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard wrote:
>
> Hi,
>
> On Tue, 2020-06-02 at 11:06 -0700, David Blaikie via Elfutils-devel wrote:
> > > I do think combining Split DWARF and LTO might not be the best
> > > solution. When doing LTO you probably want something like GCC Early
> > > Debug, which is like Split DWARF, but different, because the Early
> > > Debug simply doesn't contain any address (ranges) yet (not even through
> > > indirection like .debug_addr).
> >
> > I don't think Early Debug fits here - it seems like it was
> > specifically for DWARF that doesn't refer to any code (eg: function
> > declarations and type definitions). I don't see how it could be used
> > for the actual address-referencing DWARF needed to describe function
> > definitions.
>
> I think that is kind of the point of Early Debug. Only use DWARF (at
> first) for address/range-less data like types and program scope
> entries, but don't emit anything (in DWARF format) for things that
> might need adjustments during link/LTO phase. The problem with using
> DWARF with address (ranges) during early object creation is that the
> linker isn't capable to rewrite the DWARF. You'll need a linker plugin
> that calls back into the compiler to do the actual LTO and emit the
> actual DWARF containing address/ranges (which can then link back to the
> already emitted DWARF types/program scope/etc during the Early Debug
> phase).
> I think the issue you are describing is actually that you do
> use DWARF to describe function definitions (not just the declarations)
> too early. If you aren't sure yet which addresses will be used DWARF
> isn't really the appropriate (temporary) debug format.

Sorry, I think we keep talking past each other. Not sure if we can
reach a good consensus or shared understanding on this topic.

DWARF in unlinked object files has been a fairly well used temporary
debug format for a long time - and the DWARF spec has done a lot to
ensure it is compatible with ELF in both object files and linkers,
basically forever. So I don't think it'd be suitable to say "DWARF
isn't an appropriate intermediate debug format to use between compilers
and linkers", in the sense that I don't think the DWARF committee
members, producers, or consumers would agree with this sentiment.

> > > > > > & again the overhead of all those separate contributions, headers,
> > > > > > etc, turns out to be not very desirable in any case.
> > > > >
> > > > > Yes, I agree with that. But as said earlier, maybe the compiler
> > > > > shouldn't have generated to code/data in the first place?
> > > >
> > > > In the (especially) C++ compilation model, I don't believe that's
> > > > possible - inline functions, templates, etc, require duplication -
> > > > unless you have a more complicated build process that can gather the
> > > > potential duplication, then fan back out again to compile, etc.
> > > > ThinLTO does some of this - at a cost of a more complicated build
> > > > system, etc.
> > > >
> > > It might be useful for the original discussion to have a few more
> > > concrete examples to show when you might have unused code that the
> > > linker might want to discard, but where the compiler could only produce
> > > DWARF in one big blob. Apart of the -ffunction-sections case,
> >
> > Function sections, inline functions, function templates are core examples.
>
> I understand the function sections case, but can you give actual
> examples of an inline function or function template source code and how
> a DWARF producer generates DWARF for that? Maybe some simple source
> code we can put through gcc or clang to see how they (mis)handle it.
> Not being a compiler architect I am not sure I understand why those
> cannot be expressed correctly.

Oh, sure! Sorry. A simple case of inline functions being deduplicated
looks like this:

a.cpp:
  inline void f1() { }
  void f2() {
    f1();
  }

b.cpp:
  inline void f1() { }
  void f2();
  int main() {
    f1();
    f2();
  }

This actually demonstrates a slightly different behavior of bfd and gold.

When the comdats are the same size (I'm told that's the heuristic) and
the local symbol names the DWARF uses to refer to the functions (f1 in
this case) match, both DWARF descriptions are resolved to point to the
same deduplicated copy of 'f1'. BFD and Gold both produce this DWARF
(uninteresting attributes have been omitted):

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0x0000000000401110, 0x000000000040111b)
     [0x0000000000401120, 0x0000000000401126))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4] (0x0000000b)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000009d] = "f2")

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4] (0x00000006)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000030
     [0x0000000000401130, 0x0000000000401142)
     [0x0000000000401120, 0x0000000000401126))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401130)
    DW_AT_high_pc [DW_FORM_data4] (0x00000012)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000b0] = "main")

  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4] (0x00000006)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

Now you have two CUs that have overlapping ranges, which is interesting
- if not strictly invalid (DWARF being permissive and all). Though I
think the size heuristic is risky - it's possible that 'f1' was
optimized differently in the two compilations and just happened to end
up with the same size - but the DWARF description from one compilation
may be incorrect for the other version of the function (eg: one
compiler chose to put a constant in one register, the other compiler
used another register - same instruction sequence length, but the DWARF
would be different, and incorrect if mismatched like that).

If you end up with different function lengths (which is common enough
in larger programs - different other definitions may be available,
different inlining heuristics about overall object size, etc, may kick
in) then you get BFD and Gold's current tombstoning behavior. BFD
produces:

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0x0000000000401110, 0x000000000040111b)
     [0x0000000000401120, 0x000000000040112b))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4] (0x0000000b)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000009d] = "f2")

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4] (0x0000000b)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000030
     [0x0000000000401130, 0x0000000000401142)
     [0x0000000000000001, 0x0000000000000001))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401130)
    DW_AT_high_pc [DW_FORM_data4] (0x00000012)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000b0] = "main")

  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4] (0x00000006)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

In this case BFD uses the tombstone value 0 in most sections, but uses
1 in debug_ranges to ensure it doesn't produce the 0,0 pair that would
end the range list early. (This workaround is incomplete and should
also be applied to debug_loc, which is terminated by 0,0 too - but GCC
(and Clang) doesn't produce any inter-function location lists, so this
doesn't present a problem in practice/for now, except for dumping tools,
which end up seeing "holes" in debug_loc that would otherwise be
dumpable.)

Gold's behavior in this case is a little different, using the 0+addend
approach:

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000000
     [0x0000000000400540, 0x000000000040054b)
     [0x0000000000400550, 0x000000000040055b))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400540)
    DW_AT_high_pc [DW_FORM_data4] (0x0000000b)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000009d] = "f2")

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400550)
    DW_AT_high_pc [DW_FORM_data4] (0x0000000b)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset] (0x00000030
     [0x0000000000400560, 0x0000000000400572)
     [0x0000000000000000, 0x0000000000000006))

  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400560)
    DW_AT_high_pc [DW_FORM_data4] (0x00000012)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000b0] = "main")

  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4] (0x00000006)
    DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000a7] = "f1")

I introduced an ODR violation here (by modifying a.cpp's f1 to call f2
- thus making a.cpp's f1 a different length from b.cpp's f1) just as an
easy way to demonstrate the "different lengths" issue - but this could
arise from valid code that was differently optimized in the two
translation units.

& yeah - on an LLVM thread we did dabble with what it'd look like to
use comdats without whole separate units to put these together - and
it's possible, though that doesn't apply to Split DWARF (can't piece
together the debug_addr section either - since it'd throw off the
indexes used from the Split DWARF file) - and still adds extra section
overhead. I did prototype debug_ranges/debug_rnglists comdat assembling
(so the CU's range list wouldn't have entries for the deduplicated/gc'd
functions), but again, that means more ELF sections - little gain in
linked debug info size for the cost in intermediate object size.

> > > where I
> > > would argue the compiler simply needs to make sure that if it generates
> > > code in separate sections it also should create the DWARF separate
> > > section (groups).
> >
> > I don't think that's practical - the overhead, I believe, is too high.
> > Headers for each section contribution (ELF headers but DWARF headers
> > moreso - having a separate .debug_addr, .debug_line, etc section for
> > each function would be very expensive) would make for very large
> > object files.
>
> I see your point, but maybe this shouldn't be handled by the linker
> then, but maybe have a linker plugin so the compiler can fixup the
> DWARF (or generate it later).

This sounds like it'd still be fairly intrusive (architecturally) and
expensive (both in software complexity and in linking time/memory
usage/etc).
I'm not ruling it out as a possibility - and I'm interested in dabbling
with this kind of deduplication purely academically (my users use Split
DWARF, so there's no opportunity there to fix this - so my interest in
in-.o/linked executable DWARF is limited to personal interest). I'm
curious about just how expensive the ELF sections would be, and what
sort of custom scheme might be used instead (I could imagine a
content-aware feature that might be more terse than generic ELF
sections, but not especially invasive - it wouldn't require parsing or
rewriting DWARF DIEs, etc). That's being discussed in the LLVM
community - but I don't expect it'll be soon, nor pervasively used even
if it is built.

So I come back to Split DWARF making this fairly well impossible to
implement without a tombstone value, so far as I can imagine/think of.
And function sections at least making it very expensive to implement
(either in terms of object size and/or significant changes to the
nature of linking DWARF). And this being a pretty well established use
case/feature for decades now, one that has some relatively small
drawbacks in certain narrow cases (zero length functions, zero or low
address values that are valid in some use cases) - so I think adding an
explicit tombstone is necessary in some cases and beneficial, if not
strictly necessary, in others.

- Dave

>
> Cheers,
>
> Mark
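
P.S. To make the consumer-side problem concrete, here is a rough sketch
(in Python, purely illustrative, not taken from gdb, elfutils, or any
other existing tool) of the kind of guessing a DWARF consumer has to do
today to skip the discarded 'f1' entries in b.cpp's range list from the
dumps above. The names Range, looks_discarded, live_ranges and the
min_code_addr threshold are all hypothetical; the 0x400000 value is just
an assumption that matches these particular x86-64 links.

```python
from typing import List, NamedTuple


class Range(NamedTuple):
    low: int   # inclusive start address
    high: int  # exclusive end address


def looks_discarded(r: Range, min_code_addr: int) -> bool:
    """Hypothetical heuristic: guess whether the linker tombstoned this range."""
    # An empty range, e.g. the [0x1, 0x1) entry BFD writes into debug_ranges.
    if r.low >= r.high:
        return True
    # A range starting below any real code, e.g. gold's 0+addend [0x0, 0x6).
    # This is only a guess: it is wrong on targets where low addresses are valid.
    return r.low < min_code_addr


def live_ranges(ranges: List[Range], min_code_addr: int) -> List[Range]:
    """Keep only the ranges that do not look like discarded code."""
    return [r for r in ranges if not looks_discarded(r, min_code_addr)]


if __name__ == "__main__":
    # b.cpp's CU range list as emitted by BFD and by gold in the dumps above.
    bfd = [Range(0x401130, 0x401142), Range(0x1, 0x1)]
    gold = [Range(0x400560, 0x400572), Range(0x0, 0x6)]
    for name, ranges in (("bfd", bfd), ("gold", gold)):
        kept = live_ranges(ranges, min_code_addr=0x400000)
        print(name, [(hex(r.low), hex(r.high)) for r in kept])
```

Note that the second check is exactly the part that breaks down for
zero-length functions and for targets where zero or low addresses are
valid code, which is why a threshold like this can only ever be a guess
and an explicit tombstone value would be more robust.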