From: David Blaikie
Date: Sun, 31 May 2020 16:17:22 -0700
Subject: Re: Range lists, zero-length functions, linker gc
To: Mark Wielaard
Cc: Fangrui Song, gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org
In-Reply-To: <20200531221111.GL44629@wildebeest.org>

On Sun, May 31, 2020 at 3:42 PM Mark Wielaard wrote:
>
> Hi,
>
> On Sun, May 31, 2020 at 01:47:38PM -0700, Fangrui Song via Elfutils-devel wrote:
> > On 2020-05-31, Mark Wielaard wrote:
> > > I think this is a producer problem. If a (code) section can be totally
> > > dropped then the associated (.debug) sections should have been
> > > generated together with that (code) section in a COMDAT group. That
> > > way when the linker drops that section, all the associated sections in
> > > that COMDAT group will get dropped with it. If you don't do that, then
> > > the DWARF is malformed and there is not much a consumer can do about
> > > it.
> > >
> > > Said otherwise, I don't think it is correct for the linker (with
> > > --gc-sections) to drop any sections that have references to it
> > > (through relocation symbols) from other (.debug) sections.
> >
> > I would love if we could solve the problem using ELF features, but
> > putting DW_TAG_subprogram in the same section group is not an
> > unqualified win
>
> Sorry for pushing back a little,

No worries - so long as other people engage with the rest of the
thread, hopefully - happy to/worthwhile discussing all the edges.

> but as a DWARF consumer this feels a
> little like the DWARF producer hasn't tried hard enough to produce
> valid DWARF and now tries to pass the problems off onto the DWARF
> consumer.

I think the fact that it's been this way across multiple compilers,
linkers, and debuggers for decades is pretty strong evidence that it's
at least a strategy producers do use, and probably will want to keep
using.

> Or when looking at it from the perspective of the linker,
> the compiler gave it an impossible problem to solve because it didn't
> really get all the pieces of the puzzle (the compiler already fused
> some independent parts together).
>
> I certainly appreciate the issue on 32-bit systems. It seems we
> already have reached the limits for some programs to be linked (or
> produce all the debuginfo) when all you got is 32-bits.
>
> But maybe that means that the problem is actually that the compiler
> already produced too much code/data. And the issue really is that it
> passes some problems, like unused code elimination, off to the
> linker. While the compiler really should have a better view of that,
> and should do that job itself.

Something like LLVM's ThinLTO does help here by avoiding duplication
in object files, but it doesn't entirely eliminate the need for code
removal in the final link step. Anything that attempts to improve this
(including ThinLTO) comes at the cost of "pinch points" in the
compilation - places where global knowledge is required to decide how
to remove the redundancy - which complicate and potentially slow down
the build. If you want cross-file optimizations, you're willing to pay
some slowdown there, but if you're looking for a quick interactive
build, this sort of extra pinch point is going to be unfortunate.
ThinLTO helps mitigate some of the huge cost of LTO, but it's still
extra steps.

> If it did, then it would never even
> produce the debuginfo in the first place.
>
> GCC used to produce horrible DWARF years ago with early LTO
> implementations, because they just handed it all off to the linker to
> figure out.
> But they solved it by generating DWARF in phases, only
> when it was known the DWARF was valid/consistent did it get
> produced. So that if some code was actually eliminated then the linker
> never even see any "code ranges" for code that disappeared. See Early
> Debug: https://gcc.gnu.org/wiki/early-debug

Ah, interesting read - thanks for the link! Yeah, LLVM took a different
path there - the serializable IR (GIMPLE equivalent, I guess) includes
a semantic representation of DWARF, essentially - so while we've dealt
with various issues around IR+IR linking for (Thin & full) LTO, it
wasn't such a hard break/architectural issue as GCC dealt with there.
That said, we have discussed the idea of doing something more like GCC
does: generating static DWARF earlier in the front-end and serializing
a blob of relatively opaque DWARF in the IR, except for the bits
(variable locations, etc.) that the compiler needs visibility into.
The particular way GCC separated the CUs is an interesting one to keep
in mind if we go down that route (though we might hit the Split
DWARF/multiple CUs issue if we did).

> Might some similar technique, where the compiler does a bit more work,
> so that it actually produces less DWARF to be processed by the linker,
> be used here?

Not that I'm aware of, while we're looking at the "classic"
compilation model (compile source files to object files, link object
files).

> Sorry for pushing the problem back to the producer side, but as a
> consumer I think that is the more correct place to solve this.

No worries - and I think there might be some interesting approaches to
consider, but I think the history of this issue is long enough that
some producers, in some use-cases, will continue to want this
functionality, which has been supported for quite a while: in some
cases explicitly (e.g. bfd's support for debug_ranges looks very
explicitly intended to support DWARF in this situation), in other
cases de facto.

- Dave