From: David Blaikie
Date: Mon, 1 Jun 2020 13:18:05 -0700
Subject: Re: Range lists, zero-length functions, linker gc
To: Mark Wielaard
Cc: Fangrui Song, gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org

On Mon, Jun 1, 2020 at 2:31 AM Mark Wielaard wrote:
>
> Hi,
>
> On Sun, May 31, 2020 at 03:36:02PM -0700, David Blaikie wrote:
> > On Sun, May 31, 2020 at 3:30 PM Mark Wielaard wrote:
> > > On Sun, May 31, 2020 at 01:49:12PM -0700, David Blaikie wrote:
> > > > That's probably not practical for at least some users - the
> > > > easiest/most thorough counter-example is Split DWARF - the DWARF is in
> > > > another file the linker can't see. All the linker sees is a list of
> > > > addresses (debug_addr).
> > >
> > > I might be missing something, but I think this works fine with Split
> > > DWARF. As long as you make sure that the .dwo files/sections are
> > > separated along the same lines as the ELF section groups are. That
> > > means each section group either gets its own .dwo file, or you
> > > generate the .dwo sections in the same section group in the same
> > > object file using the SHF_EXCLUDED trick. That way each .debug.dwo
> > > uses their own index into the separate .debug_addr tables. If that
> > > group, with the .debug_addr table, gets discarded, then the reference
> > > to the .dwo also disappears and it simply won't be used.
> >
> > Oh, a whole separate .dwo file per function? That would be pretty
> > extreme/difficult to implement (now the compiler's producing a
> > variable number of output files? using some naming scheme so the build
> > system could find them again for building a .dwp if needed, etc).
>
> Each skeleton compilation unit has a DW_AT_dwo_name attribute which
> indicates the .dwo file where the split unit sections can be found. It
> actually seems easier to generate a different one for each
> skeleton compilation unit than trying to combine them for all the
> different skeleton compilation units you produce.
>
> > Certainly Bazel (& the internal Google version used to build most
> > Google software) can't handle an unbounded/unknown number of output
> > files from a build action.
>
> Yes, in principle .dwo files seem troublesome for build systems in
> general.

They're pretty practical when they're generated right next to the .o file & that's guaranteed by the compiler. "If you generate x.o, there will be x.dwo next to it" - that's certainly how Bazel deals with this. It doesn't parse the DWARF at all - it just knows the .dwo files sit alongside the .o files.

> Especially since to do things properly you would need to read
> the actual dwo_name attribute to make the connection from
> object/skeleton file to split dwarf object file. And there is no easy
> way to map back from .dwo to main ELF file.

I don't think those issues have come up as problems for Google's deployment of Split DWARF, which we've been using since the early prototypes.

> Because of that I am
> actually a fan of the SHF_EXCLUDED hack that simply places the split
> .dwo sections in the same object file. For the above that would mean,
> just place them in the same section group.

This was a newer feature added during standardization of Split DWARF, which is handy for some users - but doesn't address the needs of the original design of Split DWARF (for Google) - a distributed build system that is trying to avoid moving more bytes than it must to one machine to run the link step.
So not having to ship all the DWARF bytes to one machine for interactive debugging (pulling down from a distributed file system only the needed .dwo files during debugging - not all of them) - or at least being able to ship all the .dwo files to one machine to make a .dwp, and ship all the .o files to another machine for the link.

> > Multiple CUs in a single .dwo file is not really supported, which
> > would be another challenge (we had to compromise debug info quality a
> > little because of this limitation when doing ThinLTO - unable to emit
> > multiple CUs into each thin-linked .o file) - at which point maybe the
> > compiler'd need to produce an intermediate .dwp file of sorts...
>
> Are you sure?

Fairly sure - I worked in depth on the implementation of ThinLTO & considered a variety of options trying to support Split DWARF in that situation.

> Each CU would have a separate dwo_id field to
> distinguish them. At least that is how elfutils figures out which CU
> in a dwo file matches a given skeleton DIE. This should work the same
> as for type units, you can have multiple type units in the same file
> and distinguish which one you need by matching the signature.

One of the complications is that it increased the complexity of making a .dwp file - Split DWARF is spec'd to ensure that the linking process is as lightweight as possible, without the size overhead of relocations (though trading off more indirection through the cu_index, debug_str_offsets, etc).

Oh right... that was the critical issue: There was no way I could think of to do cross-CU references in Split DWARF (cross-CU references being critical to LTO - inlining from one CU into another, etc), because there was no relocation processing in dwp generation.
Arguably maybe one could use a sec_offset that's resolved relative to a local range within the contributions described by the cu_index - but the cu_index must have one entry per unit (the entries are keyed on unit) - I guess you could have a single entry per CU, but have those entries overlap (so all the CUs from one dwo file get separate index entries that contain the same contribution ranges). Then consumers would have to search through the debug_info contribution to find the right unit... defeating some of the value of the index.

> > & again the overhead of all those separate contributions, headers,
> > etc, turns out to be not very desirable in any case.
>
> Yes, I agree with that. But as said earlier, maybe the compiler
> shouldn't have generated the code/data in the first place?

In the (especially) C++ compilation model, I don't believe that's possible - inline functions, templates, etc, require duplication - unless you have a more complicated build process that can gather the potential duplication, then fan back out again to compile, etc. ThinLTO does some of this - at a cost of a more complicated build system, etc.

- Dave
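
To make the cu_index trade-off above a bit more concrete, here is a rough Python sketch. It is only an illustration under simplifying assumptions: `Unit`, `Contribution`, and `find_unit` are made-up names, and the index is modelled as a plain dict rather than the real .debug_cu_index hash table layout. It counts how many unit headers a consumer would have to look at when each index entry names exactly one unit's range, versus when all entries from one dwo file share the same overlapping range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    dwo_id: int   # 64-bit ID that ties a split unit to its skeleton unit
    offset: int   # start of the unit within a hypothetical .debug_info.dwo
    size: int     # size of the unit in bytes


@dataclass(frozen=True)
class Contribution:
    offset: int   # start of the .debug_info.dwo range named by an index entry
    size: int     # length of that range


def find_unit(index, units, dwo_id):
    """Return (unit, headers_scanned) for dwo_id.

    `index` maps dwo_id -> Contribution (a stand-in for the cu_index);
    `units` is every unit present in the package.
    """
    contrib = index[dwo_id]
    end = contrib.offset + contrib.size
    scanned = 0
    for unit in sorted(units, key=lambda u: u.offset):
        if not (contrib.offset <= unit.offset < end):
            continue  # outside the range this index entry points at
        scanned += 1
        if unit.dwo_id == dwo_id:
            return unit, scanned
    raise KeyError(dwo_id)


# Three CUs that came from one .dwo file, laid out back to back.
units = [Unit(0xA, 0, 100), Unit(0xB, 100, 50), Unit(0xC, 150, 80)]

# One entry per unit, each naming only that unit's own range.
exact = {u.dwo_id: Contribution(u.offset, u.size) for u in units}

# The overlapping idea: every entry names the whole dwo's contribution.
whole = Contribution(0, 230)
overlapping = {u.dwo_id: whole for u in units}

print(find_unit(exact, units, 0xC)[1])        # 1
print(find_unit(overlapping, units, 0xC)[1])  # 3
```

With exact ranges the index lands directly on the unit; with overlapping ranges the lookup degrades to a linear scan over every unit that came from the same dwo file, which is the lost value mentioned above.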