From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Blaikie
Date: Fri, 19 Jun 2020 18:02:50 -0700
Subject: Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
To: Mark Wielaard
Cc: Fangrui Song, gdb@sourceware.org, elfutils-devel@sourceware.org, binutils@sourceware.org, Alan Modra
In-Reply-To: <69e4e7a60b23bff32d88b3edd2a718cf2f6e8cdc.camel@klomp.org>

On Fri, Jun 19, 2020 at 1:04 PM Mark Wielaard wrote:
>
> Hi,
>
> On Tue, 2020-06-09 at 13:24 -0700, Fangrui Song via Elfutils-devel wrote:
> > I want to revive the thread, but focus on whether a tombstone value
> > (-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
> > debug related tools in elfutils and other utilities I don't know about).
> >
> > Paul Robinson has proposed that DWARF v6 should reserve a tombstone
> > value (the value a relocation referencing a discarded symbol in a
> > .debug_* section should be resolved to)
> > http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>
> I would appreciate having a clear "not valid" marker instead of getting
> a possibly bogus (but valid) address. -1 seems a reasonable value.
> Although I have seen (and written) code that simply assumes zero is
> that value.

Yep - and zero seemed like a good one - except in debug_ranges and
debug_loc where that would produce a premature list termination (bfd.ld
gets around this by using 1 in debug_ranges) - or on architectures for
which 0 is a valid address.
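
To make the premature termination concrete, here is a minimal Python
sketch of how a consumer walks a DWARF v4 style debug_ranges list. It is
an illustration only, not code from gdb, elfutils, or any linker: the
name read_range_list, the addresses, and the three-function layout are
all made up for the example, and base address selection entries are
ignored for brevity.

```python
END_OF_LIST = (0, 0)  # a (0, 0) pair terminates a DWARF v4 range list


def read_range_list(pairs):
    """Collect (begin, end) ranges until the end-of-list pair is seen."""
    ranges = []
    for begin, end in pairs:
        if (begin, end) == END_OF_LIST:
            break  # everything after this pair is never looked at
        if begin == end:
            continue  # empty range: describes no code, so skip it
        ranges.append((begin, end))
    return ranges


# Hypothetical layout: f1 and f3 survive linking, f2 was discarded.
F1 = (0x1000, 0x1010)
F3 = (0x2000, 0x2020)

# Relocations for the discarded f2 resolved to plain 0.
zero_filled = [F1, (0, 0), F3, END_OF_LIST]
# Relocations for the discarded f2 resolved to 1 instead.
one_filled = [F1, (1, 1), F3, END_OF_LIST]

print(read_range_list(zero_filled))  # f3 is lost behind the fake terminator
print(read_range_list(one_filled))   # f1 and f3 both survive
```

With the zero fill, the entry for the discarded function is
indistinguishable from the terminator, so f3 silently disappears. With
the 1 fill the entry is just an empty range and the reader carries on.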

If you use the zero+addend approach that gold uses (and lld did
use/maybe still does, but is going to move away from) then you /almost/
avoid the need to special case debug_ranges and debug_loc, until you hit
a zero-length function (you can create zero-length functions from code
like "int f1() { }" or "void f2() { __builtin_unreachable(); }") - then
you get the early list termination again.

Also zero+addend might trip up in a case like:

    void f1() { }
    __attribute__((nodebug)) void f2() { }
    void f3() { }

Now f3's starting address has a non-zero addend, so it's
indistinguishable from valid code at a very low address.

> Would such an invalid address marker in an DW_AT_low_pc make the whole
> program scope under a DIE invalid? What about (addr, loc, rng) base
> addresses? Can they contain an invalid marker, does that make the whole
> table/range invalid?

That would be my intent, yes - any pointer derived from an invalid
address would be invalid. Take the f1/f2/f3 nodebug example above - f3's
starting address could be described by "invalid address + offset".
Currently DWARF has no way of describing this. Well, it sort of does:
you could use an exprloc with a DW_OP_addrx and the arithmetic necessary
to add to that, though I doubt many consumers could handle an exprloc
there. I would like to champion that form, though, to enable reuse of
address pool entries to reduce the size of .o debug info contributions
when using Split DWARF (or just reduce the number of relocations/.o file
size when using non-split DWARF). So it'd be important for the tombstone
to be special cased in pointer arithmetic, so that the tombstone value
propagates through the arithmetic.

> I must admit that as a DWARF consumer I am slightly worried that having
> a sanctioned "invalid marker" will cause DWARF producers to just not
> coordinate and simply assume they can always invalidate anything they
> emit.

At least in my experience (8 years or so working on LLVM's DWARF
emission) we've got a pretty strong incentive to reduce DWARF size
already - I don't think any producers are being particularly cavalier
about producing excess DWARF on the basis that it can be marked invalid.

> Even if there could be a real solution by coordinating between
> compiler/linker who is responsible for producing the valid DWARF
> entries (especially when LTO is involved).

A lot of engineering work went into restructuring LLVM's debug info IR
representation for LTO to ensure LLVM doesn't produce DWARF for
functions deduplicated or dropped by LTO.

- Dave

> > Some comments about the proposal:
> >
> > > - deduplicating different functions with identical content; GNU
> > > refers to this as ICF (Identical Code Folding);
> >
> > ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with
> > different DW_AT_name to have the same range.
>
> Cary Coutant wrote up a general Two-Level Line Number Table proposal to
> address the issue of having a single machine instruction corresponds to
> more than one source statement:
> http://wiki.dwarfstd.org/index.php?title=TwoLevelLineTables
>
> Which seems useful in these kind of situations. But I don't know the
> current status of the proposal.

This was motivated by a desire to be able to do symbolized stack traces
including inline stack frames with a smaller representation than is
currently possible in DWARF - it allows the line table itself to
describe inlining, to some degree, rather than relying on the DIE tree
(in part this was motivated by a desire to be able to symbolize
backtraces with inlining in-process when Split DWARF is used and the
.dwo/.dwp files are not available).

I don't think it extends to dealing with the case of deduplication like
this - nor does it address the possibility of two CUs having overlapping
instruction ranges.

(It's semantically roughly equivalent to the inlined_subroutines of a
subprogram - not so much related to two copies of a function being
deduplicated & then being shared by CUs.)

- Dave
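
Returning to the f1/f2/f3 nodebug example earlier in this message, the
following Python sketch shows roughly what "the tombstone propagates
through arithmetic" could look like in a consumer. Everything here is an
assumption made for illustration: the 64-bit address size, -1 as the
tombstone, the 0x20 offset of f3, and the helper names
resolve_zero_plus_addend and add_offset. None of it is taken from an
existing tool or from the DWARF proposal text.

```python
ADDRESS_MASK = 0xFFFFFFFFFFFFFFFF  # assumption: 64-bit target addresses
TOMBSTONE = ADDRESS_MASK           # assumption: -1 is the reserved value


def resolve_zero_plus_addend(addend):
    """Discarded symbol resolved as 0 + addend (the zero+addend scheme)."""
    return addend & ADDRESS_MASK


def add_offset(address, offset):
    """Pointer arithmetic in which an invalid base stays invalid."""
    if address == TOMBSTONE:
        return TOMBSTONE
    return (address + offset) & ADDRESS_MASK


F3_OFFSET = 0x20  # hypothetical offset of f3 from the start of its section

# zero+addend: f3 in a discarded section looks like real code at 0x20.
f3_zero_addend = resolve_zero_plus_addend(F3_OFFSET)

# Naive arithmetic on the tombstone wraps around to a plausible address.
f3_naive = (TOMBSTONE + F3_OFFSET) & ADDRESS_MASK

# Special-cased arithmetic keeps the result recognisably invalid.
f3_tombstone = add_offset(TOMBSTONE, F3_OFFSET)

# A live section is unaffected by the special case.
f3_live = add_offset(0x401000, F3_OFFSET)

print(hex(f3_zero_addend), hex(f3_naive), hex(f3_tombstone), hex(f3_live))
```

The point of the sketch is only that the check has to live in the
addition itself: once the offset has been folded in without it, the
result can no longer be told apart from a valid low address.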