From: David Blaikie
Date: Sun, 31 May 2020 14:33:33 -0700
Subject: Re: Range lists, zero-length functions, linker gc
To: Fangrui Song
Cc: binutils@sourceware.org, gdb@sourceware.org, elfutils-devel@sourceware.org
In-Reply-To: <20200531185506.mp2idyczc4thye4h@google.com>

Thanks for getting this conversation started here, Fangrui.

I might summarize things slightly differently (some corrections, some
just different phrasing):

Background:
When a linker discards code, the DWARF that had relocations to that code
must be given some value. Code is discarded either because the linker
chose a comdat copy from another object file that's not identical (two
copies of an inline function might be optimized differently, so DWARF
can't point both descriptions to the same code - one has to be pointed
to some "null" data, essentially) or because of --gc-sections, etc.
But what value?

Current situation:
* bfd: 1 in debug_ranges, 0 elsewhere (debug_ or otherwise)
* lld and gold: 0+addend everywhere (debug_ or otherwise)

Problems:
* bfd uses 1 in debug_ranges to avoid creating a 0,0 range entry that
  would terminate the list prematurely (in DWARF v4 and earlier,
  debug_ranges contains address pairs terminated by 0,0).
* bfd misses the same problem in debug_loc, though that's less
  impactful. A debug_loc list is usually within the scope of one
  function, so it's usually all or nothing: if the list terminates early
  it's not good for dumpers, but not likely a problem for debuggers.
  In theory you could have a debug_loc list across multiple
  functions/sections (if you optimize a global variable up into a local
  register through different functions), and then terminating the list
  early would be a problem.
* The lld/gold approach ends up mostly creating ranges like [0, length).
  For sufficiently large functions, or code mapped into sufficiently low
  address ranges, this range could overlap with real code and create
  ambiguities unless the consumer special-cased ranges starting at zero...
  - except for the ".text.x" example below, where 0+addend could still
  result in a [positive, positive) address range that would be
  impossible to reliably identify in the consumer.
* lld/gold has a more severe problem in the event of empty functions.
  GCC and Clang can both produce empty functions: the simplest example
  is "int f1() { }" (yeah, you can't call this validly, but it's still
  code that can appear and is valid so long as it isn't called), and,
  where we found this recently, "void f1() { llvm_unreachable(); }"
  creates a zero-length function too. 0+addend produces a [0, 0) entry
  in the range list, which terminates it prematurely and breaks debug
  info for other code that appears after the empty function.

So, it'd be nice to improve the situation for code at low addresses that
could overlap with the [0+addend, 0+addend) situation in lld/gold, fix
the 0,0 debug_ranges problem, and maybe overall make this more
explicit/intentional/consistent between producers (compilers and
linkers), consumers, and the DWARF spec itself.

-1 isn't workable in general, because it has special meaning in
debug_ranges and debug_loc (it marks a base address selection entry) -
but otherwise it's probably a pretty good "special" constant (though I
guess in theory someone could map their code to the very top of their
address range? I assume that's less likely than using zero or other
"low-ish" address spaces that could overlap with the
[0+addend, 0+addend) situation of lld/gold).
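
To make the 0,0 and -1 problems concrete, here is a minimal sketch of how
a consumer might walk a DWARF v4 debug_ranges list. It assumes 8-byte
addresses and a CU base address of 0. The function name, the variable
names, and the sample addresses are all made up for illustration and are
not taken from any real tool.

```python
# Sketch of a DWARF <= v4 .debug_ranges reader for 8-byte addresses.
# All names here are illustrative, not taken from any real consumer.
MAX_ADDR = 0xFFFFFFFFFFFFFFFF  # the "-1" value for an 8-byte address


def read_range_list(pairs, cu_base=0):
    """Return the [low, high) ranges a consumer sees in one range list.

    `pairs` is a list of (begin, end) offset pairs in on-disk order.
    """
    base = cu_base
    ranges = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break  # end-of-list entry: everything after it is ignored
        if begin == MAX_ADDR:
            base = end  # base address selection entry
            continue
        if begin == end:
            continue  # empty range, has no effect
        ranges.append(((base + begin) & MAX_ADDR, (base + end) & MAX_ADDR))
    return ranges


# Three functions; the middle one was discarded by the linker.
live_before = (0x1000, 0x1010)
live_after = (0x2000, 0x2020)

# 0 + addend for a zero-length function gives (0, 0), which ends the list.
with_zero = read_range_list([live_before, (0, 0), live_after, (0, 0)])

# -1 is read as a base address selection entry and shifts later entries.
with_minus_one = read_range_list(
    [live_before, (MAX_ADDR, MAX_ADDR), live_after, (0, 0)]
)

# -2 is an empty range that is skipped without side effects.
TOMBSTONE = MAX_ADDR - 1
with_tombstone = read_range_list(
    [live_before, (TOMBSTONE, TOMBSTONE), live_after, (0, 0)]
)

print(with_zero)       # the range after the discarded function is lost
print(with_minus_one)  # the range after the discarded function is shifted
print(with_tombstone)  # both live ranges survive unchanged
```

In these two sections, then, 0 is already taken by the list terminator
and -1 by the base address selection entry.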

Hence Fangrui's suggestion of -2 for debug_ranges and debug_loc, and -1
everywhere else (at least in all debug_* sections, and in all other
sections too unless -1 turns out to be a problematic value for non-debug
sections).

On Sun, May 31, 2020 at 12:19 PM Fangrui Song via Gdb wrote:
>
> It is being discussed on llvm-dev
> (https://lists.llvm.org/pipermail/llvm-dev/2020-May/141885.html https://groups.google.com/forum/#!topic/llvm-dev/i0DFx6YSqDA)
> what linkers should do regarding relocations referencing dropped functions (due
> to section group rules, --gc-sections, /DISCARD/, etc) in .debug_*
>
> As an example:
>
>    __attribute__((section(".text.x"))) void f1() { }
>    __attribute__((section(".text.x"))) void f2() { }
>    int main() { }
>
> Some .debug_* sections are relocated by R_X86_64_64 referencing undefined symbols (the STT_SECTION
> symbols are collected):
>
>    0x00000043:   DW_TAG_subprogram [2]
>                  ###### relocated by .text.x + 10
>                  DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010 ".text.x")
>                  DW_AT_high_pc [DW_FORM_data4]   (0x00000006)
>                  DW_AT_frame_base [DW_FORM_exprloc]      (DW_OP_reg6 RBP)
>                  DW_AT_linkage_name [DW_FORM_strp]       ( .debug_str[0x0000002c] = "_Z2f2v")
>                  DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000033] = "f2")
>
>
> With ld --gc-sections:
>
> * DW_AT_low_pc [DW_FORM_addr] in .debug_info are resolved to 0 + addend
>    This can cause overlapping address ranges with normal text sections. {{overlap}}
> * [beginning address offset, ending address offset) in .debug_ranges are resolved to 1 (ignoring addend).
>    See bfd/reloc.c (behavior introduced in
>    https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=e4067dbb2a3368dbf908b39c5435c84d51abc9f3 )
>
>    [0, 0) cannot be used because it terminates the list entry.
>    [-1, -1) cannot be used because -1 represents a base address selection entry which will affect
>    subsequent address offset pairs.
> * .debug_loc address offset pairs have similar problem to .debug_ranges
> * In DWARF v5, the abnormal values can be in a separate section .debug_addr
>
> ---
>
> To save your time, I have a summary of the discussions. I am eager to know what you think
> of the ideas from binutils/gdb/elfutils's perspective.
>
> * {{reserved_address}} Paul Robinson wants to propose that DWARF v6 reserves a special address.
>    All (undef + addend) in .debug_* are resolved to -1.
>
>    We have to ignore the addend. With __attribute__((section(".text.x"))),
>    the address offset pair may be something like [.text.x + 16, .text.x + 24)
>    I have to resolve the whole (.text.x + 16) to the special value.
>
>    (undef + addend) in pre-DWARF v5 .debug_loc and .debug_ranges are resolved to -2
>    (0 and -1 cannot be used due to the reasons above).
>
> * Refined formula for a relocated value in a non-SHF_ALLOC section:
>
>    if is_defined(sym)
>       return addr(sym) + addend
>    if relocated_section is .debug_ranges or .debug_loc
>       return -2   # addend is intentionally ignored
>
>    // Every DWARF v5 section falls here
>    return -1  {{zero}}
>
> * {{zero}} Can we resolve (undef + addend) to 0?
>
>    https://lists.llvm.org/pipermail/llvm-dev/2020-May/141967.html
>
>    > while it might not be an issue for ELF, DWARF would want a standard that's fairly resilient to
>    > quirky/interesting use cases (admittedly - such platforms could equally want to make their
>    > executable code way up in the address space near max or max - 1, etc?).
>
>    Question: is address 0 meaningful for code in some binary formats?
>
> * {{overlap}} The current situation (GNU ld, gold, LLD): (undef + addend) in .debug_* are resolved to addend.
>    For an address offset pair like [.text + 0, .text + 0x10010), if the ending address offset is large
>    enough, it may overlap with a normal text address range (for example [0x10000, *))
>
>    This can cause problems in debuggers. How does gdb solve the problem?
>
> * {{nonalloc}} Linkers resolve (undef + addend) in non-SHF_ALLOC sections to
>    `addend`. For non-debug sections (open-ended), do we have needs resolving such
>    values to `base` or `base+addend` where base is customizable?
>    (https://lists.llvm.org/pipermail/llvm-dev/2020-May/141956.html )
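
For reference, the refined formula quoted above might look roughly like
this when written out as runnable code. This is only a sketch: it assumes
a 64-bit target, and the Symbol type and the resolve_debug_reloc name are
hypothetical, not part of any linker's actual interface.

```python
# Sketch of the proposed rule for relocations in non-SHF_ALLOC sections.
# Symbol and resolve_debug_reloc are hypothetical, not real linker APIs.
from dataclasses import dataclass
from typing import Optional

ADDR_MASK = 0xFFFFFFFFFFFFFFFF  # 64-bit target assumed


@dataclass
class Symbol:
    name: str
    address: Optional[int]  # None when the defining section was discarded


def resolve_debug_reloc(section: str, sym: Symbol, addend: int) -> int:
    """Value written for a relocation in the non-SHF_ALLOC `section`."""
    if sym.address is not None:
        return (sym.address + addend) & ADDR_MASK
    # Pre-DWARF v5 list sections: 0 ends a list and -1 selects a base
    # address, so use -2. The addend is intentionally ignored.
    if section in (".debug_ranges", ".debug_loc"):
        return -2 & ADDR_MASK
    # Every other section, including all DWARF v5 sections.
    return -1 & ADDR_MASK


discarded = Symbol(".text.x", None)
live = Symbol(".text", 0x401000)  # sample address, chosen arbitrarily

print(hex(resolve_debug_reloc(".debug_info", discarded, 0x10)))
print(hex(resolve_debug_reloc(".debug_ranges", discarded, 0x10)))
print(hex(resolve_debug_reloc(".debug_info", live, 0x10)))
```

Because the addend is dropped for discarded symbols, both ends of a pair
like [.text.x + 16, .text.x + 24) resolve to the same value, so the pair
becomes an empty range.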