From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 40960 invoked by alias); 27 Mar 2017 11:39:33 -0000 Mailing-List: contact binutils-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: binutils-owner@sourceware.org Received: (qmail 40951 invoked by uid 89); 27 Mar 2017 11:39:32 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=circumstances, swept, viable, sweeping X-HELO: mailapp01.imgtec.com Received: from mailapp01.imgtec.com (HELO mailapp01.imgtec.com) (195.59.15.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 27 Mar 2017 11:39:29 +0000 Received: from HHMAIL01.hh.imgtec.org (unknown [10.100.10.19]) by Forcepoint Email with ESMTPS id 04DA72D795555; Mon, 27 Mar 2017 12:39:21 +0100 (IST) Received: from [10.20.78.230] (10.20.78.230) by HHMAIL01.hh.imgtec.org (10.100.10.21) with Microsoft SMTP Server id 14.3.294.0; Mon, 27 Mar 2017 12:39:21 +0100 Date: Mon, 27 Mar 2017 11:39:00 -0000 From: "Maciej W. Rozycki" To: Alan Modra CC: Nick Clifton , Subject: [PATCH] PR ld/21233: Avoid sweeping forced-undefined symbols in section GC Message-ID: User-Agent: Alpine 2.00 (DEB 1167 2008-08-23) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" X-SW-Source: 2017-03/txt/msg00340.txt.bz2 Complement commit 902e9fc76a0e ("PR ld/20828: Move symbol version processing ahead of GC symbol sweep"), commit b531344c34b0 ("PR ld/20828: Reorder the symbol sweep stage of section GC") and commit 81ff47b3a546 ("PR ld/20828: Fix linker script symbols wrongly forced local with section GC"), and prevent symbols forcibly entered in the output file with the use of the `--undefined=' or `--require-defined=' linker command line options or the EXTERN linker script command from being swept in section garbage collection and consequently recorded in the dynamic symbol table as local entries. This happens in certain circumstances, where a symbol reference also exists in one of the static input files, however only in a section which is garbage-collected and does not make it to the output file, and the symbol is defined in a dynamic object present in the link. For example with the `i386-linux' target and the `pr21233.s' and `pr21233-l.s' sources, and the `pr21233.ld' linker script included with this change we get: $ as -o pr21233-l.o pr21233-l.s $ ld -shared -T pr21233.ld -o libpr21233.so pr21233-l.o $ as -o pr21233.o pr21233.s $ ld --gc-sections -e foo --require-defined=bar -T pr21233.ld -o pr21233 pr21233.o libpr21233.so $ readelf --dyn-syms pr21233 Symbol table '.dynsym' contains 2 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 OBJECT LOCAL DEFAULT UND bar $ which makes the run-time `bar' dependency of the `pr21233' executable different from its corresponding link-time dependency, i.e. the presence of `libpr21233.so' and its `bar' symbol is required at the link time, however at the run time a copy of `libpr21233.so' without `bar' will do. Similarly with `--undefined=' and EXTERN which do not actually require the reference to the symbol requested to be satisfied with a definition at the link time, however once the definition has been pulled at the link time, so it should at the dynamic load time. Additionally with the `mips-linux' target we get: $ ld --gc-sections -e foo --require-defined=bar -T pr21233.ld -o pr21233 pr21233.o libpr21233.so ld: BFD (GNU Binutils) 2.28.51.20170324 assertion fail .../bfd/elfxx-mips.c:3861 $ as the target is not prepared to handle such a local dynamic symbol. With this change in effect we get: $ readelf --dyn-syms pr21233 Symbol table '.dynsym' contains 2 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 OBJECT GLOBAL DEFAULT UND bar $ instead, for both targets. ld/ PR ld/21233 * ldlang.c (insert_undefined): Set `mark' for ELF symbols. * testsuite/ld-elf/pr21233.sd: New test. * testsuite/ld-elf/pr21233-l.sd: New test. * testsuite/ld-elf/pr21233.ld: New test linker script. * testsuite/ld-elf/pr21233-e.ld: New test linker script. * testsuite/ld-elf/pr21233.s: New test source. * testsuite/ld-elf/pr21233-l.s: New test source. * testsuite/ld-elf/shared.exp: Run the new tests. --- NB I haven't checked if a dynamic executable with an undefined unused local symbol is going to succeed or fail execution with the dynamic loader, but in any case I see our current semantics as inconsistent. And as I previously noted in the course of addressing PR ld/20828 in any case there's no point in having a local symbol (other than perhaps for special cases some psABIs may define) in the dynamic symbol table. This brings us with two potential solutions: either retaining the global scope for the symbol or removing the symbol altogether. I think the former approach is more consistent with the definitions of the command-line options and the linker script command concerned, which is why I chose it over the latter. There have been no regressions across my usual targets with this change applied. The new test cases do however fail across a number of targets. The cause is their handling of copy relocations. As `bar' is a dynamic data symbol, space in the `.dynbss' section is allocated and a copy relocation produced. As `.dynbss' is discarded from output the relocation is lost, and the local copy of the `bar' symbol converted to an absolute symbol, e.g. with `m68k-linux': $ readelf --dyn-syms pr21233 Symbol table '.dynsym' contains 2 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 1 OBJECT GLOBAL DEFAULT ABS bar $ (NB this variant of `bar' is similarly entered as a local symbol if the executable is linked without the change discussed here). The exact list of targets affected I have identified is as follows: aarch64-linux am33_2.0-linux arc-linux arm-eabi arm-linuxeabi arm-netbsdelf crisv32-linux m68k-elf m68k-linux mn10300-elf nios2-linux vax-linux vax-netbsdelf Additionally `bfin-elf' and `bfin-uclinux' fail with: ld: the bfin target does not currently support the generation of copy relocations ld: failed to set dynamic section sizes: File format not recognized I have identified the cause of this phenomenon to be the reverse order `elf_gc_sweep' and `_bfd_elf_link_check_relocs' are called in on these targets, causing `->non_got_ref' to be set for the symbol even though the reference is later discarded. The possibility to change the order has been introduced with commit d968975277ba ("Check ELF relocs after opening all input files"), using CHECK_RELOCS_AFTER_OPEN_INPUT, after a discussion started at which indicates the intent to gradually swap the order for all targets. After the initial change for x86 this has only been since done for SH (cf PR ld/17739), i.e. I gather we are still in the middle of the move. Which brings me a question to our general maintainers: which of the following 3 options shall I pick for the purpose of this test case: 1. Leave the new failures as they are and let maintainers handle them as they find need or time; there may be more to be done for individual targets beyond the lone change to CHECK_RELOCS_AFTER_OPEN_INPUT. 2. File a PR referring to commit d968975277ba and its discussion and KFAIL the affected test cases for the problematic targets. 3. Tweak the tests to accept absolute symbols (and exclude `bfin-*-*' from testing as a hopeless case). I'd rather avoid this option as I see such an outcome as invalid, defeating the purpose of the command-line options and the linker script command concerned. NB I have verified that using `-z nocopyreloc' does not change anything for most of these problematic targets, so this is not a viable workaround. Maciej binutils-ld-insert-undefined-mark.diff Index: binutils/ld/ldlang.c =================================================================== --- binutils.orig/ld/ldlang.c 2017-03-22 15:26:43.651680031 +0000 +++ binutils/ld/ldlang.c 2017-03-22 18:44:47.798267237 +0000 @@ -3429,6 +3429,8 @@ insert_undefined (const char *name) { h->type = bfd_link_hash_undefined; h->u.undef.abfd = NULL; + if (is_elf_hash_table (link_info.hash)) + ((struct elf_link_hash_entry *) h)->mark = 1; bfd_link_add_undef (link_info.hash, h); } } Index: binutils/ld/testsuite/ld-elf/pr21233-e.ld =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233-e.ld 2017-03-22 18:44:54.976834149 +0000 @@ -0,0 +1,2 @@ +EXTERN (bar) +INCLUDE pr21233.ld Index: binutils/ld/testsuite/ld-elf/pr21233-l.s =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233-l.s 2017-03-22 18:44:54.994117791 +0000 @@ -0,0 +1,6 @@ + .data + .globl bar + .type bar, %object +bar: + .byte 1 + .size bar, . - bar Index: binutils/ld/testsuite/ld-elf/pr21233-l.sd =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233-l.sd 2017-03-22 19:18:22.818303465 +0000 @@ -0,0 +1,6 @@ +# Make sure global `bar' is present in the dynamic symbol table, e.g.: +# Num: Value Size Type Bind Vis Ndx Name +# 1: 00000000 1 OBJECT GLOBAL DEFAULT 5 bar +#... + *[0-9]+: +[0-9a-f]+ +1 +OBJECT +GLOBAL +DEFAULT +[0-9]+ +bar +#pass Index: binutils/ld/testsuite/ld-elf/pr21233.ld =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233.ld 2017-03-22 18:44:55.060208515 +0000 @@ -0,0 +1,17 @@ +SECTIONS +{ + .hash : { *(.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .rel.dyn : { *(.rel.dyn) } + .text : { *(.text) } + .dynamic : { *(.dynamic) } + .data : { *(.data) } + .symtab : { *(.symtab) } + .strtab : { *(.strtab) } + .shstrtab : { *(.shstrtab) } + .plt : { *(.plt) } + .got.plt : { *(.got.plt) } + .got : { *(.got) } + /DISCARD/ : { *(*) } +} Index: binutils/ld/testsuite/ld-elf/pr21233.s =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233.s 2017-03-22 18:44:55.080529486 +0000 @@ -0,0 +1,8 @@ + .text + .globl foo + .type foo, %function +foo: + .size foo, . - foo + + .data + .dc.a bar Index: binutils/ld/testsuite/ld-elf/pr21233.sd =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ binutils/ld/testsuite/ld-elf/pr21233.sd 2017-03-22 18:44:55.092782869 +0000 @@ -0,0 +1,9 @@ +# Make sure the `bar' reference is global rather than local +# in the dynamic symbol table, e.g.: +# Num: Value Size Type Bind Vis Ndx Name +# 1: 00000000 0 OBJECT GLOBAL DEFAULT UND bar +# vs: +# 1: 00000000 0 OBJECT LOCAL DEFAULT UND bar +#... + *[0-9]+: +[0-9a-f]+ +0 +OBJECT +GLOBAL +DEFAULT +UND +bar +#pass Index: binutils/ld/testsuite/ld-elf/shared.exp =================================================================== --- binutils.orig/ld/testsuite/ld-elf/shared.exp 2017-03-22 15:26:44.259343252 +0000 +++ binutils/ld/testsuite/ld-elf/shared.exp 2017-03-22 18:54:52.902130284 +0000 @@ -115,6 +115,45 @@ if { [check_gc_sections_available] } { {{objdump -p pr20828-v.od}} \ "pr20828-v-2"]] } +# PR ld/21233 check for correct dynamic symbol table entries where: +# - a symbol has been defined in a shared library used in the link, +# - the symbol has been referenced from a section swept in garbage collection, +# - the symbol has also been forced to be entered in the output file as an +# undefined symbol, either with a command-line option or a linker script +# command. +# Verify that the undefined symbol is global rather than local. +if { [check_gc_sections_available] } { + run_ld_link_tests [list \ + [list \ + "PR ld/21233 dynamic symbols with section GC\ + (auxiliary shared library)" \ + "$LFLAGS -shared -T pr21233.ld" "" "$AFLAGS_PIC" \ + {pr21233-l.s} \ + {{readelf --dyn-syms pr21233-l.sd}} \ + "libpr21233.so"] \ + [list \ + "PR ld/21233 dynamic symbols with section GC (--undefined)" \ + "$LFLAGS --gc-sections -e foo --undefined=bar -T pr21233.ld" \ + "tmpdir/libpr21233.so" "" \ + {pr21233.s} \ + {{readelf --dyn-syms pr21233.sd}} \ + "pr21233-1"] \ + [list \ + "PR ld/21233 dynamic symbols with section GC (--require-defined)" \ + "$LFLAGS --gc-sections -e foo --require-defined=bar\ + -T pr21233.ld" \ + "tmpdir/libpr21233.so" "" \ + {pr21233.s} \ + {{readelf --dyn-syms pr21233.sd}} \ + "pr21233-2"] \ + [list \ + "PR ld/21233 dynamic symbols with section GC (EXTERN)" \ + "$LFLAGS --gc-sections -e foo -T pr21233-e.ld" \ + "tmpdir/libpr21233.so" "" \ + {pr21233.s} \ + {{readelf --dyn-syms pr21233.sd}} \ + "pr21233-3"]] +} # Check to see if the C compiler works if { [which $CC] == 0 } {