Date: Fri, 19 Nov 2021 14:18:52 -0500
From: Rich Felker
To: Fangrui Song
Cc: "H.J. Lu", Adhemerval Zanella, Carlos O'Donell, GNU C Library
Subject: Re: Can DT_RELR catch up glibc 2.35?
Message-ID: <20211119191852.GU7074@brightrain.aerifal.cx>
References: <20211112074723.uvmlvihlutnib6ik@google.com> <0732a3cc-8dad-52fb-96e3-ef5da8eb3a8e@linaro.org> <20211118003025.6pq5jucukbgrw7zg@google.com>
In-Reply-To: <20211118003025.6pq5jucukbgrw7zg@google.com>
List-Id: Libc-alpha mailing list

On Wed, Nov 17, 2021 at 04:30:25PM -0800, Fangrui Song wrote:
> On 2021-11-17, H.J. Lu wrote:
> >On Wed, Nov 17, 2021 at 4:46 AM Adhemerval Zanella wrote:
> >>
> >>
> >>
> >>On 16/11/2021 21:26, H.J. Lu wrote:
> >>> On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/11/2021 04:47, Fangrui Song wrote:
> >>>>> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
> >>>>> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
> >>>>> some traction and many folks acknowledge the size benefit.
> >>>>> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)
> >>>>
> >>>> I brought this to the weekly glibc call two weeks ago and if I recall correctly
> >>>> the *main* issue is we need a proper generic ABI definition published to move this
> >>>> forward on glibc side (H.J.Lu was adamant about).
> >>>>
> >>>> From my part, current status where we have multiple system that already support
> >>>> it (android, chromeos, freebsd) and with a toolchain that supports build/check
> >>>> glibc on at least 4 different ABIs (lld 13 on x86 and arm) is good enough.
> >>>>
> >>>> We lack of proper testing while using bfd might a drawback, since we lack a way
> >>>> to generate binaries without linker support.
> >>>>
> >>>>>
> >>>>> There are two potential issues.
> >>>>>
> >>>>> 1. Lack of "Time travel compatibility" detector
> >>>>> 2. Some folks feel that unable to test with scripts/build-many-glibcs.py is a problem.
> >>>>> (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
> >>>>> and scripts/build-many-glibcs.py doesn't have an lld configuration)
> >>>>>
> >>>>> Let me address them for you.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> 1.
> >>>>>
> >>>>> "Time travel compatibility" means running a new object on an old system.
> >>>>> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
> >>>>> .rel.dyn/.rela.dyn and is destined to crash.
> >>>>>
> >>>>> If the GNU ld implementation (which may take a while) adopts an
> >>>>> undefined versioned .dynsym symbol (e.g. _dl_have_relr
> >>>>> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
> >>>>> we can guarantee old ld.so will report an error.
> >>>>> The undefined symbol needs to be versioned because ld -shared (default
> >>>>> to --allow-shlib-undefined) does not error on unversioned symbols. Say
> >>>>> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
> >>>>> ld needs to know the glibc version "GLIBC_2.40", not just the stem
> >>>>> glibc-flavored symbol name "_dl_have_relr".
> >>>>
> >>>> This might be troublesome to backport, since it would require to use a higher
> >>>> version than the baseline one. I am not sure if distro will be willing or plan
> >>>> to backport such feature though.
> >>>>
> >>>>>
> >>>>> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
> >>>>> GNU ld would have to provide options in two flavors, one with
> >>>>> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
> >>>>> plenty of distros there which don't rigidly require a friendly
> >>>>> diagnostic for "time travel compatibility", e.g. I'm pretty sure many
> >>>>> Gentoo Linux folks doing aggressive optimizations know that their
> >>>>> executables don't run on old systems.
> >>>>
> >>>> I think even other Linux libc, such as musl, won't be willing to support
> >>>> tying the DT_RELR to a loader/libc symbol existing (musl even less because
> >>>> it explicit does not support symbol versioning).
> >>>>
> >>>>>
> >>>>> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
> >>>>> less appealing because bumping the version locks out many ELF consumers.
> >>>>> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
> >>>>> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.
> >>>>
> >>>> The problem with EI_ABIVERSION is a limitation of glibc, which only checks
> >>>> EI_ABIVERSION on open_verify() and this is not called on default process
> >>>> execution, where kernel will be one responsible to load both the binary
> >>>> and the interpreter:
> >>>>
> >>>> ---
> >>>> $ cat test.c
> >>>> #include
> >>>>
> >>>> int main ()
> >>>> {
> >>>>   return 0;
> >>>> }
> >>>> $ gdb ./test
> >>>> [...]
> >>>> (gdb) starti
> >>>> [...]
> >>>> process 1420253
> >>>> Mapped address spaces:
> >>>>
> >>>> Start Addr          End Addr            Size     Offset   objfile
> >>>> 0x555555554000      0x555555555000      0x1000   0x0      /tmp/test/test
> >>>> 0x555555555000      0x555555556000      0x1000   0x1000   /tmp/test/test
> >>>> 0x555555556000      0x555555557000      0x1000   0x2000   /tmp/test/test
> >>>> 0x555555557000      0x555555559000      0x2000   0x2000   /tmp/test/test
> >>>> 0x7ffff7fc2000      0x7ffff7fc6000      0x4000   0x0      [vvar]
> >>>> 0x7ffff7fc6000      0x7ffff7fc8000      0x2000   0x0      [vdso]
> >>>> 0x7ffff7fc8000      0x7ffff7fc9000      0x1000   0x0      /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>> 0x7ffff7fc9000      0x7ffff7ff1000      0x28000  0x1000   /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>> 0x7ffff7ff1000      0x7ffff7ffb000      0xa000   0x29000  /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>> 0x7ffff7ffb000      0x7ffff7fff000      0x4000   0x32000  /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>> 0x7ffffffde000      0x7ffffffff000      0x21000  0x0      [stack]
> >>>> 0xffffffffff600000  0xffffffffff601000  0x1000   0x0      [vsyscall]
> >>>> ---
> >>>>
> >>>> However, the test is correctly executed on any load library and/or if the
> >>>> executable is executed by issuing the loader directly:
> >>>>
> >>>> ---
> >>>> $ readelf -h test
> >>>> ELF Header:
> >>>> Magic: 7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00
> >>>> [...]
> >>>> $ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
> >>>> ./test: error while loading shared libraries: ./test: ELF file ABI version invalid
> >>>> ---
> >>>>
> >>>> I think this is a bug, since it basically defeats the EI_ABIVERSION check
> >>>> and makes programs executed by issuing the loader with a different semantic
> >>>> than the one executed through execve syscall.
> >>>>
> >>>> Afaik kernel does not pass such information on auxv vector (we might ask
> >>>> for a AT_EHDR eventually) so a potential fix will cost us some extra
> >>>> syscalls on every program execution (to read and check the ELF Header with
> >>>> similar test done on open_verify()).
> >>>>
> >>>> However it does *not* help on older glibc which will still accept old binaries.
> >>>>
> >>>>>
> >>>>> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
> >>>>> OS/ABI: UNIX - GNU
> >>>>> ABI Version: 34
> >>>>> hello
> >>>>>
> >>>>
> >>>> I am not really sure if the 'time travel compatibility' is really an issue,
> >>>> although I saw reports where users try to use chromeos library on glibc that
> >>>> fails in some strange ways (most likely due DT_RELR). If user is deploying
> >>>> a *opt-in* feature that requires proper dynamic loader support, I would
> >>>> expect it know the environment he is targeting.
> >>>>
> >>>> So I think the best course of action for this issue is indeed fix EI_ABIVERSION
> >>>> and make DT_RELR a new 'libc-abis' entry. We might backport the EI_ABIVERSION
> >>>> fix to some older releases, and distros that want to use DT_RELR should do also.
> >>>
> >>> Given that EI_ABIVERSION doesn't really work, should we revisit my
> >>> GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
> >>>
> >>> https://sourceware.org/pipermail/binutils/2021-October/118292.html
> >>
> >>The GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
> >>is to backport DT_RELR to older version and it still adds logic on the static
> >>linker about glibc symbol version. I would like that static linker know as
> >>little as possible about glibc version, EI_ABIVERSION is way simpler and
> >>already express ABI extensions.
> >>
> >>I still think for DT_RELR instead of inventing another GNU extension, we might
> >>fix EI_ABIVERSION and use it properly. Checking with kernel, I think it should
> >>be simple: the elf header is located at the AT_PHDR - sizeof (ElfW(Ehdr)), so we
> >>can refactor the tests at open_verify and use on rtld.c for the case execve()
> >>is called for the executable.
> >
> >The scheme should work for older systems without changes. Can we add
> >GLIBC_PRIVATE_DT_RELR? Linker adds GLIBC_PRIVATE_DT_RELR
> >version dependency when DT_RELR is generated
>
> For CCed folks who may be puzzled about the context,
> I have a write-up
> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
> which provides my reply to HJ's question as well.
>
> A synthesized versioned undefined dynamic symbol can indeed catch "time
> travel compatibility", but the mechanism would be the first time ld adds an option variant
> for a particular libc implementation (glibc) locking out all other
> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
> Sigh, it is really not pretty.
>
> We know many other libc implementations don't want to synthesize such a
> symbol.

If you really want this, I have an alternate solution: add a new
relocation type to live in the normal REL/RELA table, whose semantics
are "process a DT_RELR table". This will cause the dynamic linker to
error out if it's too old to know about DT_RELR, and it can be ignored
as a no-op (or used as the trigger to process DT_RELR) by ldso that's
new enough to know about it.

Rich
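
To make the marker-relocation idea concrete, here is a minimal sketch in
C. It is not taken from glibc, musl, or any linker. The relocation
numbers (R_DEMO_RELATIVE, R_DEMO_PROCESS_RELR), the demo_rel struct, and
the relocate()/apply_relr() helpers are all hypothetical names made up
for illustration, and the loaded image is modelled as a plain array of
64-bit words. The knows_relr flag stands in for "old" versus "new" ldso;
the RELR decoding follows the usual address-then-bitmap scheme as I
understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation type numbers, chosen for this sketch only. */
enum {
    R_DEMO_RELATIVE = 8,      /* ordinary relative relocation */
    R_DEMO_PROCESS_RELR = 42  /* hypothetical marker: "process the RELR table" */
};

typedef struct {
    uint64_t r_offset;  /* byte offset of the word to patch */
    uint32_t r_type;    /* one of the R_DEMO_* values above */
} demo_rel;

/* Decode a RELR table: an even entry names one word to relocate; each
   following odd entry is a bitmap covering the next 63 words. */
static void apply_relr(uint64_t *image, uint64_t bias,
                       const uint64_t *relr, size_t count)
{
    size_t where = 0;    /* word index of the next position to cover */
    int have_where = 0;  /* set once an address entry has been seen */

    for (size_t i = 0; i < count; i++) {
        uint64_t entry = relr[i];
        if ((entry & 1) == 0) {
            assert(entry % sizeof(uint64_t) == 0);
            where = (size_t)(entry / sizeof(uint64_t));
            image[where++] += bias;
            have_where = 1;
        } else {
            assert(have_where);  /* a bitmap must follow an address entry */
            for (size_t bit = 0; (entry >>= 1) != 0; bit++)
                if (entry & 1)
                    image[where + bit] += bias;
            where += 63;
        }
    }
}

/* Walk the ordinary relocation table. Returns 0 on success, or -1 when
   a relocation type is not recognised, which is how a loader that
   predates the marker would reject the object. */
static int relocate(uint64_t *image, uint64_t bias,
                    const demo_rel *rel, size_t nrel,
                    const uint64_t *relr, size_t nrelr,
                    int knows_relr)
{
    for (size_t i = 0; i < nrel; i++) {
        switch (rel[i].r_type) {
        case R_DEMO_RELATIVE:
            image[rel[i].r_offset / sizeof(uint64_t)] += bias;
            break;
        case R_DEMO_PROCESS_RELR:
            if (!knows_relr)
                return -1;  /* old loader: unknown type, refuse to run */
            apply_relr(image, bias, relr, nrelr);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Calling relocate() with knows_relr set to 0 models an old loader: it
returns -1 at the marker rather than silently running an object whose
RELR entries were never applied. With knows_relr set to 1 the same
table relocates normally, and no libc-specific symbol is involved.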