public inbox for libc-alpha@sourceware.org
* Can DT_RELR catch up glibc 2.35?
From: Fangrui Song @ 2021-11-12  7:47 UTC
  To: Adhemerval Zanella, Carlos O'Donell, H.J. Lu; +Cc: libc-alpha

I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
some traction and many folks acknowledge the size benefit.
(On my Arch Linux, I measured 8% decrease for my /usr/bin.)

There are two potential issues.

1. Lack of "Time travel compatibility" detector
2. Some folks feel that being unable to test with scripts/build-many-glibcs.py is a problem.
   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
   and scripts/build-many-glibcs.py doesn't have an lld configuration)

Let me address them for you.

---

1.

"Time travel compatibility" means running a new object on an old system.
A new object using DT_RELR doesn't have the R_*_RELATIVE part in
.rel.dyn/.rela.dyn and is destined to crash.

If the GNU ld implementation (which may take a while) adopts an
undefined versioned .dynsym symbol (e.g. _dl_have_relr
https://sourceware.org/pipermail/binutils/2021-October/118347.html),
we can guarantee old ld.so will report an error.
The undefined symbol needs to be versioned because ld -shared (default
to --allow-shlib-undefined) does not error on unversioned symbols. Say
GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
ld needs to know the glibc version "GLIBC_2.40", not just the stem
glibc-flavored symbol name "_dl_have_relr".

There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
GNU ld would have to provide options in two flavors, one with
_dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
plenty of distros which don't rigidly require a friendly
diagnostic for "time travel compatibility", e.g. I am pretty sure many
Gentoo Linux folks doing aggressive optimizations know that their
executables don't run on old systems.

An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
less appealing because bumping the version locks out many ELF consumers.
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.

% r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
   OS/ABI:                            UNIX - GNU
   ABI Version:                       34
hello
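
For reference, the byte patched by the r2 command above is
e_ident[EI_ABIVERSION], which sits at offset 8 of the ELF header; 0x22 is the
34 that readelf prints. A minimal C sketch of reading and setting that byte in
an in-memory header could look like the following. The helper names
elf_get_abiversion and elf_set_abiversion are hypothetical; only the constants
from <elf.h> are real.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return e_ident[EI_ABIVERSION], or -1 if buf does not start with a
   complete ELF identification block.  (Hypothetical helper.)  */
int elf_get_abiversion(const unsigned char *buf, size_t len)
{
  if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
    return -1;
  return buf[EI_ABIVERSION];
}

/* Overwrite e_ident[EI_ABIVERSION] in place.  Returns 0 on success and
   -1 if buf is not an ELF image.  (Hypothetical helper.)  */
int elf_set_abiversion(unsigned char *buf, size_t len, unsigned char version)
{
  if (elf_get_abiversion(buf, len) < 0)
    return -1;
  buf[EI_ABIVERSION] = version;
  return 0;
}
```

Calling elf_set_abiversion(buf, len, 0x22) on a buffer holding the header has
the same effect on that byte as the 'wx 22 @ 8' command.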

---

2.

Fetching a prebuilt llvm-project 13.0.0 that works on many Linux distros is
difficult. Easy access to ld.lld 13.0.0 would certainly be nice, but I hope that
you don't consider it a blocker, as llvm-project 13.0.0 has arrived on many
distros and will arrive on others soon.

Moreover, I want to emphasize that the core logic is below 30 lines.  It is
isolated enough and uses sufficiently few interfaces so as NOT to cause
maintenance burden to other (tricky) parts of ld.so.
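
To give a feel for how small that core logic can be, here is a minimal sketch
of RELR decoding as I understand it from the generic-ABI proposal. It is not
the code from the patch: relr_decode and its parameters are hypothetical names,
64-bit entries are assumed, and instead of adding the load base to memory it
only collects the offsets a loader would relocate.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n RELR entries into the offsets of the words that need a
   relative relocation.  At most out_cap offsets are stored in out; the
   return value is the total number of relocations described.
   (Hypothetical helper, assuming 64-bit RELR entries.)  */
size_t relr_decode(const uint64_t *relr, size_t n,
                   uint64_t *out, size_t out_cap)
{
  size_t count = 0;
  uint64_t where = 0;   /* offset of the next word a bitmap refers to */

  for (size_t i = 0; i < n; i++)
    {
      uint64_t entry = relr[i];
      if ((entry & 1) == 0)
        {
          /* Even entry: the offset of one word to relocate.  */
          if (count < out_cap)
            out[count] = entry;
          count++;
          where = entry + sizeof (uint64_t);
        }
      else
        {
          /* Odd entry: bits 1..63 select which of the next 63 words
             need relocating.  */
          uint64_t bits = entry >> 1;
          for (unsigned j = 0; bits != 0; j++, bits >>= 1)
            if (bits & 1)
              {
                if (count < out_cap)
                  out[count] = where + j * sizeof (uint64_t);
                count++;
              }
          where += 63 * sizeof (uint64_t);
        }
    }
  return count;
}
```

For example, the three entries {0x1000, 0x7, 0x3} describe four relative
relocations (at 0x1000, 0x1008, 0x1010 and 0x1200) in 24 bytes, whereas four
Elf64_Rela records would take 96 bytes.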

---

I installed Gentoo Linux last weekend for fun and chatted with some Gentoo
Linux folks who use -fuse-ld=lld. I am sending this message because I think I
should make the feature benefit them earlier. I know some Arch/Debian Linux
users are interested in the feature as well but they may have to wait longer
for GNU ld (their system linker) support.

I sincerely hope that the patch can make it into glibc 2.35.  By making the
functionality available in an older consumer, we just avoid more "time
travel compatibility" problems.  Landing the consumer and the producer at
about the same time is actually the root of many compatibility problems.


* Re: Can DT_RELR catch up glibc 2.35?
From: Adhemerval Zanella @ 2021-11-16 21:07 UTC
  To: Fangrui Song, Carlos O'Donell, H.J. Lu; +Cc: libc-alpha, Rich Felker



On 12/11/2021 04:47, Fangrui Song wrote:
> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
> some traction and many folks acknowledge the size benefit.
> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)

I brought this to the weekly glibc call two weeks ago and, if I recall correctly,
the *main* issue is that we need a proper generic ABI definition published to move
this forward on the glibc side (H.J. Lu was adamant about this).

For my part, the current status, where multiple systems already support
it (Android, ChromeOS, FreeBSD) and a toolchain can build/check
glibc on at least 4 different ABIs (lld 13 on x86 and arm), is good enough.

The lack of proper testing when using bfd might be a drawback, since we have
no way to generate such binaries without linker support.

> 
> There are two potential issues.
> 
> 1. Lack of "Time travel compatibility" detector
> 2. Some folks feel that being unable to test with scripts/build-many-glibcs.py is a problem.
>   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
>   and scripts/build-many-glibcs.py doesn't have an lld configuration)
> 
> Let me address them for you.
> 
> ---
> 
> 1.
> 
> "Time travel compatibility" means running a new object on an old system.
> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
> .rel.dyn/.rela.dyn and is destined to crash.
> 
> If the GNU ld implementation (which may take a while) adopts an
> undefined versioned .dynsym symbol (e.g. _dl_have_relr
> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
> we can guarantee old ld.so will report an error.
> The undefined symbol needs to be versioned because ld -shared (default
> to --allow-shlib-undefined) does not error on unversioned symbols. Say
> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
> ld needs to know the glibc version "GLIBC_2.40", not just the stem
> glibc-flavored symbol name "_dl_have_relr".

This might be troublesome to backport, since it would require using a higher
version than the baseline one.  I am not sure whether distros will be willing or plan
to backport such a feature, though.

> 
> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
> GNU ld would have to provide options in two flavors, one with
> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
> plenty of distros which don't rigidly require a friendly
> diagnostic for "time travel compatibility", e.g. I am pretty sure many
> Gentoo Linux folks doing aggressive optimizations know that their
> executables don't run on old systems.

I think even other Linux libcs, such as musl, won't be willing to tie
DT_RELR to the existence of a loader/libc symbol (musl even less so, because
it explicitly does not support symbol versioning).

> 
> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
> less appealing because bumping the version locks out many ELF consumers.
> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.

The problem with EI_ABIVERSION is a limitation of glibc, which only checks
EI_ABIVERSION in open_verify(), and this is not called on default process
execution, where the kernel is the one responsible for loading both the binary
and the interpreter:

---
$ cat test.c 
#include <stdio.h>

int main ()
{
  return 0;
}
$ gdb ./test 
[...]
(gdb) starti
[...]
process 1420253
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555555000     0x1000        0x0 /tmp/test/test
      0x555555555000     0x555555556000     0x1000     0x1000 /tmp/test/test
      0x555555556000     0x555555557000     0x1000     0x2000 /tmp/test/test
      0x555555557000     0x555555559000     0x2000     0x2000 /tmp/test/test
      0x7ffff7fc2000     0x7ffff7fc6000     0x4000        0x0 [vvar]
      0x7ffff7fc6000     0x7ffff7fc8000     0x2000        0x0 [vdso]
      0x7ffff7fc8000     0x7ffff7fc9000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7fc9000     0x7ffff7ff1000    0x28000     0x1000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x29000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x32000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
---

However, the check is correctly performed on any loaded library and/or if the
executable is run by invoking the loader directly:

---
$ readelf -h test
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00 
[...]
$ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
./test: error while loading shared libraries: ./test: ELF file ABI version invalid
---

I think this is a bug, since it basically defeats the EI_ABIVERSION check
and gives programs executed by invoking the loader different semantics
than those executed through the execve syscall.

Afaik the kernel does not pass such information in the auxv vector (we might ask
for an AT_EHDR eventually), so a potential fix will cost us some extra
syscalls on every program execution (to read and check the ELF header with
a test similar to the one done in open_verify()).

However, it does *not* help with older glibc versions, which will still accept such binaries.

> 
> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
>   OS/ABI:                            UNIX - GNU
>   ABI Version:                       34
> hello
> 

I am not really sure if 'time travel compatibility' is really an issue,
although I saw reports where users try to use a ChromeOS library on glibc and it
fails in some strange ways (most likely due to DT_RELR). If a user is deploying
an *opt-in* feature that requires proper dynamic loader support, I would
expect them to know the environment they are targeting.

So I think the best course of action for this issue is indeed to fix EI_ABIVERSION
and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
fix to some older releases, and distros that want to use DT_RELR should do so as well.
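
To make the idea concrete, here is a rough sketch of the kind of
EI_ABIVERSION test a loader could apply once DT_RELR is a 'libc-abis' entry.
It is an illustration, not glibc's code: abiversion_acceptable is a
hypothetical name, abi_max is a hypothetical parameter standing for "one past
the highest ABI entry the loader knows", and the rule that a nonzero version
is only accepted together with the GNU OSABI is my assumption about the shape
of the check.

```c
#include <elf.h>
#include <stdbool.h>

/* Decide whether a loader that knows ABI entries 0 .. abi_max - 1 should
   accept an object with this ELF identification.  (Hypothetical helper;
   abi_max is an assumed parameter, not a glibc interface.)  */
bool abiversion_acceptable(const unsigned char e_ident[EI_NIDENT],
                           unsigned int abi_max)
{
  unsigned char osabi = e_ident[EI_OSABI];
  unsigned char version = e_ident[EI_ABIVERSION];

  /* Version 0 needs no extensions and is always fine.  A nonzero version
     is accepted only for the GNU OSABI and only if the loader knows it.  */
  return version == 0 || (osabi == ELFOSABI_GNU && version < abi_max);
}
```

Under these assumptions, an object marked with a new entry number is rejected
by a loader whose abi_max has not been raised yet, and accepted once it has,
which is exactly the diagnostic an old ld.so should give for a DT_RELR object.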

> ---
> 
> 2.
> 
> Fetching a prebuilt llvm-project 13.0.0 which supports many Linux distros is
> difficult. The accessibility of ld.lld 13.0.0 is certainly nice but I wish that
> you don't consider it a blocker as llvm-project 13.0.0 has arrived on many
> distros and will arrive on others soon.
> 
> Moreover, I want to emphasize that the core logic is below 30 lines.  It is
> isolated enough and uses sufficiently few interfaces so as NOT to cause
> maintenance burden to other (tricky) parts of ld.so.

The build-many-glibcs support would be a nice addition, but I personally think
it should not be a blocker.  I have a long-term goal to add DT_RELR support
to binutils, but since I don't have much experience with the internals
of bfd, progress is slow.

However, I think the current lld status is good enough for at least x86
and arm (no idea about riscv besides the fact that it builds).  I need to push my
patch to enable powerpc support; however, I am still having trouble targeting power10
(which seems to be an lld issue).



* Re: Can DT_RELR catch up glibc 2.35?
From: H.J. Lu @ 2021-11-17  0:26 UTC
  To: Adhemerval Zanella
  Cc: Fangrui Song, Carlos O'Donell, GNU C Library, Rich Felker

On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> So I think the best course of action for this issue is indeed to fix EI_ABIVERSION
> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
> fix to some older releases, and distros that want to use DT_RELR should do so as well.

Given that EI_ABIVERSION doesn't really work, should we revisit my
GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:

https://sourceware.org/pipermail/binutils/2021-October/118292.html

-- 
H.J.


* Re: Can DT_RELR catch up glibc 2.35?
From: Adhemerval Zanella @ 2021-11-17 12:46 UTC
  To: H.J. Lu; +Cc: Fangrui Song, Carlos O'Donell, GNU C Library, Rich Felker



On 16/11/2021 21:26, H.J. Lu wrote:
> Given that EI_ABIVERSION doesn't really work, should we revisit my
> GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
> 
> https://sourceware.org/pipermail/binutils/2021-October/118292.html

GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
is to backport DT_RELR to older versions, and it still adds logic about glibc
symbol versions to the static linker.  I would like the static linker to know as
little as possible about glibc versions; EI_ABIVERSION is way simpler and
already expresses ABI extensions.

I still think that for DT_RELR, instead of inventing another GNU extension, we might
fix EI_ABIVERSION and use it properly.  Checking the kernel, I think it should
be simple: the ELF header is located at AT_PHDR - sizeof (ElfW(Ehdr)), so we
can refactor the tests in open_verify and use them in rtld.c for the case where
execve() is called for the executable.
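
As a rough user-space illustration of that idea (not the rtld.c change
itself), a program can look up its own ELF header from the auxiliary vector
and read EI_ABIVERSION from it. The function name running_program_abiversion
is hypothetical, and the sketch relies on the assumption stated above: that
the program headers immediately follow the ELF header and that both are
mapped, which is the usual layout but is not guaranteed.

```c
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return e_ident[EI_ABIVERSION] of the running executable, or -1 if the
   ELF header is not found right before the program headers.
   (Hypothetical helper; assumes e_phoff == sizeof (ElfW(Ehdr)) and that
   the header is part of a mapped segment.)  */
int running_program_abiversion(void)
{
  uintptr_t phdr = (uintptr_t) getauxval(AT_PHDR);
  if (phdr < sizeof (ElfW(Ehdr)))
    return -1;

  const ElfW(Ehdr) *ehdr
    = (const ElfW(Ehdr) *) (phdr - sizeof (ElfW(Ehdr)));
  if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
    return -1;

  return ehdr->e_ident[EI_ABIVERSION];
}
```

A loader-side version would run the same validity test as open_verify on the
header found this way, without any extra syscall.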


* Re: Can DT_RELR catch up glibc 2.35?
From: H.J. Lu @ 2021-11-17 13:14 UTC
  To: Adhemerval Zanella
  Cc: Fangrui Song, Carlos O'Donell, GNU C Library, Rich Felker

On Wed, Nov 17, 2021 at 4:46 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 16/11/2021 21:26, H.J. Lu wrote:
> > On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 12/11/2021 04:47, Fangrui Song wrote:
> >>> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
> >>> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
> >>> some traction and many folks acknowledge the size benefit.
> >>> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)
> >>
> >> I brought this to the weekly glibc call two weeks ago and, if I recall correctly,
> >> the *main* issue is that we need a proper generic ABI definition published to move
> >> this forward on the glibc side (H.J. Lu was adamant about this).
> >>
> >> For my part, the current status, where multiple systems already support
> >> it (Android, ChromeOS, FreeBSD) and a toolchain can build/check
> >> glibc on at least 4 different ABIs (lld 13 on x86 and arm), is good enough.
> >>
> >> The lack of proper testing when using bfd might be a drawback, since we have
> >> no way to generate such binaries without linker support.
> >>
> >>>
> >>> There are two potential issues.
> >>>
> >>> 1. Lack of "Time travel compatibility" detector
> >>> 2. Some folks feel that being unable to test with scripts/build-many-glibcs.py is a problem.
> >>>   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
> >>>   and scripts/build-many-glibcs.py doesn't have an lld configuration)
> >>>
> >>> Let me address them for you.
> >>>
> >>> ---
> >>>
> >>> 1.
> >>>
> >>> "Time travel compatibility" means running a new object on an old system.
> >>> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
> >>> .rel.dyn/.rela.dyn and is destined to crash.
> >>>
> >>> If the GNU ld implementation (which may take a while) adopts an
> >>> undefined versioned .dynsym symbol (e.g. _dl_have_relr
> >>> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
> >>> we can guarantee old ld.so will report an error.
> >>> The undefined symbol needs to be versioned because ld -shared (default
> >>> to --allow-shlib-undefined) does not error on unversioned symbols. Say
> >>> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
> >>> ld needs to know the glibc version "GLIBC_2.40", not just the stem
> >>> glibc-flavored symbol name "_dl_have_relr".
> >>
> >> This might be troublesome to backport, since it would require using a higher
> >> version than the baseline one.  I am not sure whether distros will be willing or plan
> >> to backport such a feature, though.
> >>
> >>>
> >>> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
> >>> GNU ld would have to provide options in two flavors, one with
> >>> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
> >>> plenty of distros which don't rigidly require a friendly
> >>> diagnostic for "time travel compatibility", e.g. I am pretty sure many
> >>> Gentoo Linux folks doing aggressive optimizations know that their
> >>> executables don't run on old systems.
> >>
> >> I think even other Linux libcs, such as musl, won't be willing to tie
> >> DT_RELR to the existence of a loader/libc symbol (musl even less so, because
> >> it explicitly does not support symbol versioning).
> >>
> >>>
> >>> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
> >>> less appealing because bumping the version locks out many ELF consumers.
> >>> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
> >>> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.
> >>
> >> The problem with EI_ABIVERSION is a limitation of glibc, which only checks
> >> EI_ABIVERSION on open_verify() and this is not called on default process
> >> execution, where kernel will be one responsible to load both the binary
> >> and the interpreter:
> >>
> >> ---
> >> $ cat test.c
> >> #include <stdio.h>
> >>
> >> int main ()
> >> {
> >>   return 0;
> >> }
> >> $ gdb ./test
> >> [...]
> >> (gdb) starti
> >> [...]
> >> process 1420253
> >> Mapped address spaces:
> >>
> >>           Start Addr           End Addr       Size     Offset objfile
> >>       0x555555554000     0x555555555000     0x1000        0x0 /tmp/test/test
> >>       0x555555555000     0x555555556000     0x1000     0x1000 /tmp/test/test
> >>       0x555555556000     0x555555557000     0x1000     0x2000 /tmp/test/test
> >>       0x555555557000     0x555555559000     0x2000     0x2000 /tmp/test/test
> >>       0x7ffff7fc2000     0x7ffff7fc6000     0x4000        0x0 [vvar]
> >>       0x7ffff7fc6000     0x7ffff7fc8000     0x2000        0x0 [vdso]
> >>       0x7ffff7fc8000     0x7ffff7fc9000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>       0x7ffff7fc9000     0x7ffff7ff1000    0x28000     0x1000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>       0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x29000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>       0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x32000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>       0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
> >>   0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
> >> ---
> >>
> >> However, the test is correctly executed on any load library and/or if the
> >> executable is executed by issuing the loader directly:
> >>
> >> ---
> >> $ readelf -h test
> >> ELF Header:
> >>   Magic:   7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00
> >> [...]
> >> $ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
> >> ./test: error while loading shared libraries: ./test: ELF file ABI version invalid
> >> ---
> >>
> >> I think this is an bug, since it basically defeats the EI_ABIVERSION check
> >> and makes programs executed by issuing the loader with a different semantic
> >> than the one executed through execve syscall.
> >>
> >> Afaik kernel does not pass such information on auxv vector (we might ask
> >> for a AT_EHDR eventually) so a potential fix will cost us some extra
> >> syscalls on every program execution (to read and check the ELF Header with
> >> similar test done on open_verify()).
> >>
> >> However it does *not* help on older glibc which will still accept old binaries.
> >>
> >>>
> >>> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
> >>>   OS/ABI:                            UNIX - GNU
> >>>   ABI Version:                       34
> >>> hello
> >>>
> >>
> >> I am not really sure if the 'time travel compatibility' is really an issue,
> >> although I saw reports where users try to use chromeos library on glibc that
> >> fails in some strange ways (most likely due DT_RELR). If user is deploying
> >> a *opt-in* feature that requires proper dynamic loader support, I would
> >> expect it know the environment he is targeting.
> >>
> >> So I think the best course of action for this issue is indeed fix EI_ABIVERSION
> >> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
> >> fix to some older releases, and distros that want to use DT_RELR should do also.
> >
> > Given that EI_ABIVERSION doesn't really work, should we revisit my
> > GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
> >
> > https://sourceware.org/pipermail/binutils/2021-October/118292.html
>
> The GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
> is to backport DT_RELR to older version and it still adds logic on the static
> linker about glibc symbol version.  I would like that static linker know as
> little as possible about glibc version, EI_ABIVERSION is way simpler and
> already express ABI extensions.
>
> I still think for DT_RELR instead of inventing another GNU extension, we might
> fix EI_ABIVERSION and use it properly.   Checking with kernel, I think it should
> be simple: the elf header is located at the AT_PHDR - sizeof (ElfW(Ehdr)), so we
> can refactor the tests at open_verify and use on rtld.c for the case execve()
> is called for the executable.

The scheme should work for older systems without changes.  Can we add
GLIBC_PRIVATE_DT_RELR?  The linker would add a GLIBC_PRIVATE_DT_RELR
version dependency when DT_RELR is generated.

-- 
H.J.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-16 21:07 ` Adhemerval Zanella
  2021-11-17  0:26   ` H.J. Lu
@ 2021-11-17 22:12   ` Florian Weimer
  2021-11-18 12:45     ` Adhemerval Zanella
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2021-11-17 22:12 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha
  Cc: Fangrui Song, Carlos O'Donell, H.J. Lu, Adhemerval Zanella,
	Rich Felker

* Adhemerval Zanella via Libc-alpha:

> Afaik kernel does not pass such information on auxv vector (we might ask
> for a AT_EHDR eventually) so a potential fix will cost us some extra 
> syscalls on every program execution (to read and check the ELF Header with 
> similar test done on open_verify()).

We need a change to the auxiliary vector to make this reliable.

AT_PHDR - sizeof (ElfW(Ehdr)) is not a replacement for AT_EHDR because I
don't think the program headers must come immediately after the ELF
header.  Otherwise, we wouldn't need both e_phoff and e_ehsize.
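
To make the e_phoff/e_ehsize point concrete, here is a minimal sketch in C,
assuming a 64-bit target and the definitions in <elf.h>.  The helper name
phdrs_follow_ehdr is hypothetical and not part of glibc.  It only reports
whether the program header table starts right after the ELF header, which is
the layout that "AT_PHDR - sizeof (ElfW(Ehdr))" silently relies on.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not glibc code): true only when the program header
   table begins immediately after the ELF header.  Only in that layout does
   subtracting sizeof (Elf64_Ehdr) from the table address land on the
   header; the ELF format itself allows any e_phoff.  */
static bool phdrs_follow_ehdr(const Elf64_Ehdr *ehdr)
{
    assert(ehdr != NULL);
    return ehdr->e_ehsize == sizeof(Elf64_Ehdr)
           && ehdr->e_phoff == ehdr->e_ehsize;
}
```

Most linkers do emit this layout today, so the check would usually pass, but
an object with e_phoff set elsewhere is still a valid ELF file.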

> So I think the best course of action for this issue is indeed fix EI_ABIVERSION
> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
> fix to some older releases, and distros that want to use DT_RELR
> should do also.

I agree.  Waiting for the proposed flags in gABI to appear does not seem
attractive to me.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-17 13:14       ` H.J. Lu
@ 2021-11-18  0:30         ` Fangrui Song
  2021-11-18  9:45           ` Florian Weimer
  2021-11-19 19:18           ` Rich Felker
  0 siblings, 2 replies; 13+ messages in thread
From: Fangrui Song @ 2021-11-18  0:30 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Adhemerval Zanella, Carlos O'Donell, GNU C Library, Rich Felker

On 2021-11-17, H.J. Lu wrote:
>On Wed, Nov 17, 2021 at 4:46 AM Adhemerval Zanella
><adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 16/11/2021 21:26, H.J. Lu wrote:
>> > On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella
>> > <adhemerval.zanella@linaro.org> wrote:
>> >>
>> >>
>> >>
>> >> On 12/11/2021 04:47, Fangrui Song wrote:
>> >>> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
>> >>> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
>> >>> some traction and many folks acknowledge the size benefit.
>> >>> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)
>> >>
>> >> I brought this to the weekly glibc call two weeks ago and if I recall correctly
>> >> the *main* issue is we need a proper generic ABI definition published to move this
>> >> forward on glibc side (H.J.Lu was adamant about).
>> >>
>> >> From my part, current status where we have multiple system that already support
>> >> it (android, chromeos, freebsd) and with a toolchain that supports build/check
>> >> glibc on at least 4 different ABIs (lld 13 on x86 and arm) is good enough.
>> >>
>> >> We lack of proper testing while using bfd might a drawback, since we lack a way
>> >> to generate binaries without linker support.
>> >>
>> >>>
>> >>> There are two potential issues.
>> >>>
>> >>> 1. Lack of "Time travel compatibility" detector
>> >>> 2. Some folks feel that unable to test with scripts/build-many-glibcs.py is a problem.
>> >>>   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
>> >>>   and scripts/build-many-glibcs.py doesn't have an lld configuration)
>> >>>
>> >>> Let me address them for you.
>> >>>
>> >>> ---
>> >>>
>> >>> 1.
>> >>>
>> >>> "Time travel compatibility" means running a new object on an old system.
>> >>> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
>> >>> .rel.dyn/.rela.dyn and is destined to crash.
>> >>>
>> >>> If the GNU ld implementation (which may take a while) adopts an
>> >>> undefined versioned .dynsym symbol (e.g. _dl_have_relr
>> >>> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
>> >>> we can guarantee old ld.so will report an error.
>> >>> The undefined symbol needs to be versioned because ld -shared (default
>> >>> to --allow-shlib-undefined) does not error on unversioned symbols. Say
>> >>> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
>> >>> ld needs to know the glibc version "GLIBC_2.40", not just the stem
>> >>> glibc-flavored symbol name "_dl_have_relr".
>> >>
>> >> This might be troublesome to backport, since it would require to use a higher
>> >> version than the baseline one.  I am not sure if distro will be willing or plan
>> >> to backport such feature though.
>> >>
>> >>>
>> >>> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
>> >>> GNU ld would have to provide options in two flavors, one with
>> >>> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
>> >>> plenty of distros there which don't rigidly require a friendly
>> >>> diagnostic for "time traverl compatibility", e.g. I pretty sure many
>> >>> Gentoo Linux folks doing aggressive optimizations know that their
>> >>> executables don't run on old systems.
>> >>
>> >> I think even other Linux libc, such as musl, won't be willing to support
>> >> tying the DT_RELR to a loader/libc symbol existing (musl even less because
>> >> it explicit does not support symbol versioning).
>> >>
>> >>>
>> >>> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
>> >>> less appealing because bumping the version locks out many ELF consumers.
>> >>> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
>> >>> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.
>> >>
>> >> The problem with EI_ABIVERSION is a limitation of glibc, which only checks
>> >> EI_ABIVERSION on open_verify() and this is not called on default process
>> >> execution, where kernel will be one responsible to load both the binary
>> >> and the interpreter:
>> >>
>> >> ---
>> >> $ cat test.c
>> >> #include <stdio.h>
>> >>
>> >> int main ()
>> >> {
>> >>   return 0;
>> >> }
>> >> $ gdb ./test
>> >> [...]
>> >> (gdb) starti
>> >> [...]
>> >> process 1420253
>> >> Mapped address spaces:
>> >>
>> >>           Start Addr           End Addr       Size     Offset objfile
>> >>       0x555555554000     0x555555555000     0x1000        0x0 /tmp/test/test
>> >>       0x555555555000     0x555555556000     0x1000     0x1000 /tmp/test/test
>> >>       0x555555556000     0x555555557000     0x1000     0x2000 /tmp/test/test
>> >>       0x555555557000     0x555555559000     0x2000     0x2000 /tmp/test/test
>> >>       0x7ffff7fc2000     0x7ffff7fc6000     0x4000        0x0 [vvar]
>> >>       0x7ffff7fc6000     0x7ffff7fc8000     0x2000        0x0 [vdso]
>> >>       0x7ffff7fc8000     0x7ffff7fc9000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>> >>       0x7ffff7fc9000     0x7ffff7ff1000    0x28000     0x1000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>> >>       0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x29000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>> >>       0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x32000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>> >>       0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
>> >>   0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
>> >> ---
>> >>
>> >> However, the test is correctly executed on any load library and/or if the
>> >> executable is executed by issuing the loader directly:
>> >>
>> >> ---
>> >> $ readelf -h test
>> >> ELF Header:
>> >>   Magic:   7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00
>> >> [...]
>> >> $ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
>> >> ./test: error while loading shared libraries: ./test: ELF file ABI version invalid
>> >> ---
>> >>
>> >> I think this is an bug, since it basically defeats the EI_ABIVERSION check
>> >> and makes programs executed by issuing the loader with a different semantic
>> >> than the one executed through execve syscall.
>> >>
>> >> Afaik kernel does not pass such information on auxv vector (we might ask
>> >> for a AT_EHDR eventually) so a potential fix will cost us some extra
>> >> syscalls on every program execution (to read and check the ELF Header with
>> >> similar test done on open_verify()).
>> >>
>> >> However it does *not* help on older glibc which will still accept old binaries.
>> >>
>> >>>
>> >>> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
>> >>>   OS/ABI:                            UNIX - GNU
>> >>>   ABI Version:                       34
>> >>> hello
>> >>>
>> >>
>> >> I am not really sure if the 'time travel compatibility' is really an issue,
>> >> although I saw reports where users try to use chromeos library on glibc that
>> >> fails in some strange ways (most likely due DT_RELR). If user is deploying
>> >> a *opt-in* feature that requires proper dynamic loader support, I would
>> >> expect it know the environment he is targeting.
>> >>
>> >> So I think the best course of action for this issue is indeed fix EI_ABIVERSION
>> >> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
>> >> fix to some older releases, and distros that want to use DT_RELR should do also.
>> >
>> > Given that EI_ABIVERSION doesn't really work, should we revisit my
>> > GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
>> >
>> > https://sourceware.org/pipermail/binutils/2021-October/118292.html
>>
>> The GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
>> is to backport DT_RELR to older version and it still adds logic on the static
>> linker about glibc symbol version.  I would like that static linker know as
>> little as possible about glibc version, EI_ABIVERSION is way simpler and
>> already express ABI extensions.
>>
>> I still think for DT_RELR instead of inventing another GNU extension, we might
>> fix EI_ABIVERSION and use it properly.   Checking with kernel, I think it should
>> be simple: the elf header is located at the AT_PHDR - sizeof (ElfW(Ehdr)), so we
>> can refactor the tests at open_verify and use on rtld.c for the case execve()
>> is called for the executable.
>
>The scheme should work for older systems without changes.  Can we add
>GLIBC_PRIVATE_DT_RELR?  Linker adds GLIBC_PRIVATE_DT_RELR
>version dependency when DT_RELR is generated

For CCed folks who may be puzzled about the context,
I have a write-up
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
which provides my reply to HJ's question as well.

A synthesized versioned undefined dynamic symbol can indeed catch "time
travel compatibility", but the mechanism would be the first time ld adds an option variant
for a particular libc implementation (glibc) locking out all other
implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
Sigh, it is really not pretty.

We know many other libc implementations don't want to synthesize such a
symbol.

"If user is deploying a *opt-in* feature that requires proper dynamic
loader support, I would expect they know the environment they are targeting."

May I suggest that: if a glibc distro really worries that users deploy
ld.lld --pack-dyn-relocs=relr on their new system and back port that to
old systems, just remove DT_RELR support from your local glibc? Since
ld.lld --pack-dyn-relocs=relr  doesn't work on your system with glibc
2.35, people wouldn't complain about not working on older versions.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-18  0:30         ` Fangrui Song
@ 2021-11-18  9:45           ` Florian Weimer
  2021-11-18 23:27             ` Fangrui Song
  2021-11-19 19:18           ` Rich Felker
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2021-11-18  9:45 UTC (permalink / raw)
  To: Fangrui Song via Libc-alpha; +Cc: H.J. Lu, Fangrui Song, Rich Felker

* Fangrui Song via Libc-alpha:

> A synthesized versioned undefined dynamic symbol can indeed catch "time
> travel compatibility", but the mechanism would be the first time ld adds an option variant
> for a particular libc implementation (glibc) locking out all other
> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
> Sigh, it is really not pretty.

There are already other features that do not work with other libcs,
e.g. --audit, --filter, --auxiliary.  And --dynamic-linker tends to lock
out other libcs, too.  And it's a bit hard to argue against this given
that --pack-dyn-relocs=relr works the same way against upstream glibc.

However, I happen to dislike the glibc-tied approach as well, and would
like to see an ELFOSABI bump, along with a kernel fix to make it stick
for main programs as well.

> We know many other libc implementations don't want to synthesize such a
> symbol.

It's been used for other toolchain features before, that's where the
idea comes from.

> "If user is deploying a *opt-in* feature that requires proper dynamic
> loader support, I would expect they know the environment they are targeting."
>
> May I suggest that: if a glibc distro really worries that users deploy
> ld.lld --pack-dyn-relocs=relr on their new system and back port that to
> old systems, just remove DT_RELR support from your local glibc? Since
> ld.lld --pack-dyn-relocs=relr  doesn't work on your system with glibc
> 2.35, people wouldn't complain about not working on older versions.

In a sense, isn't that what's happening?  Albeit on an upstream scale.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-17 22:12   ` Florian Weimer
@ 2021-11-18 12:45     ` Adhemerval Zanella
  0 siblings, 0 replies; 13+ messages in thread
From: Adhemerval Zanella @ 2021-11-18 12:45 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha
  Cc: Fangrui Song, Carlos O'Donell, H.J. Lu, Rich Felker



On 17/11/2021 19:12, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> Afaik kernel does not pass such information on auxv vector (we might ask
>> for a AT_EHDR eventually) so a potential fix will cost us some extra 
>> syscalls on every program execution (to read and check the ELF Header with 
>> similar test done on open_verify()).
> 
> We need a change to the auxiliary vector to make this reliable.
> 
> AT_PHDR - sizeof (ElfW(Ehdr)) is not a replacement for AT_EHDR because I
> don't think the program headers must come immediately after the ELF
> header.  Otherwise, we wouldn't need both e_phoff and e_ehsize.

Indeed; however, I think we can get ElfW(Ehdr) implicitly by checking the
PT_LOAD from the AT_PHDR and assuming that main_map->l_map_start contains
the ELF header.
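
A rough sketch of that idea in C, assuming a 64-bit target and <elf.h>.  The
function name find_ehdr_addr is hypothetical and this is not the rtld.c
code.  It derives the load bias from PT_PHDR and then looks for a PT_LOAD
segment that maps file offset 0, which is where the ELF header lives.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not glibc code.  Given the in-memory program header
   table (what AT_PHDR/AT_PHNUM describe), return the runtime address where
   the ELF header should be mapped, or 0 if it cannot be determined.  */
static uintptr_t find_ehdr_addr(const Elf64_Phdr *phdr, size_t phnum)
{
    assert(phdr != NULL);

    /* Step 1: PT_PHDR records the link-time address of the table itself,
       so the difference to its runtime address is the load bias.  */
    uintptr_t bias = 0;
    int have_bias = 0;
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_PHDR) {
            bias = (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;
            have_bias = 1;
            break;
        }
    }
    if (!have_bias)
        return 0;

    /* Step 2: the ELF header sits at file offset 0, so it is mapped only
       if some PT_LOAD segment starts at offset 0 and covers it.  */
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset == 0
            && phdr[i].p_filesz >= sizeof(Elf64_Ehdr))
            return bias + (uintptr_t)phdr[i].p_vaddr;
    }
    return 0;
}
```

In a running process the arguments could come from getauxval (AT_PHDR) and
getauxval (AT_PHNUM) in <sys/auxv.h>.  A result of 0 covers the cases where
no PT_PHDR exists or no PT_LOAD maps offset 0, so a caller would still need
a fallback.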

> 
>> So I think the best course of action for this issue is indeed fix EI_ABIVERSION
>> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
>> fix to some older releases, and distros that want to use DT_RELR
>> should do also.
> 
> I agree.  Waiting for the proposed flags in gABI to appear does not seem
> attractive to me.

Fangrui raised a concern about the linker setting EI_ABIVERSION through a
command line option rather than deriving it from its input arguments.
At least bfd already sets it for a couple of architectures (mips for instance).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-18  9:45           ` Florian Weimer
@ 2021-11-18 23:27             ` Fangrui Song
  2021-11-19 11:51               ` Adhemerval Zanella
  2021-11-24  1:10               ` Sam James
  0 siblings, 2 replies; 13+ messages in thread
From: Fangrui Song @ 2021-11-18 23:27 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Fangrui Song via Libc-alpha, H.J. Lu, Rich Felker, Felix Yan,
	Sylvestre Ledru, sam, Leah Neukirchen

(For newly CCed Linux distro folks, sorry for hijacking you here. Scroll
to the end for my request from you.)

The first message of the thread is
https://sourceware.org/pipermail/libc-alpha/2021-November/133009.html
and
https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
provides context.

On 2021-11-18, Florian Weimer wrote:
>* Fangrui Song via Libc-alpha:
>
>> A synthesized versioned undefined dynamic symbol can indeed catch "time
>> travel compatibility", but the mechanism would be the first time ld adds an option variant
>> for a particular libc implementation (glibc) locking out all other
>> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
>> Sigh, it is really not pretty.
>
>There are already other features that do not work with other libcs,
>e.g. --audit, --filter, --auxiliary.  And --dynamic-linker tends to lock
>out other libcs, too.  And it's a bit hard to argue against this given
>that --pack-dyn-relocs=relr works the same way against upstream glibc.

--audit, --filter, and --auxiliary do not lock out other
implementations. Solaris has DT_AUDIT, DT_FILTER, and DT_AUXILIARY.
If a libc implementation desires, it can add the support.

--pack-dyn-relocs=relr-glibc or -z relr-glibc would be the first time an
option name mentions glibc. (See below, it could be "relr-gnu", but still
something not many users using ELFOSABI_GNU want).

>However, I happen to dislike the glibc-tied approach as well, and would
>like to see an ELFOSABI bump, along with a kernel fix to make it stick
>for main programs as well.

Thanks for disliking the glibc-tied approach :)

EI_ABIVERSION depends on EI_OSABI. We need to discuss
ELFOSABI_NONE/ELFOSABI_GNU separately.

For ELFOSABI_GNU (alias: ELFOSABI_LINUX), the option name does not have
to mention "glibc" and can probably use "relr-gnu".  However, it is
still odd that a feature exists/works well and a "*-gnu" variant is
added just because glibc uses a different development model.

I think Android bionic / musl may feel the change unnecessary but may
not complain loudly because they just don't inspect EI_ABIVERSION.
Well, musl doesn't support DT_RELR, but Rich can provide more
authoritative opinion on EI_ABIVERSION bump from musl's point of view.

For ELFOSABI_NONE: many Linux executables don't use
STB_GNU_UNIQUE/STT_GNU_IFUNC and therefore use ELFOSABI_NONE.  I think
Solaris folks have ruled out the possibility to bump EI_ABIVERSION.
This is not something they/FreeBSD/etc need.

In addition, to make the EI_ABIVERSION check work, a distro needs to
backport the change to many old glibc versions. I am not sure many
distros want to do this.

>> We know many other libc implementations don't want to synthesize such a
>> symbol.
>
>It's been used for other toolchain features before, that's where the
>idea comes from.

E.g. __stack_chk_guard, _mcount from GCC/Clang instrumentation as
a compiler-libc protocol.
But this would be the first time the linker synthesizes an undefined symbol.

>> "If user is deploying a *opt-in* feature that requires proper dynamic
>> loader support, I would expect they know the environment they are targeting."
>>
>> May I suggest that: if a glibc distro really worries that users deploy
>> ld.lld --pack-dyn-relocs=relr on their new system and back port that to
>> old systems, just remove DT_RELR support from your local glibc? Since
>> ld.lld --pack-dyn-relocs=relr  doesn't work on your system with glibc
>> 2.35, people wouldn't complain about not working on older versions.
>
>In a sense, isn't that what's happening?  Albeit on an upstream scale.

From my point of view, I want glibc-based Linux distros to benefit from
DT_RELR earlier. Blocking this on an upstream scale is not appealing.

CCed a bunch of Linux distro folks (Arch/Debian/Gentoo/Void).

* ld.lld is not a default linker on most Linux distros.
* ld.lld --pack-dyn-relocs=relr is an opt-in feature.
* --pack-dyn-relocs=relr is difficult to misuse because GNU ld doesn't support it.
* binutils may get relr support one day, but may take several releases.
* Nobody will switch the GCC/Clang default any time soon.
* Copying the new executable to an old glibc system is unsupported.

By enabling DT_RELR in upstream glibc, the Linux distros will get the
glibc feature with ZERO overhead in their downstream packaging.  Then, a
user opting in ld.lld --pack-dyn-relocs=relr will have smaller
executables.  When GNU ld finally gets the feature, the benefit will
reach more users. So why not have the feature now to make the future
enablement smoother?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-18 23:27             ` Fangrui Song
@ 2021-11-19 11:51               ` Adhemerval Zanella
  2021-11-24  1:10               ` Sam James
  1 sibling, 0 replies; 13+ messages in thread
From: Adhemerval Zanella @ 2021-11-19 11:51 UTC (permalink / raw)
  To: Fangrui Song, Florian Weimer
  Cc: Rich Felker, Fangrui Song via Libc-alpha, Sylvestre Ledru, sam,
	Felix Yan, Leah Neukirchen



On 18/11/2021 20:27, Fangrui Song via Libc-alpha wrote:
> (For newly CCed Linux distro folks, sorry for hijacking you here. Scroll
> to the end for my request from you.)
> 
> The first message of the thread is
> https://sourceware.org/pipermail/libc-alpha/2021-November/133009.html
> and
> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
> provides context.
> )
> 
> On 2021-11-18, Florian Weimer wrote:
>> * Fangrui Song via Libc-alpha:
>>
>>> A synthesized versioned undefined dynamic symbol can indeed catch "time
>>> travel compatibility", but the mechanism would be the first time ld adds an option variant
>>> for a particular libc implementation (glibc) locking out all other
>>> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
>>> Sigh, it is really not pretty.
>>
>> There are already other features that do not work with other libcs,
>> e.g. --audit, --filter, --auxiliary.  And --dynamic-linker tends to lock
>> out other libcs, too.  And it's a bit hard to argue against this given
>> that --pack-dyn-relocs=relr works the same way against upstream glibc.
> 
> --audit, --filter, and --auxiliary do not lock out other
> implementations. Solaris has DT_AUDIT, DT_FILTER, and DT_AUXILIARY.
> If a libc implementation desires, it can add the support.
> 
> --pack-dyn-relocs=relr-glibc or -z relr-glibc would be the first time an
> option name mentions glibc. (See below, it could be "relr-gnu", but still
> something not many users using ELFOSABI_GNU want).
> 
>> However, I happen to dislike the glibc-tied approach as well, and would
>> like to see an ELFOSABI bump, along with a kernel fix to make it stick
>> for main programs as well.
> 
> Thanks for disliking the glibc-tied approach :)
> 
> EI_ABIVERSION depends on EI_OSABI. We need to discuss
> ELFOSABI_NONE/ELFOSABI_GNU separately.
> 
> For ELFOSABI_GNU (alias: ELFOSABI_LINUX), the option name does not have
> to mention "glibc" and can probably use "relr-gnu".  However, It is
> still odd that a feature exists/works well and a "*-gnu" variant is
> added just because glibc uses a different development model.
> 
> I think Android bionic / musl may feel the change unnecessary but may
> not complain loudly because they just doesn't inspect EI_ABIVERSION.
> Well, musl doesn't support DT_RELR, but Rich can provide more
> authoritative opinion on EI_ABIVERSION bump from musl's point of view.

I think it is fair to assume that DT_RELR for ELFOSABI_GNU would use a
newer value for EI_ABIVERSION.  Android also supports DT_RELR/DT_ANDROID_RELR
even without a proper standard ABI definition, so I assume it won't really
care about an EI_ABIVERSION/ELFOSABI_GNU bump.

And at least for glibc, EI_ABIVERSION was not really enforced because it
lacks full coverage (it fails when the loader is not responsible for loading
the binary image).  Some architectures try to leverage it (MIPS for instance),
but I think currently it is more a best effort than a proper solution.

> 
> For ELFOSABI_NONE: many Linux executables don't use
> STB_GNU_UNIQUE/STT_GNU_IFUNC and therefore use ELFOSABI_NONE.  I think
> Solaris folks have ruled out the possibility to bump EI_OSABIVERSION.
> This is not something they/FreeBSD/etc need.
> 
> In addition, to make EI_ABIVERSION check work. A distro needs to
> backport the change to many old glibc versions. I am not sure many
> distros want to do this.

Yes, it takes time and not all distributions will pick this support
readily.  However, backporting a proper EI_ABIVERSION/ELFOSABI_GNU check
should be easy.

> 
>>> We know many other libc implementations don't want to synthesize such a
>>> symbol.
>>
>> It's been used for other toolchain features before, that's where the
>> idea comes from.
> 
> E.g. __stack_chk_guard, _mcount from GCC/Clang instrumentation as
> a compiler-libc protocol.
> But this would be the first time the linker synthesizes an undefined symbol.

I also dislike this approach.

> 
>>> "If user is deploying a *opt-in* feature that requires proper dynamic
>>> loader support, I would expect they know the environment they are targeting."
>>>
>>> May I suggest that: if a glibc distro really worries that users deploy
>>> ld.lld --pack-dyn-relocs=relr on their new system and back port that to
>>> old systems, just remove DT_RELR support from your local glibc? Since
>>> ld.lld --pack-dyn-relocs=relr  doesn't work on your system with glibc
>>> 2.35, people wouldn't complain about not working on older versions.
>>
>> In a sense, isn't that what's happening?  Albeit on an upstream scale.
> 
> From my point of view, I want glibc based Linux distro to benefit from
> DT_RELR earlier. Blocking this on an upstream scale is not appealing.
> 
> CCed a bunch of Linux distro folks (Arch/Debian/Gentoo/Void).
> 
> * ld.lld is not a default linker on most Linux distros.
> * ld.lld --pack-dyn-relocs=relr is an opt-in feature.
> * --pack-dyn-relocs=relr is difficult to misuse because GNU ld doesn't support it.
> * binutils may get relr support one day, but may take several releases.
> * Nobody will switch the GCC/Clang default any time soon.
> * Coping the new executable to an old glibc system is unsupported.
> 
> By enabling DT_RELR in upstream glibc, the Linux distros will get the
> glibc feature with ZERO overhead in their downstream packaging.  Then, a
> user opting in ld.lld --pack-dyn-relocs=relr will have smaller
> executables.  When GNU ld finally gets the feature, the benefit will
> reach more users. So why not having the feature to make the future
> feature enablement smoother?

I tend to agree with your points, but I still fail to see the trouble in
supporting a different EI_ABIVERSION for ELFOSABI_GNU in lld (or any other
linker that intends to support DT_RELR for ELFOSABI_GNU).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-18  0:30         ` Fangrui Song
  2021-11-18  9:45           ` Florian Weimer
@ 2021-11-19 19:18           ` Rich Felker
  1 sibling, 0 replies; 13+ messages in thread
From: Rich Felker @ 2021-11-19 19:18 UTC (permalink / raw)
  To: Fangrui Song
  Cc: H.J. Lu, Adhemerval Zanella, Carlos O'Donell, GNU C Library

On Wed, Nov 17, 2021 at 04:30:25PM -0800, Fangrui Song wrote:
> On 2021-11-17, H.J. Lu wrote:
> >On Wed, Nov 17, 2021 at 4:46 AM Adhemerval Zanella
> ><adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >>On 16/11/2021 21:26, H.J. Lu wrote:
> >>> On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella
> >>> <adhemerval.zanella@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/11/2021 04:47, Fangrui Song wrote:
> >>>>> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
> >>>>> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
> >>>>> some traction and many folks acknowledge the size benefit.
> >>>>> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)
> >>>>
> >>>> I brought this to the weekly glibc call two weeks ago and if I recall correctly
> >>>> the *main* issue is we need a proper generic ABI definition published to move this
> >>>> forward on the glibc side (H.J. Lu was adamant about this).
> >>>>
> >>>> From my side, the current status, where we have multiple systems that already
> >>>> support it (android, chromeos, freebsd) and a toolchain that can build/check
> >>>> glibc on at least 4 different ABIs (lld 13 on x86 and arm), is good enough.
> >>>>
> >>>> The lack of proper testing when using bfd might be a drawback, since we lack a way
> >>>> to generate binaries without linker support.
> >>>>
> >>>>>
> >>>>> There are two potential issues.
> >>>>>
> >>>>> 1. Lack of "Time travel compatibility" detector
> >>>>> 2. Some folks feel that unable to test with scripts/build-many-glibcs.py is a problem.
> >>>>>   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
> >>>>>   and scripts/build-many-glibcs.py doesn't have an lld configuration)
> >>>>>
> >>>>> Let me address them for you.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> 1.
> >>>>>
> >>>>> "Time travel compatibility" means running a new object on an old system.
> >>>>> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
> >>>>> .rel.dyn/.rela.dyn and is destined to crash.
> >>>>>
> >>>>> If the GNU ld implementation (which may take a while) adopts an
> >>>>> undefined versioned .dynsym symbol (e.g. _dl_have_relr
> >>>>> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
> >>>>> we can guarantee old ld.so will report an error.
> >>>>> The undefined symbol needs to be versioned because ld -shared (default
> >>>>> to --allow-shlib-undefined) does not error on unversioned symbols. Say
> >>>>> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
> >>>>> ld needs to know the glibc version "GLIBC_2.40", not just the stem
> >>>>> glibc-flavored symbol name "_dl_have_relr".
> >>>>
> >>>> This might be troublesome to backport, since it would require to use a higher
> >>>> version than the baseline one.  I am not sure if distro will be willing or plan
> >>>> to backport such feature though.
> >>>>
> >>>>>
> >>>>> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
> >>>>> GNU ld would have to provide options in two flavors, one with
> >>>>> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
> >>>>> plenty of distros there which don't rigidly require a friendly
> >>>>> diagnostic for "time travel compatibility", e.g. I am pretty sure many
> >>>>> Gentoo Linux folks doing aggressive optimizations know that their
> >>>>> executables don't run on old systems.
> >>>>
> >>>> I think even other Linux libcs, such as musl, won't be willing to support
> >>>> tying DT_RELR to the existence of a loader/libc symbol (musl even less so,
> >>>> because it explicitly does not support symbol versioning).
> >>>>
> >>>>>
> >>>>> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
> >>>>> less appealing because bumping the version locks out many ELF consumers.
> >>>>> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
> >>>>> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.
> >>>>
> >>>> The problem with EI_ABIVERSION is a limitation of glibc, which only checks
> >>>> EI_ABIVERSION in open_verify(), and this is not called on default process
> >>>> execution, where the kernel is the one responsible for loading both the binary
> >>>> and the interpreter:
> >>>>
> >>>> ---
> >>>> $ cat test.c
> >>>> #include <stdio.h>
> >>>>
> >>>> int main ()
> >>>> {
> >>>>   return 0;
> >>>> }
> >>>> $ gdb ./test
> >>>> [...]
> >>>> (gdb) starti
> >>>> [...]
> >>>> process 1420253
> >>>> Mapped address spaces:
> >>>>
> >>>>           Start Addr           End Addr       Size     Offset objfile
> >>>>       0x555555554000     0x555555555000     0x1000        0x0 /tmp/test/test
> >>>>       0x555555555000     0x555555556000     0x1000     0x1000 /tmp/test/test
> >>>>       0x555555556000     0x555555557000     0x1000     0x2000 /tmp/test/test
> >>>>       0x555555557000     0x555555559000     0x2000     0x2000 /tmp/test/test
> >>>>       0x7ffff7fc2000     0x7ffff7fc6000     0x4000        0x0 [vvar]
> >>>>       0x7ffff7fc6000     0x7ffff7fc8000     0x2000        0x0 [vdso]
> >>>>       0x7ffff7fc8000     0x7ffff7fc9000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7fc9000     0x7ffff7ff1000    0x28000     0x1000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x29000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x32000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
> >>>>   0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
> >>>> ---
> >>>>
> >>>> However, the check is correctly performed for any loaded library and/or if the
> >>>> executable is run by invoking the loader directly:
> >>>>
> >>>> ---
> >>>> $ readelf -h test
> >>>> ELF Header:
> >>>>   Magic:   7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00
> >>>> [...]
> >>>> $ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
> >>>> ./test: error while loading shared libraries: ./test: ELF file ABI version invalid
> >>>> ---
> >>>>
> >>>> I think this is a bug, since it basically defeats the EI_ABIVERSION check
> >>>> and gives programs executed by invoking the loader different semantics
> >>>> from those executed through the execve syscall.
> >>>>
> >>>> AFAIK the kernel does not pass such information in the auxiliary vector (we
> >>>> might ask for an AT_EHDR eventually), so a potential fix will cost us some extra
> >>>> syscalls on every program execution (to read and check the ELF header with
> >>>> a test similar to the one done in open_verify()).
> >>>>
> >>>> However it does *not* help on older glibc which will still accept old binaries.
> >>>>
> >>>>>
> >>>>> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
> >>>>>   OS/ABI:                            UNIX - GNU
> >>>>>   ABI Version:                       34
> >>>>> hello
> >>>>>
> >>>>
> >>>> I am not really sure if the 'time travel compatibility' is really an issue,
> >>>> although I saw reports where users try to use a chromeos library on glibc and it
> >>>> fails in some strange ways (most likely due to DT_RELR). If a user is deploying
> >>>> an *opt-in* feature that requires proper dynamic loader support, I would
> >>>> expect them to know the environment they are targeting.
> >>>>
> >>>> So I think the best course of action for this issue is indeed to fix EI_ABIVERSION
> >>>> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
> >>>> fix to some older releases, and distros that want to use DT_RELR should do so too.
> >>>
> >>> Given that EI_ABIVERSION doesn't really work, should we revisit my
> >>> GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
> >>>
> >>> https://sourceware.org/pipermail/binutils/2021-October/118292.html
> >>
> >>GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
> >>is to backport DT_RELR to older versions, and it still adds logic to the static
> >>linker about glibc symbol versions.  I would like the static linker to know as
> >>little as possible about the glibc version; EI_ABIVERSION is way simpler and
> >>already expresses ABI extensions.
> >>
> >>I still think that for DT_RELR, instead of inventing another GNU extension, we
> >>might fix EI_ABIVERSION and use it properly.   Checking with the kernel, I think it
> >>should be simple: the ELF header is located at AT_PHDR - sizeof (ElfW(Ehdr)), so we
> >>can refactor the tests in open_verify and use them in rtld.c for the case where
> >>execve() is called for the executable.
> >
> >The scheme should work for older systems without changes.  Can we add
> >GLIBC_PRIVATE_DT_RELR?  Linker adds GLIBC_PRIVATE_DT_RELR
> >version dependency when DT_RELR is generated
> 
> For CCed folks who may be puzzled about the context,
> I have a write-up
> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
> which provides my reply to HJ's question as well.
> 
> A synthesized versioned undefined dynamic symbol can indeed catch "time
> travel compatibility", but the mechanism would be the first time ld adds an option variant
> for a particular libc implementation (glibc) locking out all other
> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
> Sigh, it is really not pretty.
> 
> We know many other libc implementations don't want to synthesize such a
> symbol.

If you really want this, I have an alternate solution: add a new
relocation type to live in the normal REL/RELA table, whose semantics
are "process a DT_RELR table". This will cause the dynamic linker to
error out if it's too old to know about DT_RELR, and it can be ignored
as a no-op (or used as the trigger to process DT_RELR) by an ldso that's
new enough to know about it.
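
To make the idea concrete, here is a minimal sketch over a simulated image
(an array of 64-bit words addressed by byte offset).  Everything named
sketch_* or SKETCH_* is hypothetical, including the relocation type numbers;
no such "process RELR" relocation type is defined by any psABI.  The
knows_relr flag stands in for "old ldso" versus "new ldso".  The RELR
decoding follows the usual scheme: an even entry is an address, an odd entry
is a bitmap covering the following 63 words.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation type numbers, for illustration only.  */
enum { SKETCH_R_RELATIVE = 8, SKETCH_R_PROCESS_RELR = 1000 };

struct sketch_rela { uint64_t offset; uint32_t type; int64_t addend; };

/* Apply a RELR table to a simulated image.  A well-formed table starts
   with an address entry, so `where` is set before any bitmap is seen.  */
void
sketch_apply_relr (uint64_t *words, uint64_t bias,
                   const uint64_t *relr, size_t nrelr)
{
  size_t where = 0;               /* index of the next word to consider */
  for (size_t k = 0; k < nrelr; k++)
    {
      uint64_t entry = relr[k];
      if ((entry & 1) == 0)
        {
          /* Address entry: relocate that word, continue right after it.  */
          where = entry / sizeof (uint64_t);
          words[where++] += bias;
        }
      else
        {
          /* Bitmap entry: bit i+1 set means relocate words[where + i].  */
          for (size_t i = 0; (entry >>= 1) != 0; i++)
            if (entry & 1)
              words[where + i] += bias;
          where += CHAR_BIT * sizeof (uint64_t) - 1;
        }
    }
}

/* Walk the RELA table.  Returns 0 on success, -1 on an unknown type,
   which is how a loader without RELR support would reject the object.  */
int
sketch_process_rela (uint64_t *words, uint64_t bias,
                     const struct sketch_rela *rela, size_t nrela,
                     const uint64_t *relr, size_t nrelr, int knows_relr)
{
  for (size_t i = 0; i < nrela; i++)
    {
      if (rela[i].type == SKETCH_R_RELATIVE)
        words[rela[i].offset / sizeof (uint64_t)]
          = bias + (uint64_t) rela[i].addend;
      else if (rela[i].type == SKETCH_R_PROCESS_RELR && knows_relr)
        sketch_apply_relr (words, bias, relr, nrelr);
      else
        return -1;                /* unknown relocation type: give up */
    }
  return 0;
}
```

With a single SKETCH_R_PROCESS_RELR entry in the RELA table, a loader
simulated with knows_relr == 0 fails before touching the image, while one
with knows_relr == 1 applies the whole RELR table.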

Rich


* Re: Can DT_RELR catch up glibc 2.35?
  2021-11-18 23:27             ` Fangrui Song
  2021-11-19 11:51               ` Adhemerval Zanella
@ 2021-11-24  1:10               ` Sam James
  1 sibling, 0 replies; 13+ messages in thread
From: Sam James @ 2021-11-24  1:10 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Florian Weimer, Fangrui Song via Libc-alpha, H.J. Lu,
	Rich Felker, Felix Yan, Sylvestre Ledru, Leah Neukirchen

> On 18 Nov 2021, at 23:27, Fangrui Song <maskray@google.com> wrote:
> 
> (For newly CCed Linux distro folks, sorry for hijacking you here. Scroll
> to the end for my request from you.)
> 
> [snip]

> CCed a bunch of Linux distro folks (Arch/Debian/Gentoo/Void).
> 
> * ld.lld is not a default linker on most Linux distros.
> * ld.lld --pack-dyn-relocs=relr is an opt-in feature.
> * --pack-dyn-relocs=relr is difficult to misuse because GNU ld doesn't support it.
> * binutils may get relr support one day, but may take several releases.
> * Nobody will switch the GCC/Clang default any time soon.
> * Copying the new executable to an old glibc system is unsupported.
> 
> By enabling DT_RELR in upstream glibc, the Linux distros will get the
> glibc feature with ZERO overhead in their downstream packaging.  Then, a
> user opting in to ld.lld --pack-dyn-relocs=relr will have smaller
> executables.  When GNU ld finally gets the feature, the benefit will
> reach more users. So why not have the feature now to make the future
> feature enablement smoother?

FWIW, from our perspective in Gentoo (but it should apply to most/all Linux distros),
there's really no difference as long as the default isn't changed yet.

People are, as always, free to use custom flags and must understand their implications,
but they need to be careful when building on _any_ distro in _any_ environment anyway:
obviously building with newer glibc affects one's ability to run binaries on older targets
with an older glibc. It's not a new issue in that regard.

best,
sam


end of thread, other threads:[~2021-11-24  1:10 UTC | newest]

Thread overview: 13+ messages
2021-11-12  7:47 Can DT_RELR catch up glibc 2.35? Fangrui Song
2021-11-16 21:07 ` Adhemerval Zanella
2021-11-17  0:26   ` H.J. Lu
2021-11-17 12:46     ` Adhemerval Zanella
2021-11-17 13:14       ` H.J. Lu
2021-11-18  0:30         ` Fangrui Song
2021-11-18  9:45           ` Florian Weimer
2021-11-18 23:27             ` Fangrui Song
2021-11-19 11:51               ` Adhemerval Zanella
2021-11-24  1:10               ` Sam James
2021-11-19 19:18           ` Rich Felker
2021-11-17 22:12   ` Florian Weimer
2021-11-18 12:45     ` Adhemerval Zanella
