public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Fangrui Song <maskray@google.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH] dlsym: Make RTLD_NEXT prefer default version definition [#BZ #14932]
Date: Fri, 27 May 2022 15:04:08 -0700	[thread overview]
Message-ID: <20220527220408.djhp3nz4aalpnsua@google.com> (raw)
In-Reply-To: <fd2ff169-536d-9af2-5bfa-c7a9d69dba23@linaro.org>

On 2022-05-27, Adhemerval Zanella wrote:
>
>
>On 27/05/2022 18:03, Fangrui Song via Libc-alpha wrote:
>> On 2022-05-27, Florian Weimer wrote:
>>> * Fangrui Song:
>>>
>>>> I suspect the code (from 199x) has a hidden assumption that for a stem
>>>> symbol (say foo), the .dynsym entries (say foo@v1, foo@v2, foo@@v3) are
>>>> ordered by the version node indices (in .gnu.version_d).
>>>
>>> Right, the code clearly has this assumption.  There has to be some
>>> ordering because we don't want to do a topological sort to find the
>>> lowest and highest versions.
>>>
>>>> If so:
>>>>
>>>> * (flags & DL_LOOKUP_RETURN_NEWEST) == 0 => the symbol with
>>>>     verstab[symidx]==2 will skip the check and be returned
>>>> * (flags & DL_LOOKUP_RETURN_NEWEST) != 0 => only the default version
>>>>   symbol gets assigned in `*versioned_sym = sym`
>>>>
>>>> Unfortunately, this assumption hold for none of GNU ld, gold, and
>>>> ld.lld (seems unrelated to --hash-style=):
>>>>
>>>> ```
>>>> % ld.bfd @response.txt && readelf -W --dyn-syms nextmod3.so | grep foo@
>>>>      5: 0000000000001100     6 FUNC    GLOBAL DEFAULT   13 foo@v1
>>>>      8: 0000000000001120     6 FUNC    GLOBAL DEFAULT   13 foo@@v3
>>>>      9: 0000000000001110     6 FUNC    GLOBAL DEFAULT   13 foo@v2
>>>> % gold @response.txt && readelf -W --dyn-syms nextmod3.so | grep foo@
>>>>      7: 00000000000007c0     6 FUNC    GLOBAL DEFAULT   14 foo@@v3
>>>>      9: 00000000000007a0     6 FUNC    GLOBAL DEFAULT   14 foo@v1
>>>>     10: 00000000000007b0     6 FUNC    GLOBAL DEFAULT   14 foo@v2
>>>> % ld.lld @response.txt && readelf -W --dyn-syms nextmod3.so | grep foo@
>>>>      7: 0000000000001740     6 FUNC    GLOBAL DEFAULT   13 foo@@v3
>>>>      8: 0000000000001720     6 FUNC    GLOBAL DEFAULT   13 foo@v1
>>>>      9: 0000000000001730     6 FUNC    GLOBAL DEFAULT   13 foo@v2
>>>
>>> The order of symbols in the symbol table is not what counts here, it's
>>> the order in the list for the hash bucket.
>>>
>>> I'm not sure if it's possible to dump this with readelf.  I'm writing a
>>> custom dumper for this.
>>>
>>> Thanks,
>>> Florian
>>>
>> llvm-readelf --gnu-hash-table can dump the chain array (`Values:`), with
>> the size equal to the number of defined symbols.  The odd numbers
>> indicate the end of a chain.
>>
>> I believe the .gnu.hash lookup order is in increasing symbol index (in
>> all of musl, glibc, FreeBSD rtld).
>> So the glibc (flags & DL_LOOKUP_RETURN_NEWEST) == 0 code path (for
>> relocations) is wrong...

OK, I was wrong. Re-reading, I think the code is correct but confusing.

The following example tests a relocation (flags = DL_LOOKUP_ADD_DEPENDENCY |
DL_LOOKUP_FOR_RELOCATE; note there is no DL_LOOKUP_RETURN_NEWEST):

printf > ./a.c %s '
#define _GNU_SOURCE  // workaround before 748df8126ac69e68e0b94e236ea3c2e11b1176cb
#include <dlfcn.h>
#include <stdio.h>

int foo(int a);

int main(void) {
   int res = foo (0);
   printf ("foo (0) = %d, %s\n", res, res == 0 ? "ok" : "wrong");

   /* Resolve to foo@@v3 in nextmod3.so, instead of
      foo@v1 or foo@v2.  */
   int (*fp) (int) = dlsym (RTLD_DEFAULT, "foo");
   res = fp (0);
   printf ("foo (0) = %d, %s\n", res, res == 3 ? "ok" : "wrong");

   /* Resolve to foo@@v3 in nextmod3.so, instead of
      foo@v1 or foo@v2.  */
   fp = dlsym (RTLD_NEXT, "foo");
   res = fp (0);
   printf ("foo (0) = %d, %s\n", res, res == 3 ? "ok" : "wrong");
}
'
echo 'int foo(int a) { return -1; }' > ./b0.c
printf > ./b.c %s '
int foo_va(int a) { return 0; } asm(".symver foo_va, foo@va, remove");
int foo_v1(int a) { return 1; } asm(".symver foo_v1, foo@v1, remove");
int foo_v2(int a) { return 2; } asm(".symver foo_v2, foo@v2, remove");
int foo(int a) { return 3; } asm(".symver foo, foo@@@v3");
'
echo 'va {}; v1 {} va; v2 {} v1; v3 {} v2;' > ./b.ver
sed 's/^        /\t/g' > ./Makefile <<'eof'
.MAKE.MODE := meta curdirOk=1
CFLAGS = -fpic -g

a: a.o b0.so b.so
	$(CC) -Wl,--no-as-needed a.o b0.so -ldl -Wl,-rpath=$$PWD -o $@

b0.so: b0.o
	$(CC) $> -shared -Wl,--soname=b.so -o $@

b.so: b.o
	$(CC) $> -shared -Wl,--soname=b.so,--version-script=b.ver -o $@

clean:
	rm -f a *.so *.o *.meta
eof

Use `bmake` to build (meta mode ensures changing the command line will cause a re-build).

It will be nice the enhance tst-next-ver.c with the `foo` relocation
test but my glibc-Makefile-fu isn't great...

>
>My understanding is (flags & DL_LOOKUP_RETURN_NEWEST) == 0 is only used for
>dlvsym and I think we never documented properly hwo dlvsym (handle, symbol, "")
>should behave.  Currently it is different than dlsym (handle, symbol) (it will
>return the a versioned symbol), while with your patch it will be similar to dlsym.

It's for both dlvsym and relocation resolving.

https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html
"10.7.6 Symbol Resolution" says:

> The object with the reference does not use versioning, while the object
> with the defin-
> itions does. In this instance, only the definitions with index numbers 1
> and 2 will be
> used in the reference match, the same identified by the static linker as
> the base defini-
> tion. In cases where the static linker was not used, such as in calls to
> dlopen(), a
> version that does not have the base definition index shall be acceptable
> if it is the only
> version for which the symbol is defined.

% readelf -WV b.so
...
Version definition section '.gnu.version_d' contains 5 entries:
  Addr: 0x0000000000000374  Offset: 0x000374  Link: 6 (.dynstr)
   000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: b.so
   0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: va
   0x0038: Rev: 1  Flags: none  Index: 3  Cnt: 1  Name: v1
   0x0054: Rev: 1  Flags: none  Index: 4  Cnt: 1  Name: v2
   0x0070: Rev: 1  Flags: none  Index: 5  Cnt: 1  Name: v3

https://maskray.me/blog/2020-11-26-all-about-symbol-versioning#rtld-behavior
has my updated understanding of the behavior.

      reply	other threads:[~2022-05-27 22:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-20  8:35 Fangrui Song
2022-05-20 11:27 ` Adhemerval Zanella
2022-05-20 14:22   ` Florian Weimer
2022-05-22 18:54     ` Fāng-ruì Sòng
2022-05-27 16:59       ` Adhemerval Zanella
2022-05-27 11:03 ` Florian Weimer
2022-05-27 19:24   ` Fangrui Song
2022-05-27 20:32     ` Fangrui Song
2022-05-27 20:38       ` Florian Weimer
2022-05-27 21:03         ` Fangrui Song
2022-05-27 21:15           ` Adhemerval Zanella
2022-05-27 22:04             ` Fangrui Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220527220408.djhp3nz4aalpnsua@google.com \
    --to=maskray@google.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).