public inbox for libc-alpha@sourceware.org
From: Sergey Bugaev <bugaevc@gmail.com>
To: Samuel Thibault <samuel.thibault@gnu.org>
Cc: libc-alpha@sourceware.org, bug-hurd@gnu.org
Subject: Re: [RFC PATCH glibc 24/34] hurd: Only check for TLS initialization inside rtld or in static builds
Date: Thu, 13 Apr 2023 13:02:58 +0300
Message-ID: <CAN9u=HdW51ajNDh1C85ifTGgHSnayDG-8TLss26D2-RX9Kosdg@mail.gmail.com>
In-Reply-To: <20230412234657.ntztyz7iau55lcwt@begin>

Wow, this is great, thank you! You've really gone above and beyond
compared to what I expected you to do.

Some replies below; will reply to other points later.

On Thu, Apr 13, 2023 at 2:47 AM Samuel Thibault <samuel.thibault@gnu.org> wrote:
> > Maybe you're building with some flags that affect this? I'm only doing
> > ../configure.
>
> I'm using
>
> ../configure --prefix= --enable-pt_chown

Yeah, that shouldn't influence anything vs what I have.

> I have uploaded the build result of master +
> b37899d34d2190ef4b454283188f22519f096048 restored on:
>
> https://dept-info.labri.fr/~thibault/tmp/libc.so.0.3
> https://dept-info.labri.fr/~thibault/tmp/ld.so
> https://dept-info.labri.fr/~thibault/tmp/test-as-const-rtld-sizes
>
> you can run it by hand with
> ./ld.so --library-path $PWD ./test-as-const-rtld-sizes
>
> It hangs on my system. I have put the core dump on
>
> https://dept-info.labri.fr/~thibault/tmp/core.18601
>
> which can be inspected with
>
> gdb ./ld.so core.18601

Thank you, I'm going to take a look.

> Running live gdb ./ld.so 18529, I get:
>
> (gdb) thread apply all bt
>
> Thread 2 (Thread 18529.2):
> #0  0x0102aa3c in __GI___mach_msg_trap () at /usr/src/glibc-upstream/build/mach/mach_msg_trap.S:2
> #1  0x0102b1d6 in __GI___mach_msg (msg=0x1315d10, option=3, send_size=64, rcv_size=32, rcv_name=0, timeout=0, notify=0) at msg.c:111
> #2  0x012c9850 in __gsync_wait (task=<optimized out>, addr=<optimized out>, val1=<optimized out>, val2=<optimized out>, msec=<optimized out>, flags=<optimized out>) at ./build-tree/hurd-i386-libc/mach/RPC_gsync_wait.c:186
> #3  0x0104631b in __GI___spin_lock (__lock=0x12bb844 <_hurd_siglock>) at ../mach/lock-intern.h:60
> #4  __GI___mutex_lock (__lock=0x12bb844 <_hurd_siglock>) at ../mach/lock-intern.h:119
> #5  __GI__hurd_thread_sigstate (thread=<optimized out>) at hurdsig.c:80
> #6  0x0116abb8 in _hurd_critical_section_lock () at ../hurd/hurd/signal.h:230
> #7  _hurd_fd_get (fd=2) at ../hurd/hurd/fd.h:74
> #8  __GI___write_nocancel (fd=2, buf=0x1315e60, nbytes=<optimized out>) at ../sysdeps/mach/hurd/write_nocancel.c:26
> #9  0x01149135 in __GI___libc_write (fd=2, buf=0x1315e60, nbytes=41) at ../sysdeps/mach/hurd/write.c:26
> #10 0x0116ff07 in __GI___writev (fd=<optimized out>, vector=<optimized out>, count=<optimized out>) at ../sysdeps/posix/writev.c:87
> #11 0x010b9df5 in writev_for_fatal (fd=<optimized out>, total=<optimized out>, niov=<optimized out>, iov=<optimized out>) at ../sysdeps/posix/libc_fatal.c:44
> #12 __libc_message (fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:124
> #13 0x010b9ead in __GI___libc_fatal (message=0x12216b4 "hurd: Can't add reference on Mach thread\n") at ../sysdeps/posix/libc_fatal.c:159
> #14 0x01046524 in __GI__hurd_thread_sigstate (thread=<optimized out>) at hurdsig.c:136
> #15 0x0103fd33 in __GI__hurd_self_sigstate () at ../hurd/hurd/signal.h:173
> #16 _hurd_msgport_receive (arg=<error reading variable: Cannot access memory at address 0x1316004>) at msgportdemux.c:47
> Backtrace stopped: Cannot access memory at address 0x1316000

So the immediate cause of the hang is we deadlock trying to take the
_hurd_siglock while already holding it (inside #14 0x01046524 in
__GI__hurd_thread_sigstate), since it's just a non-recursive struct
mutex. We should probably write our own version of writev_for_fatal
that does no locking (other than maybe trylock) and tries to not touch
any TLS or sigstate. Like, just grab the port from _hurd_dtable (with
no critical sections, no nothin') and call io_write on it (with the
regular non-intr mach_msg).
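
To make the trylock idea concrete, here is a minimal sketch in portable C.
It is not the Hurd code: state_lock is a hypothetical pthread stand-in for
_hurd_siglock, and fatal_write is a made-up name. The only point it shows
is that the fatal path never blocks on a lock the caller may already hold.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for _hurd_siglock: a plain, non-recursive mutex.  */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write MSG to FD without ever waiting for state_lock.
   Returns 1 if the lock was free and we took it for the duration of the
   write, 0 if it was busy (possibly held by us) and we wrote unlocked.  */
static int
fatal_write (int fd, const char *msg)
{
  /* trylock returns immediately instead of deadlocking on a held lock.  */
  int locked = pthread_mutex_trylock (&state_lock) == 0;
  size_t left = strlen (msg);

  while (left > 0)
    {
      ssize_t n = write (fd, msg, left);
      if (n < 0 && errno == EINTR)
        continue;               /* Interrupted, retry.  */
      if (n <= 0)
        break;                  /* Give up: we are dying anyway.  */
      msg += n;
      left -= (size_t) n;
    }

  if (locked)
    pthread_mutex_unlock (&state_lock);
  return locked;
}
```

Calling fatal_write (2, "hurd: Can't add reference on Mach thread\n")
while state_lock is already held still prints the message and returns 0
rather than hanging. On the Hurd the write itself would presumably be an
io_write on the port taken from _hurd_dtable, as described above.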

That, and abort () should be more careful with locking the sigstate
too, in order not to fault and/or deadlock if sigstate / TLS is
broken. Well, in this case we didn't even get to abort (), we
deadlocked on the write, but abort would be next.

But the underlying issue is that the thread port is bogus, which --
what? How can this even happen? This is not even some other thread's
port (maybe a thread could die and then its port right would turn into
a dead name, and then mach_port_mod_refs would return
KERN_INVALID_RIGHT if you try to add a send right...), this is our
very own thread port, the result of mach_thread_self () which was
called just several moments ago in hurd_self_sigstate ()!

If we're trying to tie this to TLS somehow, maybe the port is fine,
but the mach_port_mod_refs RPC fails because something is off with the
reply port. But also note that at this point we're well into libc.so
(this is the msgport thread already, and the other thread below is
already running user's code!), so clearly the TLS must have been set
up (and not by TLS_INIT_TP, this is not the main thread). And since we
managed to create the thread (which is done in libc.so, not ld.so),
TLS must have been OK in the main thread too (thread_create is not a
direct syscall).

And if TLS wasn't set up, I'd expect a TLS access to segfault or
busfault or something, not to read bogus data -- or am I wrong? What
does $gs_base initially point to, before it's set up? I would assume
it's either just 0x0, or the %gs:something read does not work at all
(SIGBUS).

> Interestingly, watching for the $gs update:
>
> € gdb --args ./ld.so --library-path=/tmp ./test-as-const-rtld-sizes
> (gdb) b _start
> Breakpoint 1 at 0x1a5a0
> (gdb) r
> Starting program: /tmp/ld.so --library-path /tmp ./test-as-const-rtld-sizes
>
> Thread 5 hit Breakpoint 1, 0x0801a5a0 in _start ()
> (gdb) watch $gs
> Watchpoint 2: $gs
> (gdb) c
> Continuing.

Cool, I didn't know you could watch a register like that -- although
it appears to be super slow, so it must not be using hardware
watchpoints.

> At that point the library loading has happened:
>
> (gdb) info sharedlibrary
> From        To          Syms Read   Shared Object Library
> 0x08000db0  0x080256e1  Yes         /tmp/ld.so
> 0x0102a650  0x01200d35  No          /tmp/libc.so.0.3
> 0x012c49a0  0x012d0ad4  No          /tmp/libmachuser.so.1
> 0x012e0bc0  0x012fee50  No          /tmp/libhurduser.so.0.3
>
> And the function symbols indeed seem to have been overloaded:
>
> (gdb) l __write
> 384     __write (int fd, const void *buf, size_t nbytes)
> 385     {
> 386       error_t err;
> 387       vm_size_t nwrote;
> 388
> 389       assert (fd < _hurd_init_dtablesize);
>
>
> That is why I'm thinking that apparently exposing the libc functions
> happens before setting up TLS, and thus potential for mayhem if libc
> assumes that TLS is set up. The loading itself is apparently done in the
> _dl_map_object_deps call of dl_main.

Well, if that's why you're thinking libc.so functions are already in
use by ld.so, this will be easy to disprove :)

GDB is super cool, but it's not *that* smart. When you "l __write", it
likely just looks through the "loaded" DSOs and finds the symbol,
looks up its debuginfo, then source, and prints that. It may even say
that there are several places where a symbol with the same name is
defined. Here's what I get (on a different executable):

Thread 4 hit Temporary breakpoint 1, 0x080483f0 in main ()
(gdb) l __write
file: "../sysdeps/mach/hurd/dl-sysdep.c", line number: 389, symbol: "__write"
384     ../sysdeps/mach/hurd/dl-sysdep.c: No such file or directory.
file: "../sysdeps/mach/hurd/write.c", line number: 25, symbol:
"__GI___libc_write"
20      ../sysdeps/mach/hurd/write.c: No such file or directory.

Point is, GDB can look up symbols in DSOs, but it doesn't understand
symbol resolution rules: RTLD_LOCAL vs GLOBAL, linking namespaces,
whether a DSO has already been relocated or not, all of those things.

What you should really check is not what GDB prints on "l __write",
but rather what the GOT/PLT slots inside ld.so contain; that is where
ld.so will jump when it calls the functions. Here's an annotated GDB
session (with upstream Debian's glibc):

$ gdb -q ./hello
Reading symbols from ./hello...
# Start running so gdb can resolve addresses & symbols inside ld.so:
(gdb) starti
Starting program: /home/bugaevc/hello
Thread 4 stopped.
0x0001d550 in _start () from /lib/ld.so
# Let's look at the GOT/PLT entry for __write (actually __write_nocancel):
(gdb) p &'__write_nocancel@got.plt'
$1 = (<text from jump slot in .got.plt, no debug info> *) 0x36034
<__write_nocancel@got.plt>
(gdb) p '__write_nocancel@got.plt'
$2 = (<text from jump slot in .got.plt, no debug info>) 0xde6
# The entry itself is at 0x36034 (this will come in useful later), but
# as of _start, it contains garbage (which is to be expected). Now let's
# advance to TLS setup, and check again:
(gdb) advance __i386_set_gdt
__i386_set_gdt (target_thread=96, selector=0x1037b94, desc=...) at
./build-tree/hurd-i386-libc/mach/RPC_i386_set_gdt.c:79
79      ./build-tree/hurd-i386-libc/mach/RPC_i386_set_gdt.c: No such
file or directory.
(gdb) p '__write_nocancel@got.plt'
$3 = (<text from jump slot in .got.plt, no debug info>) 0x1c1f0 <__write>
# See, now the PLT entry points to __write. But whose __write is this?
(gdb) info symbol 0x1c1f0
__write_nocancel in section .text of /lib/ld.so
# It's ld.so's! That's because it has initially relocated itself so
# that its symbols point back to itself.
# And now let's advance to where the signal thread is set up:
(gdb) tb _hurdsig_init
Function "_hurdsig_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Temporary breakpoint 1 (_hurdsig_init) pending.
(gdb) c
Continuing.
Thread 4 hit Temporary breakpoint 1, _hurdsig_init
(intarray=0x103c000, intarraysize=5) at ./hurd/hurdsig.c:1453
1453    ./hurd/hurdsig.c: No such file or directory.
# Sanity check: are we inside libc.so now?
(gdb) info symbol $eip
_hurdsig_init in section .text of /lib/i386-gnu/libc.so.0.3
# Yes we are, good. Let's look at the PLT entry:
(gdb) p '__write_nocancel@got.plt'
$4 = (<text from jump slot in .got.plt, no debug info>) 0x11b5820
<__GI___write_nocancel>
# Huh, it clearly points to a different __write_nocancel now! But
# whose PLT entry are we looking at now, ld.so's or libc.so's? Who
# knows, let's put in that address from above explicitly:
(gdb) p *(void**) 0x36034
$5 = (void *) 0x11b5820 <__GI___write_nocancel>
# And just to be very sure, this is libc.so's __write_nocancel, right?
(gdb) info symbol 0x11b5820
__write_nocancel in section .text of /lib/i386-gnu/libc.so.0.3
# Right :)

Hope I'm making my point clear: "l __write" is no basis to suspect
that ld.so has already been bound to libc.so functions. Something else
must be going on.
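
If it helps, the behaviour seen in the session can be pictured with a toy
model in C. This is only an illustration, not how ld.so is implemented:
jump_slot, rtld_impl, libc_impl and the two "relocate" helpers are all
hypothetical names. It just shows one slot at a fixed address whose
contents change over time, while callers always jump through the slot.

```c
#include <stddef.h>

/* Toy stand-ins for ld.so's private __write_nocancel and for
   libc.so's __write_nocancel.  */
static const char *rtld_impl (void) { return "ld.so"; }
static const char *libc_impl (void) { return "libc.so"; }

typedef const char *(*impl_fn) (void);

/* Toy stand-in for the .got.plt jump slot (0x36034 in the session).
   Its address never changes; only its contents do.  */
static impl_fn jump_slot = NULL;

/* Stage 1: ld.so relocates itself, so the slot points back into ld.so.  */
static void
bootstrap_relocate (void)
{
  jump_slot = rtld_impl;
}

/* Stage 2: once libc.so is loaded, the same slot is rebound to libc.so.  */
static void
rebind_to_libc (void)
{
  jump_slot = libc_impl;
}

/* Callers never name an implementation; they go through the slot.  */
static const char *
call_through_slot (void)
{
  return jump_slot ();
}
```

After bootstrap_relocate (), call_through_slot () ends up in rtld_impl;
after rebind_to_libc (), the very same slot leads to libc_impl. Which
definitions a debugger can find by name says nothing about which one the
slot currently holds.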

I'll be back with more results, and thank you again.

Sergey

