From: Vivek Das Mohapatra
To: libc-help@sourceware.org
Date: Wed, 24 Jan 2018 13:59:00 -0000
Subject: A possible libc/dlmopen/pthreads bug
MIME-Version: 1.0
Content-Type: multipart/mixed; BOUNDARY="8323329-1876895591-1516802379=:22921"
--8323329-1876895591-1516802379=:22921
Content-Type: text/plain; format=flowed; charset=UTF-8
Content-Transfer-Encoding: 8BIT

Hello - As I've posted here before, I'm working on a
segregated-dynamic-linking mechanism based on dlmopen (to allow
applications to link against libraries without sharing their
dependencies).

While testing some recent builds, I think I may have tripped over a
possible dlmopen/pthreads bug. It is entirely possible that I'm doing
something wrong, or failing to do something - a lot of this is new
territory for me.

Either way, I'd appreciate some feedback/insight into what's going on.

A bit of background before I proceed: libcapsule is the helper library
I'm working on, based on dlmopen, that allows me to create "shim"
libraries that expose the symbols from their immediate target, but don't
expose any further dependencies.

+-----------------------------+   +----------------------------+
|  Runtime filesystem         |   |  Host filesystem           |
|                             |   |                            |
|  +------------+             |   |                            |
|  | Executable |             |   |                            |
|  ++------+----+             |   |                            |
|   |      |                  |   |                            |
|   |   +--+---------------+  |   |  +------------------+      |
|   |   | shim libX11      | <-----> | real libX11      |-+    |
|   |   +--+---------------+  |   |  +----------+-------+ |    |
|   |      |         |        |   |             |         |    |
|   |  +---+-------+ |        |   |            ++------+-----+ |
|      | libA      | |        |   |            | libc        | |
|      +-----------+ |        |   |            +-------------+ |
|      +-----------+ |        |   |                            |
|      | libB      |-+        |   |                            |
|      +-----------+          |   |                            |
+-----------------------------+   +----------------------------+

The libraries on the right hand side from the host filesystem are in a
dlmopen namespace of their own (as an implementation detail they have
the same libc as the DSOs on the left, but it is a new copy opened with
dlmopen).

The shim libX11 takes care of making sure the executable gets symbols
from the real libX11 instead of the shim, but the bug we're going to
look at is not (I think) related to that, so I'm not going to discuss
that process here.
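To make the "new copy opened with dlmopen" point concrete, here is a minimal C sketch (not libcapsule's actual code) that opens a library in a fresh link-map namespace and checks whether a symbol resolves to a different address there than in the main namespace. The helper name private_copy_differs is made up for illustration, and the sketch assumes glibc, since dlmopen and LM_ID_NEWLM are GNU extensions (older glibc versions also need -ldl at link time).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* dlmopen, LM_ID_NEWLM and RTLD_DEFAULT are GNU extensions */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load SONAME into a brand new namespace and compare
   the address SYMBOL resolves to there with its address in the main
   namespace.
   Returns 1 if the addresses differ, 0 if they are identical, and -1 if
   either lookup failed.  */
int
private_copy_differs (const char *soname, const char *symbol)
{
  /* Address as seen by the main program's namespace.  */
  void *main_sym = dlsym (RTLD_DEFAULT, symbol);

  /* LM_ID_NEWLM asks the dynamic linker for a fresh, empty namespace.  */
  void *handle = dlmopen (LM_ID_NEWLM, soname, RTLD_NOW | RTLD_LOCAL);

  if (main_sym == NULL || handle == NULL)
    return -1;

  /* Address as seen inside the private namespace.  The handle is
     deliberately left open: this sketch only compares addresses.  */
  void *private_sym = dlsym (handle, symbol);

  if (private_sym == NULL)
    return -1;

  return private_sym != main_sym;
}
```

On a glibc system I would expect private_copy_differs ("libc.so.6", "setresuid") to return 1, because the private namespace gets its own mapping of the DSO at a different base address, with its own copies of any global state.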
The visible symptom is that when I launch pulseaudio with a few shimmed
libraries (the first case I stumbled upon, and easy to reproduce) it
seems to deadlock very early in its life.

A bit of digging with strace and gdb shows that when it locks up it does
so inside setresuid. A bit more digging indicates that the code is
infinite looping here:

__nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
+list
1103
1104      /* Now the list with threads using user-allocated stacks.  */
1105      list_for_each (runp, &__stack_user)
1106        {
1107          struct pthread *t = list_entry (runp, struct pthread, list);
1108          if (t == self)
1109            continue;
1110
1111          setxid_mark_thread (cmdp, t);
1112        }

For some reason, list_for_each never terminates.

If I disable encapsulation (remove the shim libraries from the library
path) then the following holds at that point in the code:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
1105      list_for_each (runp, &__stack_user)
+bt
#0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
#1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000) at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
#2  0x5655b7f0 in pa_drop_root ()
#3  0x56558a6e in main ()

Digging into __stack_user:

+p __stack_user
$1 = {next = 0xf73a48a0, prev = 0xf73a48a0}
+p &__stack_user
$2 = (list_t *) 0xf7d1d1a4 <__stack_user>
+p (&__stack_user)->next
$3 = (struct list_head *) 0xf73a48a0
+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>
+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf73a48a0

We find a circular linked list, which contains a pointer to
__stack_user. Since list_for_each is invoked as
list_for_each(…, &__stack_user), this means the for loop it implements
will terminate, allowing setresuid to proceed.
// ============================================================================
Note: The definition of list_for_each is this:

# define list_for_each(pos, head) \
  for (pos = (head)->next; pos != (head); pos = pos->next)
// ============================================================================

Now let's examine the same case with the shim in place:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
1105      list_for_each (runp, &__stack_user)
⋮
+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
+p &__stack_user
$2 = (list_t *) 0xf7d8f1a4 <__stack_user>
+p (&__stack_user)->next
$3 = (struct list_head *) 0xf76eeb60
+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf71391a4
+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf76eeb60

We can see we have a circular linked list, as before, but it does _not_
contain the element supplied as the head to list_for_each: We're going
to loop forever.

============================================================================

Next let's try and figure out when/where this happens. Setting various
breakpoints and watches we uncover the following:

+run
Starting program: /usr/bin/pulseaudio --start
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
290     {
+break allocatestack.c:1105
Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
+watch __stack_user
Hardware watchpoint 7: __stack_user
+watch __stack_user.next
Hardware watchpoint 8: __stack_user.next
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0x0, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0x0
New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
__pthread_initialize_minimal_internal () at nptl-init.c:377
377       list_add (&pd->list, &__stack_user);
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
list_add (head=, newp=0xf76eeb60) at ../include/list.h:64
64        head->next = newp;
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
New value = {next = 0xf76eeb60, prev = 0xf76eeb60}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf76eeb60
__pthread_initialize_minimal_internal () at nptl-init.c:381
381       THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
+cont
Continuing.

Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
1210    {

// ============================================================================
// At this point we step to the end of __pthread_init_static_tls and set
// an extra watch point on the address currently holding &__stack_user
// ============================================================================

+p __stack_user.next
$1 = (struct list_head *) 0xf76eeb60
+p __stack_user.next->next
$2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>   ← STILL GOOD
+watch __stack_user.next->next
Hardware watchpoint 9: __stack_user.next->next
+s

// And here it is:

Hardware watchpoint 9: __stack_user.next->next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf71391a4   ← >>>>> GONE WRONG HERE <<<<<
0xf7121c83 in ?? ()

// Hm, an unknown address scribbling on __stack_user.

+call calloc(1, sizeof(Dl_info))
$3 = (void *) 0x56574d18
+call dladdr(0xf7121c83, $3)
$4 = 1
+p *(Dl_info *)$3
$5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
      dli_fbase = 0xf711d000,
      dli_sname = 0xf711f617 "__pthread_initialize_minimal",
      dli_saddr = 0xf7121be0}

// Well that can't be right, can it? gdb should have figured out the name
// of 0xf7121c83, not said ?? - let's work out the address in the other
// direction:

+p __pthread_initialize_minimal
$6 = {} 0xf7d77be0 <__pthread_initialize_minimal_internal>
+call dladdr(0xf7d77be0, $3)
$8 = 1
+p *(Dl_info *)$3
$10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
       dli_fbase = 0xf7d73000,
       dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
       dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}

// ============================================================================

Aha! Same DSO, different base address. So the ?? instance of
__pthread_initialize_minimal_internal was from the _other_ copy of libc,
inside the dlmopen namespace - the one gdb doesn't know how to inspect.

PS: for completeness, I went back and followed the __stack_user linked
list at the "GONE WRONG HERE" point, just to be sure:

+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
+p __stack_user.next
$2 = (struct list_head *) 0xf76eeb60
+p __stack_user.next->next
$3 = (struct list_head *) 0xf71391a4
+p __stack_user.next->next->next
$4 = (struct list_head *) 0xf71391a4
+p __stack_user.next->next->next->next
$5 = (struct list_head *) 0xf71391a4

So the linked list definitely doesn't contain &__stack_user any more.

// ============================================================================

Apologies for the exegesis: It seems to me that the copy of libc in the
private namespace has somehow managed to scribble on the linked list
pointed to by __stack_user, overwriting a key address.

Is my analysis correct?

Is there something I could or should have done to avoid this?
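
For anyone who wants to poke at the failure mode without glibc or gdb, the sequence in the trace can be modelled with a minimal C sketch. It is a simplified stand-in for the list handling in include/list.h, not the real nptl code, and it assumes that both copies of the library link the same thread descriptor into their own list head, as the repeated 0xf76eeb60 address in the trace suggests. The names init_list_head, walk_reaches_head and first_head_survives_second_init are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for glibc's list_t.  */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Turn HEAD into an empty circular list.  */
void
init_list_head (list_t *head)
{
  head->next = head;
  head->prev = head;
}

/* Insert NEWP right after HEAD.  Nothing checks whether NEWP is already
   linked into some other list.  */
void
list_add (list_t *newp, list_t *head)
{
  newp->next = head->next;
  newp->prev = head;
  head->next->prev = newp;
  head->next = newp;
}

/* Bounded variant of list_for_each: returns true if following ->next
   from HEAD gets back to HEAD within MAX_STEPS hops.  */
bool
walk_reaches_head (const list_t *head, size_t max_steps)
{
  const list_t *pos = head->next;

  for (size_t i = 0; i < max_steps; i++)
    {
      if (pos == head)
        return true;
      pos = pos->next;
    }
  return false;
}

/* Replay the trace: the main copy links the thread into its head, then a
   second copy (modelling the dlmopen namespace) links the same thread
   into a different head.  Returns whether the first head is still
   reachable afterwards.  */
bool
first_head_survives_second_init (void)
{
  list_t stack_user_main;       /* __stack_user, main namespace */
  list_t stack_user_private;    /* __stack_user, private namespace */
  list_t thread;                /* the initial thread's list node */

  init_list_head (&stack_user_main);
  list_add (&thread, &stack_user_main);

  init_list_head (&stack_user_private);
  list_add (&thread, &stack_user_private);

  return walk_reaches_head (&stack_user_main, 16);
}
```

In this model the second list_add leaves the first head pointing at the thread node while the node's next pointer now leads to the second head, so a walk that starts from the first head only ever alternates between the node and the second head. That is the same shape as the next/next->next/next->next->next values printed at breakpoint 6 with the shim in place.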

A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
existing mapping of the target library in the main namespace/link-map to
be re-used instead of creating a new one: I believe this would prevent
this problem (and others detailed in that message) from occurring - any
thoughts?

--8323329-1876895591-1516802379=:22921--