From: Adhemerval Zanella
To: libc-help@sourceware.org
Subject: Re: A possible libc/dlmopen/pthreads bug
Date: Wed, 24 Jan 2018 17:08:00 -0000
Message-ID: <63d90928-f9ee-27bf-1f78-ff4f5eaeed39@linaro.org>

On 24/01/2018 11:59, Vivek Das Mohapatra wrote:
> Hello -
>
> As I've posted here before, I'm working on a segregated-dynamic-linking
> mechanism based on dlmopen (to allow applications to link against libraries
> without sharing their dependencies).
>
> While testing some recent builds, I think I may have tripped over a
> possible dlmopen/pthreads bug. It is entirely possible that I'm doing
> something wrong, or failing to do something - a lot of this is new
> territory for me.
>
> Either way, I'd appreciate some feedback/insight into what's going on.
>
> A bit of background before I proceed: libcapsule is the helper library
> I'm working on, based on dlmopen, that allows me to create "shim"
> libraries that expose the symbols from their immediate target,
> but don't expose any further dependencies.
>
>   +-----------------------------+ +----------------------------+
>   | Runtime filesystem          | | Host filesystem            |
>   |                             | |                            |
>   | +------------+              | |                            |
>   | | Executable |              | |                            |
>   | ++------+----+              | |                            |
>   |  |      |                   | |                            |
>   |  |   +--+---------------+   | |   +------------------+     |
>   |  |   | shim libX11      | <-----> | real libX11      |-+   |
>   |  |   +--+---------------+   | |   +----------+-------+ |   |
>   |  |      |                   | |              |         |   |
>   |  |      |                   | |          +---+-------+ |   |
>   | ++------+-----+             | |          | libA      | |   |
>   | | libc        |             | |          +-----------+ |   |
>   | +-------------+             | |          +-----------+ |   |
>   |                             | |          | libB      |-+   |
>   |                             | |          +-----------+     |
>   +-----------------------------+ +----------------------------+
>
> The libraries on the right hand side from the host filesystem are
> in a dlmopen namespace of their own (as an implementation detail
> they have the same libc as the DSOs on the left, but it is a new
> copy opened with dlmopen)
>
> The shim libX11 takes care of making sure the executable gets symbols
> from the real libX11 instead of the shim, but the bug we're going
> to look is not (I think) related to that, so I'm not going to discuss
> that process here.
>
> The visible symptom is that when I launch pulseaudio with a few shimmed
> libraries (the first case I stumbled upon, and easy to reproduce) it seems
> to deadlock very early in its life.
>
> A bit of digging with strace and gdb shows that when it locks up
> it does so inside setresuid.
> A bit more digging indicates that the
> code is infinite looping here:
>
> __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> +list
> 1103
> 1104      /* Now the list with threads using user-allocated stacks.  */
> 1105      list_for_each (runp, &__stack_user)
> 1106        {
> 1107          struct pthread *t = list_entry (runp, struct pthread, list);
> 1108          if (t == self)
> 1109            continue;
> 1110
> 1111          setxid_mark_thread (cmdp, t);
> 1112        }
>
> For some reason, list_for_each never terminates.
>
> If I disable encapsulation (remove the shim libraries from the library
> path) then the following holds at that point in the code:
>
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> 1105      list_for_each (runp, &__stack_user)
> +bt
> #0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> #1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
>       at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
> #2  0x5655b7f0 in pa_drop_root ()
> #3  0x56558a6e in main ()
>
> Digging into __stack_user:
>
> +p __stack_user
> $1 = {next = 0xf73a48a0, prev = 0xf73a48a0}
>
> +p &__stack_user
> $2 = (list_t *) 0xf7d1d1a4 <__stack_user>
>
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf73a48a0
>
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>
>
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf73a48a0
>
> We find a circular linked list, which contains a pointer to __stack_user.
> Since list_for_each is invoked as list_for_each(…, &__stack_user),
> this means the for loop it implements will terminate, allowing setresuid
> to proceed.
>
> // ============================================================================
> Note: The definition of list_for_each is this:
>
> # define list_for_each(pos, head) \
>   for (pos = (head)->next; pos != (head); pos = pos->next)
> // ============================================================================
>
> Now let's examine the same case with the shim in place:
>
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> 1105      list_for_each (runp, &__stack_user)
>  ⋮
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> +p &__stack_user
> $2 = (list_t *) 0xf7d8f1a4 <__stack_user>
>
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf76eeb60
>
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf71391a4
>
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf76eeb60
>
> We can see we have a circular linked list, as before, but it does
> _not_ contain the element supplied as the head to list_for_each:
> We're going to loop forever.
>
> ============================================================================
>
> Next let's try and figure out when/where this happens.
> Setting various breakpoints and watches we uncover the following:
>
> +run
> Starting program: /usr/bin/pulseaudio --start
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
>
> Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
> 290    {
> +break allocatestack.c:1105
> Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
> +watch __stack_user
> Hardware watchpoint 7: __stack_user
> +watch __stack_user.next
> Hardware watchpoint 8: __stack_user.next
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0x0, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
>
> Hardware watchpoint 8: __stack_user.next
>
> Old value = (struct list_head *) 0x0
> New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> __pthread_initialize_minimal_internal () at nptl-init.c:377
> 377      list_add (&pd->list, &__stack_user);
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> list_add (head=, newp=0xf76eeb60) at ../include/list.h:64
> 64      head->next = newp;
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> New value = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> Hardware watchpoint 8: __stack_user.next
>
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf76eeb60
> __pthread_initialize_minimal_internal () at nptl-init.c:381
> 381      THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
> +cont
> Continuing.
>
> Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
> 1210    {
>
> // ============================================================================
> // At this point we step to the end of __pthread_init_static_tls and set
> // an extra watch point on the address currently holding &__stack_user
> // ============================================================================
>
> +p __stack_user.next
> $1 = (struct list_head *) 0xf76eeb60
>
> +p __stack_user.next->next
> $2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>  ← STILL GOOD
>
> +watch __stack_user.next->next
> Hardware watchpoint 9: __stack_user.next->next
> +s
>
>
> // And here it is:
> Hardware watchpoint 9: __stack_user.next->next
>
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
> 0xf7121c83 in ?? ()
>
> // Hm, an unknown address scribbling on __stack_user.
>
> +call calloc(1, sizeof(Dl_info))
> $3 = (void *) 0x56574d18
> +call dladdr(0xf7121c83, $3)
> $4 = 1
>
> +p *(Dl_info *)$3
> $5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
>       dli_fbase = 0xf711d000,
>       dli_sname = 0xf711f617 "__pthread_initialize_minimal",
>       dli_saddr = 0xf7121be0}
>
> // Well that can't be right, can it? gdb should have figured out the name
> // of 0xf7121c83, not said ?? - let's work out the address in the other
> // direction:
>
> +p __pthread_initialize_minimal
> $6 = {} 0xf7d77be0 <__pthread_initialize_minimal_internal>
>
> +call dladdr(0xf7d77be0, $3)
> $8 = 1
>
> +p *(Dl_info *)$3
> $10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
>        dli_fbase = 0xf7d73000,
>        dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
>        dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}
>
> // ============================================================================
>
> Aha! Same DSO, different base address. So the ??
> instance of
> __pthread_initialize_minimal_internal was from the _other_ copy of libc,
> inside the dlmopen namespace - the one gdb doesn't know how to inspect.
>
>
> PS: for completeness, I went back and followed the __stack_user linked list
> at the "GONE WRONG HERE" point, just to be sure:
>
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> +p __stack_user.next
> $2 = (struct list_head *) 0xf76eeb60
>
> +p __stack_user.next->next
> $3 = (struct list_head *) 0xf71391a4
>
> +p __stack_user.next->next->next
> $4 = (struct list_head *) 0xf71391a4
>
> +p __stack_user.next->next->next->next
> $5 = (struct list_head *) 0xf71391a4
>
> So the linked list definitely doesn't contain &__stack_user any more.
>
> // ============================================================================
>
> Apologies for the exegesis: It seems to me that the copy of libc in the
> private namespace has somehow managed to scribble on the linked list
> pointed to by __stack_user, overwriting a key address.
>
> Is my analysis correct? Is there something I could or should have done to
> avoid this?
>
> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
> existing mapping of the target library in the main namespace/link-map to be
> re-used instead of creating a new one: I believe this would prevent this
> problem (and others detailed in that message) from occurring - any thoughts?

Nice write-up. Could you please open a bug report, if possible with a
testcase that triggers it?

I am wondering if this is triggering already-reported issues with dlmopen:
BZ#18684 [1], BZ#15271 [2], and BZ#15134 [3]. In fact it really looks like
BZ#18684, where Carlos noted that the namespace's global searchlist
(RTLD_GLOBAL) is never initialized in some cases.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18684
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=15271
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15134