public inbox for libc-help@sourceware.org
* A possible libc/dlmopen/pthreads bug
@ 2018-01-24 13:59 Vivek Das Mohapatra
  2018-01-24 16:52 ` Szabolcs Nagy
  2018-01-24 17:08 ` Adhemerval Zanella
  0 siblings, 2 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 13:59 UTC (permalink / raw)
  To: libc-help


Hello -

As I've posted here before, I'm working on a segregated-dynamic-linking
mechanism based on dlmopen (to allow applications to link against libraries
without sharing their dependencies).

While testing some recent builds, I think I may have tripped over a
possible dlmopen/pthreads bug. It is entirely possible that I'm doing
something wrong, or failing to do something - a lot of this is new
territory for me.

Either way, I'd appreciate some feedback/insight into what's going on.

A bit of background before I proceed: libcapsule is the helper library
I'm working on, based on dlmopen, that allows me to create "shim"
libraries that expose the symbols from their immediate target,
but don't expose any further dependencies.

   +-----------------------------+ +----------------------------+
   | Runtime filesystem          | | Host filesystem            |
   |                             | |                            |
   | +------------+              | |                            |
   | | Executable |              | |                            |
   | ++------+----+              | |                            |
   |  |      |                   | |                            |
   |  |   +--+---------------+   | |   +------------------+     |
   |  |   | shim libX11      | <-----> | real libX11      |-+   |
   |  |   +--+---------------+   | |   +----------+-------+ |   |
   |  |      |                   | |              |         |   |
   |  |      |                   | |          +---+-------+ |   |
   | ++------+-----+             | |          | libA      | |   |
   | | libc        |             | |          +-----------+ |   |
   | +-------------+             | |          +-----------+ |   |
   |                             | |          | libB      |-+   |
   |                             | |          +-----------+     |
   +-----------------------------+ +----------------------------+

The libraries on the right-hand side, from the host filesystem, are
in a dlmopen namespace of their own (as an implementation detail they
use the same libc binary as the DSOs on the left, but it is a new
copy opened with dlmopen).
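
(For illustration, a private namespace of that kind is created roughly
like this - a minimal sketch of the stock GNU dlmopen API, not
libcapsule's actual code; the helper name is made up:)

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    /* Load a target into a fresh namespace; its dependencies -
       including a second copy of libc - are loaded there too.  */
    static void *
    open_encapsulated (const char *soname)
    {
      void *handle = dlmopen (LM_ID_NEWLM, soname, RTLD_NOW);
      if (handle == NULL)
        fprintf (stderr, "dlmopen: %s\n", dlerror ());
      return handle;
    }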

The shim libX11 takes care of making sure the executable gets symbols
from the real libX11 instead of the shim, but the bug we're going
to look at is not (I think) related to that, so I'm not going to discuss
that process here.

The visible symptom is that when I launch pulseaudio with a few shimmed
libraries (the first case I stumbled upon, and easy to reproduce) it seems
to deadlock very early in its life.

A bit of digging with strace and gdb shows that when it locks up
it does so inside setresuid. A bit more digging indicates that the
code is stuck in an infinite loop here:

__nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
+list
1103 
1104      /* Now the list with threads using user-allocated stacks.  */
1105      list_for_each (runp, &__stack_user)
1106        {
1107          struct pthread *t = list_entry (runp, struct pthread, list);
1108          if (t == self)
1109            continue;
1110 
1111          setxid_mark_thread (cmdp, t);
1112        }

For some reason, list_for_each never terminates.

If I disable encapsulation (remove the shim libraries from the library
path) then the following holds at that point in the code:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
1105	  list_for_each (runp, &__stack_user)
+bt
#0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
#1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
       at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
#2  0x5655b7f0 in pa_drop_root ()
#3  0x56558a6e in main ()

Digging into __stack_user:

+p __stack_user
$1 = {next = 0xf73a48a0, prev = 0xf73a48a0}

+p &__stack_user
$2 = (list_t *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf73a48a0

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf73a48a0

We find a circular linked list which contains &__stack_user itself.
Since list_for_each is invoked as list_for_each(…, &__stack_user),
the for loop it implements will terminate, allowing setresuid
to proceed.

// ============================================================================
Note: The definition of list_for_each is this:

# define list_for_each(pos, head) \
   for (pos = (head)->next; pos != (head); pos = pos->next)
// ============================================================================
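
(A minimal standalone sketch - not glibc code - of why that matters:
the loop only stops when the cursor gets back to the head element, so
a ring that no longer contains the head can never terminate it:)

    #include <stdio.h>

    struct list_head { struct list_head *next; };

    int
    main (void)
    {
      struct list_head head, a, b;
      int steps;

      /* Healthy ring: head -> a -> head.  The cursor reaches the
         head again and the loop stops.  */
      head.next = &a; a.next = &head;
      steps = 0;
      for (struct list_head *pos = head.next; pos != &head; pos = pos->next)
        steps++;
      printf ("healthy ring: %d step(s)\n", steps);

      /* Corrupted ring: head -> a -> b -> a -> ...  &head is now
         unreachable, so the same loop would spin forever (bounded
         here so the demo terminates).  */
      head.next = &a; a.next = &b; b.next = &a;
      steps = 0;
      for (struct list_head *pos = head.next;
           pos != &head && steps < 10; pos = pos->next)
        steps++;
      printf ("corrupted ring: gave up after %d steps\n", steps);
      return 0;
    }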

Now let's examine the same case with the shim in place:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
1105	  list_for_each (runp, &__stack_user)
  ⋮
+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p &__stack_user
$2 = (list_t *) 0xf7d8f1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf76eeb60

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf71391a4

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf76eeb60

We can see we have a circular linked list, as before, but it does
_not_ contain the element supplied as the head to list_for_each:
We're going to loop forever.

============================================================================

Next let's try to figure out when and where this happens.
Setting various breakpoints and watchpoints, we uncover the following:

+run
Starting program: /usr/bin/pulseaudio --start
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
290	{
+break allocatestack.c:1105
Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
+watch __stack_user
Hardware watchpoint 7: __stack_user
+watch __stack_user.next
Hardware watchpoint 8: __stack_user.next
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0x0, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0x0
New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
__pthread_initialize_minimal_internal () at nptl-init.c:377
377	  list_add (&pd->list, &__stack_user);
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
list_add (head=<optimized out>, newp=0xf76eeb60) at ../include/list.h:64
64	  head->next = newp;
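
// ============================================================================
// For reference, list_add in glibc's include/list.h is (approximately,
// for the glibc vintage being debugged here):
//
//   static inline void
//   list_add (list_t *newp, list_t *head)
//   {
//     newp->next = head->next;
//     newp->prev = head;
//     head->next->prev = newp;
//     head->next = newp;     /* <- line 64, the write caught above */
//   }
// ============================================================================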
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
New value = {next = 0xf76eeb60, prev = 0xf76eeb60}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf76eeb60
__pthread_initialize_minimal_internal () at nptl-init.c:381
381	  THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
+cont
Continuing.

Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
1210	{

// ============================================================================
// At this point we step to the end of __pthread_init_static_tls and set
// an extra watchpoint on the address currently holding &__stack_user
// ============================================================================

+p __stack_user.next
$1 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>  ← STILL GOOD

+watch __stack_user.next->next
Hardware watchpoint 9: __stack_user.next->next
+s


// And here it is: 
Hardware watchpoint 9: __stack_user.next->next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
0xf7121c83 in ?? ()

// Hm, an unknown address scribbling on __stack_user.

+call calloc(1, sizeof(Dl_info))
$3 = (void *) 0x56574d18
+call dladdr(0xf7121c83, $3)
$4 = 1

+p *(Dl_info *)$3
$5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
       dli_fbase = 0xf711d000,
       dli_sname = 0xf711f617 "__pthread_initialize_minimal",
       dli_saddr = 0xf7121be0}

// Well that can't be right, can it? gdb should have figured out the name
// of 0xf7121c83, not said ?? - let's work out the address in the other
// direction:

+p __pthread_initialize_minimal
$6 = {<text variable, no debug info>} 0xf7d77be0 <__pthread_initialize_minimal_internal>

+call dladdr(0xf7d77be0, $3)
$8 = 1

+p *(Dl_info *)$3
$10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
        dli_fbase = 0xf7d73000,
        dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
        dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}

// ============================================================================

Aha! Same DSO, different base address. So the ?? instance of
__pthread_initialize_minimal_internal was from the _other_ copy of libc,
inside the dlmopen namespace - the one gdb doesn't know how to inspect.
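
(The same check can be scripted; a hedged sketch of a helper - not part
of the original session, name invented - that classifies a code address
with dladdr, the way we just asked gdb interactively:)

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    /* Report which mapping of a DSO a code address falls inside.  */
    static void
    whose_code (void *addr)
    {
      Dl_info info;
      if (dladdr (addr, &info))
        printf ("%p: %s (base %p, nearest symbol %s)\n", addr,
                info.dli_fname, info.dli_fbase,
                info.dli_sname ? info.dli_sname : "?");
      else
        printf ("%p: not in any object dladdr can see\n", addr);
    }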


PS: for completeness, I went back and followed the __stack_user linked list
at the "GONE WRONG HERE" point, just to be sure:

+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p __stack_user.next
$2 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$3 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next
$4 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next->next
$5 = (struct list_head *) 0xf71391a4

So the linked list definitely doesn't contain &__stack_user any more.

// ============================================================================

Apologies for the exegesis: It seems to me that the copy of libc in the
private namespace has somehow managed to scribble on the linked list
pointed to by __stack_user, overwriting a key address.

Is my analysis correct? Is there something I could or should have done to
avoid this?

A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
existing mapping of the target library in the main namespace/link-map to be
re-used instead of creating a new one: I believe this would prevent this
problem (and others detailed in that message) from occurring - any thoughts?


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 13:59 A possible libc/dlmopen/pthreads bug Vivek Das Mohapatra
@ 2018-01-24 16:52 ` Szabolcs Nagy
  2018-01-24 17:16   ` Vivek Das Mohapatra
  2018-01-24 17:25   ` Adhemerval Zanella
  2018-01-24 17:08 ` Adhemerval Zanella
  1 sibling, 2 replies; 15+ messages in thread
From: Szabolcs Nagy @ 2018-01-24 16:52 UTC (permalink / raw)
  To: Vivek Das Mohapatra, libc-help; +Cc: nd

On 24/01/18 13:59, Vivek Das Mohapatra wrote:
> Apologies for the exegesis: It seems to me that the copy of libc in the
> private namespace has somehow managed to scribble on the linked list
> pointed to by __stack_user, overwriting a key address.
> 
> Is my analysis correct? Is there something I could or should have done to
> avoid this?
> 
> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
> existing mapping of the target library in the main namespace/link-map to be
> re-used instead of creating a new one: I believe this would prevent this
> problem (and others detailed in that message) from occurring - any thoughts?

i don't know what you are doing, but it's hard to imagine
that two libcs (or libpthreads) would work in the same
process: if they can run code on the same thread they
cannot both control the tcb (and will clobber each other's
global state through that).

same for signal handlers (for internal signals) or
brk syscall, or stdio buffering, etc. the libc has to
deal with process global/thread local state that must
be controlled by the same code consistently otherwise
bad things happen.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 13:59 A possible libc/dlmopen/pthreads bug Vivek Das Mohapatra
  2018-01-24 16:52 ` Szabolcs Nagy
@ 2018-01-24 17:08 ` Adhemerval Zanella
  2018-01-24 17:19   ` Vivek Das Mohapatra
  1 sibling, 1 reply; 15+ messages in thread
From: Adhemerval Zanella @ 2018-01-24 17:08 UTC (permalink / raw)
  To: libc-help



On 24/01/2018 11:59, Vivek Das Mohapatra wrote:
> [...]
> 
> Apologies for the exegesis: It seems to me that the copy of libc in the
> private namespace has somehow managed to scribble on the linked list
> pointed to by __stack_user, overwriting a key address.
> 
> Is my analysis correct? Is there something I could or should have done to
> avoid this?
> 
> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
> existing mapping of the target library in the main namespace/link-map to be
> re-used instead of creating a new one: I believe this would prevent this
> problem (and others detailed in that message) from occurring - any thoughts?

Nice write-up; could you please open a bug report, if possible with a
testcase to trigger it?  I am wondering whether this is triggering
already-reported dlmopen issues: BZ#18684 [1], BZ#15271 [2], and BZ#15134 [3].
In fact it really looks like BZ#18684, where Carlos noted the namespace's
global searchlist (RTLD_GLOBAL) is never initialized in some cases.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18684
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=15271
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15134


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 16:52 ` Szabolcs Nagy
@ 2018-01-24 17:16   ` Vivek Das Mohapatra
  2018-01-24 17:25   ` Adhemerval Zanella
  1 sibling, 0 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 17:16 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-help, nd


> i don't know what you are doing, but it's hard to imagine
> that two libcs (or libpthreads) would work in the same
> process: if they can run code on the same thread they

This is how dlmopen() as implemented currently works (and I have had
games like openarena and dungeon defenders working with this setup).

There's currently no way _not_ to have two mappings of libc when using
dlmopen (and they each have their own mmaps and separate heaps and so
forth, so it mostly works).

For example, here are the link maps when running glxgears with this
exact setup:

(dl-handle CAPSULE // the private dlmopen namespace
   [prev: (nil)] 0x57676b90: "/host/usr/lib/i386-linux-gnu/libXau.so.6.0.0" [next: 0x57676e90]
   [prev: 0x57676b90] 0x57676e90: "/lib/i386-linux-gnu/libc.so.6" [next: 0x57677160]
   [prev: 0x57676e90] 0x57677160: "/lib/i386-linux-gnu/ld-linux.so.2" [next: 0x57672540]
   ⋮
   [prev: 0x57683e60] 0x57684190: "/host/usr/lib/i386-linux-gnu/libGL.so.1.2.0" [next: (nil)])

(dl-handle DEFAULT // the vanilla namespace
   [prev: (nil)] 0xf778f920: "" [next: 0xf778fc10]
   [prev: 0xf778f920] 0xf778fc10: "linux-gate.so.1" [next: 0xf77658d8]
   ⋮
   [prev: 0xf7766668] 0xf7766920: "/lib/i386-linux-gnu/libc.so.6" [next: 0xf7766c08]
   ⋮
   [prev: 0xf7228978] 0xf7228c58: "/lib/i386-linux-gnu/libz.so.1" [next: (nil)])

As you can see, two copies of libc.
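
(The dl-handle dumps above come from libcapsule's own debug tooling; a
rough sketch of producing something similar with only the public GNU
API, using "libz.so.1" as a hypothetical stand-in target:)

    #define _GNU_SOURCE
    #include <link.h>
    #include <dlfcn.h>
    #include <stdio.h>

    /* Walk the whole link-map chain a handle belongs to.  */
    static void
    dump_link_map (void *handle)
    {
      struct link_map *map = NULL;
      if (handle == NULL || dlinfo (handle, RTLD_DI_LINKMAP, &map) != 0)
        return;
      while (map->l_prev != NULL)      /* rewind to the chain's head */
        map = map->l_prev;
      for (; map != NULL; map = map->l_next)
        printf ("%p: \"%s\"\n", (void *) map, map->l_name);
    }

    int
    main (void)
    {
      void *h = dlmopen (LM_ID_NEWLM, "libz.so.1", RTLD_NOW);
      dump_link_map (h);                          /* private namespace */
      dump_link_map (dlopen (NULL, RTLD_NOW));    /* default namespace */
      return 0;
    }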

I'm not saying it's the right approach: For reasons outlined in this thread
and the other one I mentioned, I think one libc mapping shared across both
namespaces is the right way to go - but that's not how dlmopen currently
behaves, and the two-copy setup _does_ mostly work.

It falls down if memory allocation/freeing occurs across the namespace boundary
(i.e. alloc on one side, free on the other), but the two libcs mostly can't even
see one another.
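
(A deliberately-broken sketch of that hazard - hypothetical code, not
from libcapsule - where memory from the private libc's heap is handed
to the main libc's free:)

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdlib.h>

    int
    main (void)
    {
      void *ns = dlmopen (LM_ID_NEWLM, "libc.so.6", RTLD_NOW);
      if (ns == NULL)
        return 1;
      void *(*ns_malloc) (size_t) = (void *(*)(size_t)) dlsym (ns, "malloc");
      if (ns_malloc == NULL)
        return 1;
      void *p = ns_malloc (64);   /* allocated on the private libc's heap */
      free (p);                   /* freed by the main libc: undefined
                                     behaviour, may corrupt either heap */
      return 0;
    }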


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 17:08 ` Adhemerval Zanella
@ 2018-01-24 17:19   ` Vivek Das Mohapatra
  0 siblings, 0 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 17:19 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help

> Nice write-up; could you please open a bug report, if possible with a
> testcase to trigger it?  I am wondering whether this is triggering
> already-reported dlmopen issues.

Sure, I'll try to pare down the test case so it's as simple as possible;
it's a bit tangled up in my libcapsule development environment right now.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 16:52 ` Szabolcs Nagy
  2018-01-24 17:16   ` Vivek Das Mohapatra
@ 2018-01-24 17:25   ` Adhemerval Zanella
  2018-01-24 17:47     ` Vivek Das Mohapatra
  2018-01-24 19:34     ` Szabolcs Nagy
  1 sibling, 2 replies; 15+ messages in thread
From: Adhemerval Zanella @ 2018-01-24 17:25 UTC (permalink / raw)
  To: libc-help



On 24/01/2018 14:52, Szabolcs Nagy wrote:
> On 24/01/18 13:59, Vivek Das Mohapatra wrote:
>> Apologies for the exegesis: It seems to me that the copy of libc in the
>> private namespace has somehow managed to scribble on the linked list
>> pointed to by __stack_user, overwriting a key address.
>>
>> Is my analysis correct? Is there something I could or should have done to
>> avoid this?
>>
>> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
>> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
>> existing mapping of the target library in the main namespace/link-map to be
>> re-used instead of creating a new one: I believe this would prevent this
>> problem (and others detailed in that message) from occurring - any thoughts?
> 
> i don't know what you are doing, but it's hard to imagine
> that two libcs (or libpthreads) would work in the same
> process: if they can run code on the same thread they
> cannot both control the tcb (and will clobber each other's
> global state through that).
> 
> same for signal handlers (for internal signals) or
> brk syscall, or stdio buffering, etc. the libc has to
> deal with process global/thread local state that must
> be controlled by the same code consistently otherwise
> bad things happen.

Unfortunately the glibc manual lacks a description of the semantics of the
dlfcn functions, and the man-pages do not cover which global resources are
shared.  Considering the API is derived from SunOS, and also that the
OpenSolaris manual [1] states that:

"When an object is loaded on a new link-map list, the object is isolated from the main run-
 ning program. Certain global resources are only usable from one link-map list. A few exam-
 ples  are  the  sbrk()  based  malloc(), libthread(), and the signal vectors. Care must be
 taken not to use any of these resources other than from the primary link-map  list.  These
 issues are discussed in further detail in the Linker and Libraries Guide."

I think we should first discuss the expected semantics of dlmopen
regarding shared resources.  Although the OpenSolaris manual is a bit vague
about it, I still think complete isolation, as Vivek is expecting, is quite
hard and far from current code expectations.

In fact I think Vivek's examples of running a process with two different libcs
are working mainly because the different glibc copies define and access the
shared resources in the same manner.  I bet it will break the moment we change
some internal state layout or semantic (for instance the __stack_user struct size).

Also, I think some APIs will mainly just not work as expected, for instance
posix timers, posix thread cancellation, or the set* functions.

[1] https://www.unix.com/man-page/opensolaris/3C/dlmopen/


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 17:25   ` Adhemerval Zanella
@ 2018-01-24 17:47     ` Vivek Das Mohapatra
  2018-01-24 18:27       ` Adhemerval Zanella
  2018-01-24 20:08       ` Carlos O'Donell
  2018-01-24 19:34     ` Szabolcs Nagy
  1 sibling, 2 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 17:47 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help

> In fact I think Vivek's examples of running a process with two different libcs
> are working mainly because the different glibc copies define and access the
> shared resources in the same manner.  I bet it will break the moment we change
> some internal state layout or semantic (for instance the __stack_user struct size).

A slight correction - we are deliberately using the _same_ libc to
avoid exactly that problem - it's that dlmopen as currently implemented
creates a new mapping of the same libc for the dlmopen namespace.

I also don't need (or expect) complete isolation of the libc cluster:
If I could, I would share the same mapping of libc & co with the main
link map.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 17:47     ` Vivek Das Mohapatra
@ 2018-01-24 18:27       ` Adhemerval Zanella
  2018-01-24 19:48         ` Carlos O'Donell
  2018-01-24 21:49         ` Vivek Das Mohapatra
  2018-01-24 20:08       ` Carlos O'Donell
  1 sibling, 2 replies; 15+ messages in thread
From: Adhemerval Zanella @ 2018-01-24 18:27 UTC (permalink / raw)
  To: Vivek Das Mohapatra; +Cc: libc-help



On 24/01/2018 15:46, Vivek Das Mohapatra wrote:
>> In fact I think Vivek's examples of running a process with two different libcs
>> are working mainly because the different glibc copies define and access the
>> shared resources in the same manner.  I bet it will break the moment we change
>> some internal state layout or semantic (for instance the __stack_user struct size).
> 
> A slight correction - we are deliberately using the _same_ libc to
> avoid exactly that problem - it's that dlmopen as currently implemented
> creates a new mapping of the same libc for the dlmopen namespace.
> 
> I also don't need (or expect) complete isolation of the libc cluster:
> If I could, I would share the same mapping of libc & co with the main
> link map.

Right, your first message, based on the provided chart, gave the impression
that you expected libc isolation as well.  I still think we should clarify
the dlmopen semantics, but we can work on that through bug reports and
issues like the one you brought up.

I think the best course of action is still to open a bug report with a
possible testcase.  I will ping Carlos to get his take on it; he seemed
interested in previous dlmopen issues.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 17:25   ` Adhemerval Zanella
  2018-01-24 17:47     ` Vivek Das Mohapatra
@ 2018-01-24 19:34     ` Szabolcs Nagy
  2018-01-24 20:05       ` Adhemerval Zanella
  2018-01-24 20:06       ` Carlos O'Donell
  1 sibling, 2 replies; 15+ messages in thread
From: Szabolcs Nagy @ 2018-01-24 19:34 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-help; +Cc: nd

On 24/01/18 17:25, Adhemerval Zanella wrote:
> In fact I think Vivek's examples of running a process with two different libcs
> are working mainly because the different glibc copies define and access the
> shared resources in the same manner.  I bet it will break the moment we change
> some internal state layout or semantic (for instance the __stack_user struct size).

even if the two libcs have the same code and the same
layout for internal data structures, it is completely
broken: there are two copies of the libc global data
section with inconsistent content.

if printf is called twice via different libcs the output
gets buffered independently, but written to the same fd,
so the observable effect on stdout can be out of order.
fflush on the namespace boundary does not help since the
locks of one libc don't work on the other; the writes
can happen concurrently.
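
(a one-libc sketch of that buffering hazard - hypothetical demo code,
not from glibc: two independent stdio buffers over the same fd emit
their writes in whatever order fflush happens to be called:)

    #include <stdio.h>
    #include <unistd.h>

    int
    main (void)
    {
      /* two independent FILE buffers over the same underlying fd,
         standing in for the two libcs' stdouts.  */
      FILE *x = fdopen (dup (STDOUT_FILENO), "w");
      FILE *y = fdopen (dup (STDOUT_FILENO), "w");
      fputs ("first ", x);     /* buffered in x */
      fputs ("second ", y);    /* buffered in y */
      fflush (y);              /* y's buffer reaches the fd first... */
      fflush (x);              /* ...so "second" appears before "first" */
      return 0;
    }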

setxid has to coordinate with all threads via signals,
even with the ones created by the other libc.

same for atfork handlers: the handlers of the other
libc have to run too.

same for tls/tsd dtors: at thread exit the other libc's
tls/tsd data should be freed too.

the dynamic loader + libc + libpthread needs to have
one unique instance or some apis won't work across
all instances.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 18:27       ` Adhemerval Zanella
@ 2018-01-24 19:48         ` Carlos O'Donell
  2018-01-24 21:49         ` Vivek Das Mohapatra
  1 sibling, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2018-01-24 19:48 UTC (permalink / raw)
  To: Adhemerval Zanella, Vivek Das Mohapatra; +Cc: libc-help

On 01/24/2018 10:27 AM, Adhemerval Zanella wrote:
> On 24/01/2018 15:46, Vivek Das Mohapatra wrote:
>>> In fact I think Vivek's examples of running a process with two different libcs
>>> are working mainly because the different glibc copies define and access the
>>> shared resources in the same manner.  I bet it will break the moment we change
>>> some internal state layout or semantic (for instance the __stack_user struct size).
>>
>> A slight correction - we are deliberately using the _same_ libc to
>> avoid exactly that problem - it's that dlmopen as currently implemented
>> creates a new mapping of the same libc for the dlmopen namespace.
>>
>> I also don't need (or expect) complete isolation of the libc cluster:
>> If I could, I would share the same mapping of libc & co with the main
>> link map.
> 
> Right, your first message, based on the provided chart, gave the impression
> that you expected libc isolation as well.  I still think we should clarify
> the dlmopen semantics, but we can work on that through bug reports and
> issues like the one you brought up.
> 
> I think the best course of action is still to open a bug report with a
> possible testcase.  I will ping Carlos to get his take on it; he seemed
> interested in previous dlmopen issues.

Correct, we need to drive the development forward using specific use cases.

-- 
Cheers,
Carlos.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 19:34     ` Szabolcs Nagy
@ 2018-01-24 20:05       ` Adhemerval Zanella
  2018-01-24 20:06       ` Carlos O'Donell
  1 sibling, 0 replies; 15+ messages in thread
From: Adhemerval Zanella @ 2018-01-24 20:05 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-help; +Cc: nd



On 24/01/2018 17:34, Szabolcs Nagy wrote:
> On 24/01/18 17:25, Adhemerval Zanella wrote:
>> In fact I think Vivek's examples of running a process with two different libcs
>> are working mainly because the different glibc copies define and access the
>> shared resources in the same manner.  I bet it will break the moment we change
>> some internal state layout or semantic (for instance the __stack_user struct size).
> 
> even if the two libcs have the same code and the same
> layout for internal data structures, it is completely
> broken: there are two copies of the libc global data
> section with inconsistent content.
> 
> if printf is called twice via different libcs the output
> gets buffered independently, but written to the same fd,
> so the observable effect on stdout can be out of order.
> fflush on the namespace boundary does not help since the
> locks of one libc don't work on the other; the writes
> can happen concurrently.
> 
> setxid has to coordinate with all threads via signals,
> even with the ones created by the other libc.
> 
> same for atfork handlers: the handlers of the other
> libc have to run too.
> 
> same for tls/tsd dtors: at thread exit the other libc's
> tls/tsd data should be freed too.
> 
> the dynamic loader + libc + libpthread needs to have
> one unique instance or some apis won't work across
> all instances.
> 

Thanks for bringing this up; it seems that Vivek's use cases only cover a
narrow subset of the possible ones.  Even more, I think we should really
limit dlmopen's scope for glibc's own DSO instances.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 19:34     ` Szabolcs Nagy
  2018-01-24 20:05       ` Adhemerval Zanella
@ 2018-01-24 20:06       ` Carlos O'Donell
  2018-01-24 20:31         ` Vivek Das Mohapatra
  1 sibling, 1 reply; 15+ messages in thread
From: Carlos O'Donell @ 2018-01-24 20:06 UTC (permalink / raw)
  To: Szabolcs Nagy, Adhemerval Zanella, libc-help; +Cc: nd

On 01/24/2018 11:34 AM, Szabolcs Nagy wrote:
> On 24/01/18 17:25, Adhemerval Zanella wrote:
>> In fact I think Vivek's examples of running a process with two different libcs
>> are working mainly because the different glibc copies define and access the
>> shared resources in the same manner.  I bet it will break the moment we change
>> some internal state layout or semantic (for instance the __stack_user struct size).
> 
> even if the two libcs have the same code and the same
> layout for internal data structures, it is completely
> broken: there are two copies of the libc global data
> section with inconsistent content.
> 
> if printf is called twice via different libcs the output
> gets buffered independently, but written to the same fd,
> so the observable effect on stdout can be out of order.
> fflush on the namespace boundary does not help since the
> locks of one libc don't work on the other; the writes
> can happen concurrently.
> 
> setxid has to coordinate with all threads via signals,
> even with the ones created by the other libc.
> 
> same for atfork handlers: the handlers of the other
> libc have to run too.
> 
> same for tls/tsd dtors: at thread exit the other libc's
> tls/tsd data should be freed too.
> 
> the dynamic loader + libc + libpthread needs to have
> one unique instance or some apis won't work across
> all instances.

Correct.

All of these things need to be shared to fix dlmopen
in the "simple" use case e.g. a shared glibc, but everything
else isolated.

The "isolation" use case in which a potentially new
dynamic loader / libc is loaded into the dlmopen namespace will
require a stable API between the loaders to coordinate
such issues, and I'm not entirely sure we want that. So I
would not start by implementing this higher level of isolation.
For example such isolation may mean you must check API compat
level and may be rejected such that all you can do is run
dlmopen, but not call into the namespace for compat reasons
(letting the namespace run on it's own with constructors that
create threads that only live in the namespace).

For now I'm happy with the "simple" use case of a matching
C runtime implementation in both the outer and inner
namespaces.

However, keep in mind that the problems you face with the
C runtime will appear again with higher level libraries.

This concept of "shim" libraries will only work for some
limited set of libraries that are relatively stateless
(and as you see glibc is not).

-- 
Cheers,
Carlos.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 17:47     ` Vivek Das Mohapatra
  2018-01-24 18:27       ` Adhemerval Zanella
@ 2018-01-24 20:08       ` Carlos O'Donell
  1 sibling, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2018-01-24 20:08 UTC (permalink / raw)
  To: Vivek Das Mohapatra, Adhemerval Zanella; +Cc: libc-help

On 01/24/2018 09:46 AM, Vivek Das Mohapatra wrote:
>> In fact I think Vivek's examples of running a process with two different libcs
>> are working mainly because the different glibc copies define and access the
>> shared resources in the same manner.  I bet it will break the moment we change
>> some internal state layout or semantic (for instance the __stack_user struct size).
> 
> A slight correction - we are deliberately using the _same_ libc to
> avoid exactly that problem - it's that dlmopen as currently implemented
> creates a new mapping of the same libc for the dlmopen namespace.
> 
> I also don't need (or expect) complete isolation of the libc cluster:
> If I could, I would share the same mapping of libc & co with the main
> link map.

This needs fixing, and is one of the reasons that dlmopen is generally
not being made available for this use case until we resolve the underlying
sharing problem.

I'm happy to see patches to make ld.so/glibc shared in all the namespaces.

-- 
Cheers,
Carlos.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 20:06       ` Carlos O'Donell
@ 2018-01-24 20:31         ` Vivek Das Mohapatra
  0 siblings, 0 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 20:31 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Szabolcs Nagy, Adhemerval Zanella, libc-help, nd

> This concept of "shim" libraries will only work for some
> limited set of libraries that are relatively stateless
> (and as you see glibc is not).

In my use case the shim library arranges for there to be only one
copy of the library being shimmed - the main namespace "thinks"
it has a copy of the library, but there's actually no code there,
only enough ELF data for linking to work. Since there's only one
copy, even stateful libraries work (but see below).

The shim makes sure the same function pointers are used both inside
and outside the namespace *for the target library only*, not any
further dependencies.

*However* for the libc cluster we really do want there to be only one
copy, since shimming cannot occur until _after_ libc has been mapped
(since libc is pretty fundamental).

I'll see if I can understand the dl[m]open worker code well enough to
provide a patch.


* Re: A possible libc/dlmopen/pthreads bug
  2018-01-24 18:27       ` Adhemerval Zanella
  2018-01-24 19:48         ` Carlos O'Donell
@ 2018-01-24 21:49         ` Vivek Das Mohapatra
  1 sibling, 0 replies; 15+ messages in thread
From: Vivek Das Mohapatra @ 2018-01-24 21:49 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help

> I think the best course of action is still to open a bug report with a
> possible testcase.  I will ping Carlos to get his take on it; he seemed
> interested in previous dlmopen issues.

For the record and anyone searching the archives later -

    https://sourceware.org/bugzilla/show_bug.cgi?id=22745

Test case reliably provokes the lockup here.

