Hi, This bug had a simple patch for five years without reply. https://sourceware.org/bugzilla/show_bug.cgi?id=4578 Could someone comment this? It was detected on custom chip, could this be replicated on other architectures? An analysis from bugzilla and patch are below " Details: If a thread happens to hold dl_load_lock and have r_state set to RT_ADD or RT_DELETE at the time another thread calls fork(), then the child exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT. If the child subsequently requires ld.so functionality before calling exec(), then the assertion will fire. The patch acquires dl_load_lock on entry to fork() and releases it on exit from the parent path. The child path is initialized as currently done. This is essentially pthreads_atfork, but forced to be first because the acquisition of dl_load_lock must happen before malloc_atfork is active to avoid a deadlock. " --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c 2007-05-29 23:44:33.000000000 -0400 +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c 2007-05-31 15:07:18.712221827 -0400 @@ -27,6 +27,7 @@ #include "fork.h" #include <hp-timing.h> #include <ldsodefs.h> +#include <bits/libc-lock.h> #include <bits/stdio-lock.h> #include <atomic.h> @@ -59,6 +60,8 @@ struct used_handler *next; } *allp = NULL; + /* grab ld.so lock BEFORE switching to malloc_atfork */ + __rtld_lock_lock_recursive (GL(dl_load_lock)); /* Run all the registered preparation handlers. In reverse order. While doing this we build up a list of all the entries. */ struct fork_handler *runp; @@ -208,6 +211,8 @@ allp = allp->next; } + /* unlock ld.so last, because we locked it first */ + __rtld_lock_unlock_recursive (GL(dl_load_lock)); } return pid;
On Wed, Oct 09, 2013 at 10:05:34PM +0200, OndÅej BÃlka wrote: > Hi, > > This bug had a simple patch for five years without reply. > https://sourceware.org/bugzilla/show_bug.cgi?id=4578 > Could someone comment this? > > It was detected on custom chip, could this be replicated on other > architectures? > Comments? > An analysis from bugzilla and patch are below > > " > Details: > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD > or RT_DELETE at the time another thread calls fork(), then the child exit > code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) > re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT. > If the child subsequently requires ld.so functionality before calling exec(), > then the assertion will fire. > > The patch acquires dl_load_lock on entry to fork() and releases it on exit > from the parent path. The child path is initialized as currently done. > This is essentially pthreads_atfork, but forced to be first because the > acquisition of dl_load_lock must happen before malloc_atfork is active > to avoid a deadlock. > " > > --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c > 2007-05-29 23:44:33.000000000 -0400 > +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c > 2007-05-31 15:07:18.712221827 -0400 > @@ -27,6 +27,7 @@ > #include "fork.h" > #include <hp-timing.h> > #include <ldsodefs.h> > +#include <bits/libc-lock.h> > #include <bits/stdio-lock.h> > #include <atomic.h> > > @@ -59,6 +60,8 @@ > struct used_handler *next; > } *allp = NULL; > > + /* grab ld.so lock BEFORE switching to malloc_atfork */ > + __rtld_lock_lock_recursive (GL(dl_load_lock)); > /* Run all the registered preparation handlers. In reverse order. > While doing this we build up a list of all the entries. */ > struct fork_handler *runp; > @@ -208,6 +211,8 @@ > > allp = allp->next; > } > + /* unlock ld.so last, because we locked it first */ > + __rtld_lock_unlock_recursive (GL(dl_load_lock)); > } > > return pid;
ping
On Wed, Oct 09, 2013 at 10:05:34PM +0200, OndÅej BÃlka wrote:
> Hi,
>
> This bug had a simple patch for five years without reply.
> https://sourceware.org/bugzilla/show_bug.cgi?id=4578
> Could someone comment this?
>
> It was detected on custom chip, could this be replicated on other
> architectures?
>
> An analysis from bugzilla and patch are below
>
> "
> Details:
>
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling exec(),
> then the assertion will fire.
>
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path. The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
>
> --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-29 23:44:33.000000000 -0400
> +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-31 15:07:18.712221827 -0400
> @@ -27,6 +27,7 @@
> #include "fork.h"
> #include <hp-timing.h>
> #include <ldsodefs.h>
> +#include <bits/libc-lock.h>
> #include <bits/stdio-lock.h>
> #include <atomic.h>
>
> @@ -59,6 +60,8 @@
> struct used_handler *next;
> } *allp = NULL;
>
> + /* grab ld.so lock BEFORE switching to malloc_atfork */
> + __rtld_lock_lock_recursive (GL(dl_load_lock));
> /* Run all the registered preparation handlers. In reverse order.
> While doing this we build up a list of all the entries. */
> struct fork_handler *runp;
> @@ -208,6 +211,8 @@
>
> allp = allp->next;
> }
> + /* unlock ld.so last, because we locked it first */
> + __rtld_lock_unlock_recursive (GL(dl_load_lock));
> }
>
> return pid;
[-- Attachment #1: Type: Text/Plain, Size: 1426 bytes --] On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote: > Details: > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD > or RT_DELETE at the time another thread calls fork(), then the child exit > code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) > re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT. > If the child subsequently requires ld.so functionality before calling > exec(), then the assertion will fire. > > The patch acquires dl_load_lock on entry to fork() and releases it on exit > from the parent path. The child path is initialized as currently done. > This is essentially pthreads_atfork, but forced to be first because the > acquisition of dl_load_lock must happen before malloc_atfork is active > to avoid a deadlock. > " doesn't seem right that we grab the lock and then just reset it in the child ? seems like you should just unlock it rather than reset it in the child. i'm also wary of code that already grabs a lot of locks trying to grab even more. the code paths that already grab the IO locks ... can they possibly grab this one too ? like a custom format handler that triggers loading of libs ? > + /* grab ld.so lock BEFORE switching to malloc_atfork */ comment style is incorrect > + /* unlock ld.so last, because we locked it first */ comment style is wrong here too -mike [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 836 bytes --]
On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote: > On Wednesday 09 October 2013 16:05:34 OndÅej BÃlka wrote: > > Details: > > > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD > > or RT_DELETE at the time another thread calls fork(), then the child exit > > code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) > > re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT. > > If the child subsequently requires ld.so functionality before calling > > exec(), then the assertion will fire. > > > > The patch acquires dl_load_lock on entry to fork() and releases it on exit > > from the parent path. The child path is initialized as currently done. > > This is essentially pthreads_atfork, but forced to be first because the > > acquisition of dl_load_lock must happen before malloc_atfork is active > > to avoid a deadlock. > > " > > doesn't seem right that we grab the lock and then just reset it in the child ? > seems like you should just unlock it rather than reset it in the child. > That part looks ok as without locking you could get inconsistent linker structures. > i'm also wary of code that already grabs a lot of locks trying to grab even > more. the code paths that already grab the IO locks ... can they possibly > grab this one too ? like a custom format handler that triggers loading of > libs ? > Do you haave a better solution? I send this to decide what to do with this bug. I would not be surprised if we decided that it is invalid as threads with fork cause problems in general.
[-- Attachment #1: Type: Text/Plain, Size: 2216 bytes --] On Thursday 02 January 2014 18:54:40 Ondřej Bílka wrote: > On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote: > > On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote: > > > Details: > > > > > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD > > > or RT_DELETE at the time another thread calls fork(), then the child > > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our > > > case) re-initializes dl_load_lock but does not restore r_state to > > > RT_CONSISTENT. If the child subsequently requires ld.so functionality > > > before calling exec(), then the assertion will fire. > > > > > > The patch acquires dl_load_lock on entry to fork() and releases it on > > > exit from the parent path. The child path is initialized as currently > > > done. This is essentially pthreads_atfork, but forced to be first > > > because the acquisition of dl_load_lock must happen before > > > malloc_atfork is active to avoid a deadlock. > > > " > > > > doesn't seem right that we grab the lock and then just reset it in the > > child ? seems like you should just unlock it rather than reset it in the > > child. > > That part looks ok as without locking you could get inconsistent linker > structures. that's not what i meant. rather than make the call to reset in the child as the code in the tree does now, if you grab the lock early, why not use the normal unlock call in the child ? struct state would be fine. > > i'm also wary of code that already grabs a lot of locks trying to grab > > even more. the code paths that already grab the IO locks ... can they > > possibly grab this one too ? like a custom format handler that triggers > > loading of libs ? > > Do you haave a better solution? I send this to decide what to do with > this bug. I would not be surprised if we decided that it is invalid as > threads with fork cause problems in general. i think we want to support it without it being completely terrible. maybe the answer here is to reset all the dl state like we do with the lock ? is there some init func we could call ? i'm really not familiar with glibc ldso internals. -mike [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 836 bytes --]
On Thu, Jan 02, 2014 at 09:07:02PM -0500, Mike Frysinger wrote: > On Thursday 02 January 2014 18:54:40 OndÅej BÃlka wrote: > > On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote: > > > On Wednesday 09 October 2013 16:05:34 OndÅej BÃlka wrote: > > > > Details: > > > > > > > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD > > > > or RT_DELETE at the time another thread calls fork(), then the child > > > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our > > > > case) re-initializes dl_load_lock but does not restore r_state to > > > > RT_CONSISTENT. If the child subsequently requires ld.so functionality > > > > before calling exec(), then the assertion will fire. > > > > > > > > The patch acquires dl_load_lock on entry to fork() and releases it on > > > > exit from the parent path. The child path is initialized as currently > > > > done. This is essentially pthreads_atfork, but forced to be first > > > > because the acquisition of dl_load_lock must happen before > > > > malloc_atfork is active to avoid a deadlock. > > > > " > > > > > > doesn't seem right that we grab the lock and then just reset it in the > > > child ? seems like you should just unlock it rather than reset it in the > > > child. > > > > That part looks ok as without locking you could get inconsistent linker > > structures. > > that's not what i meant. rather than make the call to reset in the child as > the code in the tree does now, if you grab the lock early, why not use the > normal unlock call in the child ? struct state would be fine. > possible. > > > i'm also wary of code that already grabs a lot of locks trying to grab > > > even more. the code paths that already grab the IO locks ... can they > > > possibly grab this one too ? like a custom format handler that triggers > > > loading of libs ? > > > > Do you haave a better solution? I send this to decide what to do with > > this bug. I would not be surprised if we decided that it is invalid as > > threads with fork cause problems in general. > > i think we want to support it without it being completely terrible. > > maybe the answer here is to reset all the dl state like we do with the lock ? > is there some init func we could call ? i'm really not familiar with glibc > ldso internals. How would you handle dlclose with handle before fork? I still do not see a clean solution.