[RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

public inbox for libc-ports@sourceware.org
 help / color / mirror / Atom feed

* [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
@ 2013-10-09 20:05 Ondřej Bílka
  2013-10-17 15:41 ` Ondřej Bílka
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ondřej Bílka @ 2013-10-09 20:05 UTC (permalink / raw)
  To: libc-alpha, libc-ports

Hi,

This bug had a simple patch for five years without reply. 
https://sourceware.org/bugzilla/show_bug.cgi?id=4578
Could someone comment this?

It was detected on custom chip, could this be replicated on other
architectures?

An analysis from bugzilla and patch are below

"
Details:

If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
or RT_DELETE at the time another thread calls fork(), then the child exit
code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
If the child subsequently requires ld.so functionality before calling exec(),
then the assertion will fire.

The patch acquires dl_load_lock on entry to fork() and releases it on exit
from the parent path.  The child path is initialized as currently done.
This is essentially pthreads_atfork, but forced to be first because the
acquisition of dl_load_lock must happen before malloc_atfork is active
to avoid a deadlock.
"

--- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
2007-05-29 23:44:33.000000000 -0400
+++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
2007-05-31 15:07:18.712221827 -0400
@@ -27,6 +27,7 @@
 #include "fork.h"
 #include <hp-timing.h>
 #include <ldsodefs.h>
+#include <bits/libc-lock.h>
 #include <bits/stdio-lock.h>
 #include <atomic.h>

@@ -59,6 +60,8 @@
     struct used_handler *next;
   } *allp = NULL;

+  /* grab ld.so lock BEFORE switching to malloc_atfork */
+   __rtld_lock_lock_recursive (GL(dl_load_lock));
   /* Run all the registered preparation handlers.  In reverse order.
      While doing this we build up a list of all the entries.  */
   struct fork_handler *runp;
@@ -208,6 +211,8 @@

 	  allp = allp->next;
 	}
+      /* unlock ld.so last, because we locked it first */
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
     }

   return pid;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2013-10-09 20:05 [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction Ondřej Bílka
@ 2013-10-17 15:41 ` Ondřej Bílka
  2014-01-02 20:30 ` [PING][RFC][BZ " Ondřej Bílka
  2014-01-02 22:18 ` [RFC][BZ " Mike Frysinger
  2 siblings, 0 replies; 7+ messages in thread
From: Ondřej Bílka @ 2013-10-17 15:41 UTC (permalink / raw)
  To: libc-alpha, libc-ports

On Wed, Oct 09, 2013 at 10:05:34PM +0200, OndÅ™ej BÃlka wrote:
> Hi,
> 
> This bug had a simple patch for five years without reply. 
> https://sourceware.org/bugzilla/show_bug.cgi?id=4578
> Could someone comment this?
> 
> It was detected on custom chip, could this be replicated on other
> architectures?
>
Comments?
 
> An analysis from bugzilla and patch are below
> 
> "
> Details:
> 
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling exec(),
> then the assertion will fire.
> 
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
> 
> --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-29 23:44:33.000000000 -0400
> +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-31 15:07:18.712221827 -0400
> @@ -27,6 +27,7 @@
>  #include "fork.h"
>  #include <hp-timing.h>
>  #include <ldsodefs.h>
> +#include <bits/libc-lock.h>
>  #include <bits/stdio-lock.h>
>  #include <atomic.h>
>  
> @@ -59,6 +60,8 @@
>      struct used_handler *next;
>    } *allp = NULL;
>  
> +  /* grab ld.so lock BEFORE switching to malloc_atfork */
> +   __rtld_lock_lock_recursive (GL(dl_load_lock));
>    /* Run all the registered preparation handlers.  In reverse order.
>       While doing this we build up a list of all the entries.  */
>    struct fork_handler *runp;
> @@ -208,6 +211,8 @@
>  
>  	  allp = allp->next;
>  	}
> +      /* unlock ld.so last, because we locked it first */
> +      __rtld_lock_unlock_recursive (GL(dl_load_lock));
>      }
>  
>    return pid;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PING][RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2013-10-09 20:05 [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction Ondřej Bílka
  2013-10-17 15:41 ` Ondřej Bílka
@ 2014-01-02 20:30 ` Ondřej Bílka
  2014-01-02 22:18 ` [RFC][BZ " Mike Frysinger
  2 siblings, 0 replies; 7+ messages in thread
From: Ondřej Bílka @ 2014-01-02 20:30 UTC (permalink / raw)
  To: libc-alpha, libc-ports

ping

On Wed, Oct 09, 2013 at 10:05:34PM +0200, OndÅ™ej BÃlka wrote:
> Hi,
> 
> This bug had a simple patch for five years without reply. 
> https://sourceware.org/bugzilla/show_bug.cgi?id=4578
> Could someone comment this?
> 
> It was detected on custom chip, could this be replicated on other
> architectures?
> 
> An analysis from bugzilla and patch are below
> 
> "
> Details:
> 
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling exec(),
> then the assertion will fire.
> 
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
> 
> --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-29 23:44:33.000000000 -0400
> +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-31 15:07:18.712221827 -0400
> @@ -27,6 +27,7 @@
>  #include "fork.h"
>  #include <hp-timing.h>
>  #include <ldsodefs.h>
> +#include <bits/libc-lock.h>
>  #include <bits/stdio-lock.h>
>  #include <atomic.h>
>  
> @@ -59,6 +60,8 @@
>      struct used_handler *next;
>    } *allp = NULL;
>  
> +  /* grab ld.so lock BEFORE switching to malloc_atfork */
> +   __rtld_lock_lock_recursive (GL(dl_load_lock));
>    /* Run all the registered preparation handlers.  In reverse order.
>       While doing this we build up a list of all the entries.  */
>    struct fork_handler *runp;
> @@ -208,6 +211,8 @@
>  
>  	  allp = allp->next;
>  	}
> +      /* unlock ld.so last, because we locked it first */
> +      __rtld_lock_unlock_recursive (GL(dl_load_lock));
>      }
>  
>    return pid;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2013-10-09 20:05 [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction Ondřej Bílka
  2013-10-17 15:41 ` Ondřej Bílka
  2014-01-02 20:30 ` [PING][RFC][BZ " Ondřej Bílka
@ 2014-01-02 22:18 ` Mike Frysinger
  2014-01-02 23:54   ` Ondřej Bílka
  2 siblings, 1 reply; 7+ messages in thread
From: Mike Frysinger @ 2014-01-02 22:18 UTC (permalink / raw)
  To: libc-alpha; +Cc: Ondřej Bílka, libc-ports

[-- Attachment #1: Type: Text/Plain, Size: 1426 bytes --]

On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:
> Details:
> 
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling
> exec(), then the assertion will fire.
> 
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "

doesn't seem right that we grab the lock and then just reset it in the child ?  
seems like you should just unlock it rather than reset it in the child.

i'm also wary of code that already grabs a lot of locks trying to grab even 
more.  the code paths that already grab the IO locks ... can they possibly 
grab this one too ?  like a custom format handler that triggers loading of 
libs ?

> +  /* grab ld.so lock BEFORE switching to malloc_atfork */

comment style is incorrect

> +      /* unlock ld.so last, because we locked it first */

comment style is wrong here too
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2014-01-02 22:18 ` [RFC][BZ " Mike Frysinger
@ 2014-01-02 23:54   ` Ondřej Bílka
  2014-01-03  2:07     ` Mike Frysinger
  0 siblings, 1 reply; 7+ messages in thread
From: Ondřej Bílka @ 2014-01-02 23:54 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha, libc-ports

On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:
> On Wednesday 09 October 2013 16:05:34 OndÅ™ej BÃlka wrote:
> > Details:
> > 
> > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > or RT_DELETE at the time another thread calls fork(), then the child exit
> > code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> > re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> > If the child subsequently requires ld.so functionality before calling
> > exec(), then the assertion will fire.
> > 
> > The patch acquires dl_load_lock on entry to fork() and releases it on exit
> > from the parent path.  The child path is initialized as currently done.
> > This is essentially pthreads_atfork, but forced to be first because the
> > acquisition of dl_load_lock must happen before malloc_atfork is active
> > to avoid a deadlock.
> > "
> 
> doesn't seem right that we grab the lock and then just reset it in the child ?  
> seems like you should just unlock it rather than reset it in the child.
>
That part looks ok as without locking you could get inconsistent linker structures. 

> i'm also wary of code that already grabs a lot of locks trying to grab even 
> more.  the code paths that already grab the IO locks ... can they possibly 
> grab this one too ?  like a custom format handler that triggers loading of 
> libs ?
>
Do you haave a better solution? I send this to decide what to do with
this bug. I would not be surprised if we decided that it is invalid as
threads with fork cause problems in general.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2014-01-02 23:54   ` Ondřej Bílka
@ 2014-01-03  2:07     ` Mike Frysinger
  2014-01-11 12:07       ` Ondřej Bílka
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Frysinger @ 2014-01-03  2:07 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha, libc-ports

[-- Attachment #1: Type: Text/Plain, Size: 2216 bytes --]

On Thursday 02 January 2014 18:54:40 Ondřej Bílka wrote:
> On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:
> > On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:
> > > Details:
> > > 
> > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > > or RT_DELETE at the time another thread calls fork(), then the child
> > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our
> > > case) re-initializes dl_load_lock but does not restore r_state to
> > > RT_CONSISTENT. If the child subsequently requires ld.so functionality
> > > before calling exec(), then the assertion will fire.
> > > 
> > > The patch acquires dl_load_lock on entry to fork() and releases it on
> > > exit from the parent path.  The child path is initialized as currently
> > > done. This is essentially pthreads_atfork, but forced to be first
> > > because the acquisition of dl_load_lock must happen before
> > > malloc_atfork is active to avoid a deadlock.
> > > "
> > 
> > doesn't seem right that we grab the lock and then just reset it in the
> > child ? seems like you should just unlock it rather than reset it in the
> > child.
> 
> That part looks ok as without locking you could get inconsistent linker
> structures.

that's not what i meant.  rather than make the call to reset in the child as 
the code in the tree does now, if you grab the lock early, why not use the 
normal unlock call in the child ?  struct state would be fine.

> > i'm also wary of code that already grabs a lot of locks trying to grab
> > even more.  the code paths that already grab the IO locks ... can they
> > possibly grab this one too ?  like a custom format handler that triggers
> > loading of libs ?
> 
> Do you haave a better solution? I send this to decide what to do with
> this bug. I would not be surprised if we decided that it is invalid as
> threads with fork cause problems in general.

i think we want to support it without it being completely terrible.

maybe the answer here is to reset all the dl state like we do with the lock ?  
is there some init func we could call ?  i'm really not familiar with glibc 
ldso internals.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction
  2014-01-03  2:07     ` Mike Frysinger
@ 2014-01-11 12:07       ` Ondřej Bílka
  0 siblings, 0 replies; 7+ messages in thread
From: Ondřej Bílka @ 2014-01-11 12:07 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha, libc-ports

On Thu, Jan 02, 2014 at 09:07:02PM -0500, Mike Frysinger wrote:
> On Thursday 02 January 2014 18:54:40 OndÅ™ej BÃlka wrote:
> > On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:
> > > On Wednesday 09 October 2013 16:05:34 OndÅ™ej BÃlka wrote:
> > > > Details:
> > > > 
> > > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > > > or RT_DELETE at the time another thread calls fork(), then the child
> > > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our
> > > > case) re-initializes dl_load_lock but does not restore r_state to
> > > > RT_CONSISTENT. If the child subsequently requires ld.so functionality
> > > > before calling exec(), then the assertion will fire.
> > > > 
> > > > The patch acquires dl_load_lock on entry to fork() and releases it on
> > > > exit from the parent path.  The child path is initialized as currently
> > > > done. This is essentially pthreads_atfork, but forced to be first
> > > > because the acquisition of dl_load_lock must happen before
> > > > malloc_atfork is active to avoid a deadlock.
> > > > "
> > > 
> > > doesn't seem right that we grab the lock and then just reset it in the
> > > child ? seems like you should just unlock it rather than reset it in the
> > > child.
> > 
> > That part looks ok as without locking you could get inconsistent linker
> > structures.
> 
> that's not what i meant.  rather than make the call to reset in the child as 
> the code in the tree does now, if you grab the lock early, why not use the 
> normal unlock call in the child ?  struct state would be fine.
>
possible.
 
> > > i'm also wary of code that already grabs a lot of locks trying to grab
> > > even more.  the code paths that already grab the IO locks ... can they
> > > possibly grab this one too ?  like a custom format handler that triggers
> > > loading of libs ?
> > 
> > Do you haave a better solution? I send this to decide what to do with
> > this bug. I would not be surprised if we decided that it is invalid as
> > threads with fork cause problems in general.
> 
> i think we want to support it without it being completely terrible.
> 
> maybe the answer here is to reset all the dl state like we do with the lock ?  
> is there some init func we could call ?  i'm really not familiar with glibc 
> ldso internals.

How would you handle dlclose with handle before fork? I still do not see
a clean solution.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-11 12:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-09 20:05 [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction Ondřej Bílka
2013-10-17 15:41 ` Ondřej Bílka
2014-01-02 20:30 ` [PING][RFC][BZ " Ondřej Bílka
2014-01-02 22:18 ` [RFC][BZ " Mike Frysinger
2014-01-02 23:54   ` Ondřej Bílka
2014-01-03  2:07     ` Mike Frysinger
2014-01-11 12:07       ` Ondřej Bílka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).