Timing window in NPTL fork.c causes hangs.

public inbox for libc-hacker@sourceware.org

* Timing window in NPTL fork.c causes hangs.
@ 2007-02-19 20:24 Steven Munroe
  2007-02-25 21:47 ` Ulrich Drepper
  2007-02-26 16:03 ` Jakub Jelinek
  0 siblings, 2 replies; 4+ messages in thread
From: Steven Munroe @ 2007-02-19 20:24 UTC (permalink / raw)
  To: GNU libc hacker, Ryan Arnold, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 2637 bytes --]

One of our larger applications is experiencing hangs, and we have tracked
this down to an interaction between fork/atfork and the malloc
implementation. We have a simplified test case (attached) that
illustrates this problem.

Basically, the NPTL fork is not atomic with respect to signals because of
the atfork handling, which must run before (atfork prepare) and after
(atfork parent and child) the fork syscall. The glibc runtime uses atfork
processing internally to ensure correct behaviour in the parent and child
after the fork. This includes IO and malloc; for example, calloc contains
the following code sequence:

    /* Suspend the thread until the `atfork' handlers have completed.
       By that time, the hooks will have been reset as well, so that
       mALLOc() can be used again. */
    (void)mutex_lock(&list_lock);
    (void)mutex_unlock(&list_lock);
    return public_mALLOc(sz);

This is no problem as long as fork processing continues and calls the
malloc atfork parent/child handlers.
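
For readers unfamiliar with the mechanism, here is a minimal sketch of the prepare/parent/child locking pattern that pthread_atfork() supports. It is not glibc's malloc code: `table_lock`, `atfork_lock_demo()` and the three handlers are hypothetical stand-ins for list_lock and malloc's internal handlers (glibc's malloc re-initializes its locks in the child rather than unlocking them). The point is the window: from prepare() until parent() or child() runs, any other thread that wants the lock blocks. Link with -pthread on older glibc.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for malloc's list_lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread before the fork system call. Until parent()
   or child() runs, every other thread that wants table_lock blocks. */
static void prepare(void)
{
    int rc = pthread_mutex_lock(&table_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the parent after the fork system call returns. */
static void parent(void)
{
    pthread_mutex_unlock(&table_lock);
}

/* Runs in the child. Unlocking follows the classic pthread_atfork
   pattern; glibc's malloc re-initializes its locks here instead. */
static void child(void)
{
    pthread_mutex_unlock(&table_lock);
}

static pthread_once_t register_once = PTHREAD_ONCE_INIT;
static int register_rc;

/* Register the handlers only once: registering them twice would make the
   prepare step lock the same non-recursive mutex twice and deadlock. */
static void register_handlers(void)
{
    register_rc = pthread_atfork(prepare, parent, child);
}

/* Forks once and returns 1 if table_lock is free again in both the parent
   and the child after fork() returns, 0 otherwise. */
int atfork_lock_demo(void)
{
    pthread_once(&register_once, register_handlers);
    if (register_rc != 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&table_lock) == 0 ? 0 : 1);

    int parent_free = pthread_mutex_trylock(&table_lock) == 0;
    if (parent_free)
        pthread_mutex_unlock(&table_lock);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return parent_free && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If a signal handler in the forking thread blocks after prepare() but before parent(), every other thread calling pthread_mutex_lock(&table_lock) waits indefinitely, which mirrors the list_lock hang described here.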

However, the code in sysdeps/unix/sysv/linux/fork.c is exposed to signals
interrupting its operation. If the thread calling fork is interrupted by
a signal after it has run the atfork prepare handlers but before it has
run the atfork parent handlers, and the signal handler blocks for any
reason (sigsuspend or attempted IO), the process can hang. For example,
any other thread attempting to call malloc will wait for the atfork
handlers to release "list_lock", but the thread processing the fork is
now blocked and cannot proceed. If the forking thread depends on one of
the other threads to wake it (via a signal), that thread may block on
list_lock first, and now we have a deadlock.

So is it OK for NPTL's fork implementation not to be atomic relative to
signals?

From the POSIX spec we see statements like:

... Since the fork() call can be considered as atomic from the
application's perspective, the set would be initialized as empty and
such signals would have arrived after the fork(); see also <signal.h>.

In this case fork is definitely not atomic.

So what should we do about this? One possible solution is to use the
signal mask and disable async signals for the duration of __libc_fork().
Or at least from just before atfork prepare processing to after atfork
parent/child processing.

We have experimented with this in our application (masking signals
before the fork call and restoring them afterwards in the parent and
child), and this does seem to eliminate the hang.

But should we change the libc NPTL fork implementation to use signal
masks to give the application the appearance that fork is atomic?

[-- Attachment #2: calloc-fork-hang.c --]
[-- Type: text/x-c, Size: 2396 bytes --]

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>	/* waitpid() */
#include <signal.h>

#define CALLOC_NMEMB 10000
/* 
* VERSION 1.1
* This testcase has 4 threads. The main thread simply starts the other threads
* and then sleeps on a pthread_join. The forkingThread repeatedly calls 
* fork. The signalingThread repeatedly signals the forking thread, which  
* causes the forking thread to do sigsuspend. The third thread repeatedly 
* callocs and frees memory. Only when it is done with the calloc does it 
* signal the suspended thread to continue. The theory is that when the forking 
* thread gets suspended in the right place, it is holding a lock that the
* callocing thread needs to continue, so the calloc thread hangs waiting on 
* that lock, and it cannot signal the forking thread to continue, creating a 
* deadlock. 
*/

volatile sig_atomic_t killflag = 1;	/* set in a signal handler, polled by signalingThread */
pthread_t forkThread;
pthread_t sigThread;
pthread_t calThread;

void  sigusr1Handler(int signum){
	sigset_t set1;
	sigfillset(&set1);
	sigdelset(&set1, SIGUSR2);
	sigsuspend(&set1);
	killflag = 1;
}

void  sigusr2Handler(int signum){
	return;
}

void* callocingThread(void *ptr)
{ 
	int * memptr;

	while(1)
	{ 
		memptr = calloc(CALLOC_NMEMB,4);
		if (!memptr){
			fprintf(stderr, "calloc failed\n");
		}
		pthread_kill(forkThread, SIGUSR2);
		free(memptr);
	}
}

void* signalingThread(void *ptr)
{ 
	while(1)
	{ 
		if (killflag) {
			killflag = 0;
			pthread_kill(forkThread, SIGUSR1);
		}
	}
}

void* forkingThread(void *ptr)
{ 
	pid_t pid;
	int i;

	struct sigaction sigusr1_action;
	struct sigaction sigusr2_action;

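	sigfillset(&sigusr1_action.sa_mask);
	sigfillset(&sigusr2_action.sa_mask);
	sigusr1_action.sa_flags = 0;	/* struct sigaction is not zero-initialized */
	sigusr2_action.sa_flags = 0;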

	sigusr1_action.sa_handler = &sigusr1Handler;
	sigusr2_action.sa_handler = &sigusr2Handler;

	sigaction(SIGUSR1, &sigusr1_action, NULL);
	sigaction(SIGUSR2, &sigusr2_action, NULL);

	while(1)
	{ 
		pid = fork();
		fprintf(stderr, ".");
		if (pid == 0){
		/* child */
			exit(0);
		} else if (pid > 0) {
		/* parent */
			waitpid(pid, NULL, 0);	/* third argument is the int options mask */
			continue;
		} else {
			fprintf(stderr, "fork failed\n");
		}
	}
}

int main(int argc , char *argv[])
{

	pthread_create(&forkThread, 0, &forkingThread, 0);
	pthread_create(&calThread, 0, &callocingThread, 0);
	pthread_create(&sigThread, 0, &signalingThread, 0);

	pthread_join(forkThread, NULL);

	return 0;
}


* Re: Timing window in NPTL fork.c causes hangs.
  2007-02-19 20:24 Timing window in NPTL fork.c causes hangs Steven Munroe
@ 2007-02-25 21:47 ` Ulrich Drepper
  2007-02-26 15:30   ` Steven Munroe
  2007-02-26 16:03 ` Jakub Jelinek
  1 sibling, 1 reply; 4+ messages in thread
From: Ulrich Drepper @ 2007-02-25 21:47 UTC (permalink / raw)
  To: Steven Munroe; +Cc: GNU libc hacker

[-- Attachment #1: Type: text/plain, Size: 160 bytes --]

I assume this is the same issue fixed by the patch I just applied!?

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖




* Re: Timing window in NPTL fork.c causes hangs.
  2007-02-25 21:47 ` Ulrich Drepper
@ 2007-02-26 15:30   ` Steven Munroe
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Munroe @ 2007-02-26 15:30 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: GNU libc hacker

Ulrich Drepper wrote:
> I assume this is the same issue fixed by the patch I just applied!?
>
>   
No, this is a different hang. Suzuki's bug caused hangs in the child
process. This bug causes hangs (in malloc) in multithreaded parent
processes.

The NPTL implementation of fork is not atomic relative to glibc's
internal (i.e. malloc's) atfork() handling. If the forking thread takes
a signal after atfork prepare and before atfork parent processing, AND
the signal handler blocks for any reason, any other thread attempting a
malloc will hang. If the forking thread's signal handler depends in any
way on other threads completing, then we can see a deadlock.

One way to resolve this is to mask signals from before atfork prepare
until after atfork parent/child processing.


* Re: Timing window in NPTL fork.c causes hangs.
  2007-02-19 20:24 Timing window in NPTL fork.c causes hangs Steven Munroe
  2007-02-25 21:47 ` Ulrich Drepper
@ 2007-02-26 16:03 ` Jakub Jelinek
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Jelinek @ 2007-02-26 16:03 UTC (permalink / raw)
  To: Steven Munroe; +Cc: GNU libc hacker, Ryan Arnold, Mark Brown

On Mon, Feb 19, 2007 at 02:38:46PM -0600, Steven Munroe wrote:
> However the code in sysdeps/unix/sysv/linux/fork.c is exposed to signals
> interupting its operation. If the thread calling fork is interrupted by
> a signal, after it has processed atfork prepare handlers but before it
> has processed the atfork parent handles, and the signal handler blocks
> for any reason (sigsuspend or attempts IO) the process can hang. For
> example any other thread attempting to call malloc will wait for the
> atfork handlers to release the "list_lock" but the thread processing the
> fork in now blocked and can not proceed. If the forking thread is
> dependent on one of the other threads to wake it (via signal) that
> thread may block on the list_lock first and now we have deadlock.
> 
> So is it OK for NPTLs fork implementation to not be atomic relative to
> signals?

If you have an async signal handler that can block the app indefinitely,
then that's to be expected.  How is that different from the same signal
handler e.g. interrupting in the middle of malloc or stdio?  Some malloc or
stdio lock can be held at that point, so if your async signal handler
waits till some other thread wakes it up and those other threads need
malloc or stdio, you hang exactly the same way.

> From the POSIX spec we see statements like:
> 
> ... Since the fork() call can be considered as atomic from the
> application's perspective, the set would be initialized as empty and
> such signals would have arrived after the fork(); see also <signal.h>.

This, IMHO, talks only about whether a signal sent to the process is
delivered just to the parent or also to the child.  fork() as a whole
can't be considered atomic; you can, e.g., block indefinitely in one of
the atfork handlers using an async-signal-safe function.

> So what should we do about this? One possible solution is to use the
> signal mask and disable async signals for the duration of __libc_fork().
> Or at least from just before atfork prepare processing to after atfork
> parent/child processing.

So you would just break different apps (in addition to making fork()
considerably slower)?  Apps have every right to expect that the library
doesn't mess with their signal masks; they may very well, e.g.,
sigsuspend in an atfork handler and expect to be woken up.  If you block
all signals before running the atfork handlers, that would never happen.
Not to mention that the atfork handlers can call sigprocmask themselves.

> We have experimented with this in our application (masking signals
> before the fork call and restoring them after in the parent and child).
> And this does seem to elliminate the hang.

Then just do that in your application if you need it.

> But should we change the libc NPTL fork implement to use signal masks to
> give the application the appeirence that fork is atomic?

No.

	Jakub

