public inbox for libc-hacker@sourceware.org
* deadlock in signal handler with NPTL
@ 2004-06-22 21:51 Thorsten Kukuk
From: Thorsten Kukuk @ 2004-06-22 21:51 UTC (permalink / raw)
  To: libc-hacker

Hi,

I got the following test program. I know it is very ugly and does a
lot of things one should not do, but this is what programs like sshd
are doing.

The problem is: This program deadlocks very fast in a FUTEX_WAIT
call. This does not happen with LinuxThreads.

Any ideas what goes wrong?

  Thorsten

-- 
Thorsten Kukuk       http://www.suse.de/~kukuk/        kukuk@suse.de
SuSE Linux AG        Maxfeldstr. 5                 D-90409 Nuernberg
--------------------------------------------------------------------    
Key fingerprint = A368 676B 5E1B 3E46 CFCE  2D97 F8FD 4E23 56C6 FB4B

[-- Attachment #2: s.c --]

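#include <sys/types.h>

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>
#include <signal.h>

static void signaller(pid_t pid);

/* Deliberately calls functions that are not async-signal-safe: this is
   the pattern under discussion, not a recommended one. */
void
handler(int sig)
{
	syslog(LOG_DEBUG, "sigtest");
	printf("in handler\n");
	fflush(stdout);
}

int
main(int argc, char *argv[])
{
	pid_t pid;

	switch (pid = fork()) {
	case 0:
		break;			/* child: runs the logging loop below */
	case -1:
		perror("fork");
		exit(1);
	default:
		signaller(pid);		/* parent: floods the child with SIGCHLD */
		exit(0);
	}
	signal(SIGCHLD, handler);
	while (1) {
		syslog(LOG_DEBUG, "test");
		printf("in loop\n");
		fflush(stdout);
	}
}

/* Sends SIGCHLD to pid forever; never returns. */
static void
signaller(pid_t pid)
{
	while (1) {
		usleep(1);
		kill(pid, SIGCHLD);
	}
}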

* Re: deadlock in signal handler with NPTL
@ 2004-06-22 22:07 Jakub Jelinek
From: Jakub Jelinek @ 2004-06-22 22:07 UTC (permalink / raw)
  To: Thorsten Kukuk; +Cc: libc-hacker

On Tue, Jun 22, 2004 at 11:50:59PM +0200, Thorsten Kukuk wrote:
> 
> Hi,
> 
> I got the following test program. I know it is very ugly and does a
> lot of things one should not do, but this is what programs like sshd
> are doing.

Then they should be fixed.  Neither syslog, nor printf, nor fflush
are supposed to be async-signal safe, nor are they actually safe in glibc.

> The problem is: This program deadlocks very fast in a FUTEX_WAIT
> call. This does not happen with LinuxThreads.

Try linking the program with -lpthread and retry with LinuxThreads.
It will hang the same way.
The reason is that glibc built with NPTL uses locking (on IA-32/x86-64
without the lock prefix) even when -lpthread has not been linked in.

	Jakub

* Re: deadlock in signal handler with NPTL
@ 2004-06-23  4:26 Thorsten Kukuk
From: Thorsten Kukuk @ 2004-06-23  4:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: libc-hacker

On Tue, Jun 22, Jakub Jelinek wrote:

> On Tue, Jun 22, 2004 at 11:50:59PM +0200, Thorsten Kukuk wrote:
> > 
> > Hi,
> > 
> > I got the following test program. I know it is very ugly and does a
> > lot of things one should not do, but this is what programs like sshd
> > are doing.
> 
> Then they should be fixed.  Neither syslog, nor printf, nor fflush
> are supposed to be async-signal safe, nor are they actually safe in glibc.

Yes, but the problem is that nearly every daemon on a Linux system
calls syslog() in a signal handler, and it seems to be very easy
to deadlock them on every Linux system running glibc/NPTL, while
no other system seems to have the same problem.

  Thorsten

* Re: deadlock in signal handler with NPTL
@ 2004-06-23  4:41 Steve Munroe
From: Steve Munroe @ 2004-06-23  4:41 UTC (permalink / raw)
  To: Thorsten Kukuk; +Cc: Jakub Jelinek, libc-hacker

Thorsten Kukuk <kukuk@suse.de> wrote on 06/22/2004 11:22:56 PM:

> On Tue, Jun 22, Jakub Jelinek wrote:
>
> > On Tue, Jun 22, 2004 at 11:50:59PM +0200, Thorsten Kukuk wrote:
> > >
> > > Hi,
> > >
> > > I got the following test program. I know it is very ugly and does a
> > > lot of things one should not do, but this is what programs like sshd
> > > are doing.
> >
> > Then they should be fixed.  Neither syslog, nor printf, nor fflush
> > are supposed to be async-signal safe, nor are they actually safe in glibc.
>
> Yes, but the problem is that nearly every daemon on a Linux system
> calls syslog() in a signal handler, and it seems to be very easy
> to deadlock them on every Linux system running glibc/NPTL, while
> no other system seems to have the same problem.
>

Then what has changed from glibc-2.3.3 (RHEL 3) until now? Because I have
not seen this problem before. I have reviewed all the changes to
lowlevellock.h since then and I do not see any change that would affect this. In
fact, your test case should show the same hang there.

Have the daemons changed recently to add the syslog() call to the signal
handler?

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


* Re: deadlock in signal handler with NPTL
@ 2004-06-23  6:56 Thorsten Kukuk
From: Thorsten Kukuk @ 2004-06-23  6:56 UTC (permalink / raw)
  To: Steve Munroe; +Cc: Jakub Jelinek, libc-hacker

On Tue, Jun 22, Steve Munroe wrote:

> Thorsten Kukuk <kukuk@suse.de> wrote on 06/22/2004 11:22:56 PM:
> 
> > On Tue, Jun 22, Jakub Jelinek wrote:
> >
> > > On Tue, Jun 22, 2004 at 11:50:59PM +0200, Thorsten Kukuk wrote:
> > > >
> > > > Hi,
> > > >
> > > > I got the following test program. I know it is very ugly and does a
> > > > lot of things one should not do, but this is what programs like sshd
> > > > are doing.
> > >
> > > Then they should be fixed.  Neither syslog, nor printf, nor fflush
> > > are supposed to be async-signal safe, nor are they actually safe in glibc.
> >
> > Yes, but the problem is that nearly every daemon on a Linux system
> > calls syslog() in a signal handler, and it seems to be very easy
> > to deadlock them on every Linux system running glibc/NPTL, while
> > no other system seems to have the same problem.
> >
> 
> Then what has changed from glibc-2.3.3 (RHEL 3) until now? Because I have
> not seen this problem before.

The test case also deadlocks on a RHEL 3 machine very fast.

> I have reviewed all the changes to
> lowlevellock.h since then and I do not see any change that would affect this. In
> fact, your test case should show the same hang there.

The difference is that glibc compiled with LinuxThreads only uses
locking if the program is linked against libpthread.

glibc compiled with NPTL always uses locking (__libc_lock_lock always
calls lll_lock).

Uli, Jakub, is this really necessary? Wouldn't it be better to add the
one extra compare?
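The trade-off can be modelled in a few lines. This is a hypothetical sketch, not glibc's code: threads_possible stands in for "libpthread is linked in", an atomic_flag stands in for the internal lock, and the enter functions return false where a real lock would wait. An always-locking entry (NPTL-style) refuses the nested entry from the handler, which in real code means a deadlock; the conditional entry (LinuxThreads-style, the one extra compare) lets it in, so two callers end up inside the critical section at once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not glibc code. */
static bool threads_possible = false;	/* "libpthread is linked in" */
static atomic_flag lock_held = ATOMIC_FLAG_INIT;

/* NPTL-style: always take the lock.  Returns false where a real lock
   would have to wait. */
static bool
enter_always(void)
{
	return !atomic_flag_test_and_set(&lock_held);
}

/* LinuxThreads-style: one extra compare, and no locking at all in a
   single-threaded program. */
static bool
enter_if_threaded(void)
{
	if (!threads_possible)
		return true;	/* never waits, but protects nothing */
	return enter_always();
}

/* The outer call enters, then a "handler" on the same thread tries to
   enter before the outer call leaves.  Returns true if the nested entry
   got in (no deadlock, but two callers inside the critical section). */
static bool
nested_entry_succeeds(bool (*enter)(void))
{
	bool outer = enter();
	bool inner = enter();

	atomic_flag_clear(&lock_held);	/* reset the model for the next run */
	return outer && inner;
}
```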

> Have the daemons changed recently to add the syslog() call to the signal
> handler?

No, this is very, very old.

  Thorsten

* Re: deadlock in signal handler with NPTL
@ 2004-06-23  8:23 Jakub Jelinek
From: Jakub Jelinek @ 2004-06-23  8:23 UTC (permalink / raw)
  To: Thorsten Kukuk; +Cc: Steve Munroe, libc-hacker

On Wed, Jun 23, 2004 at 08:56:35AM +0200, Thorsten Kukuk wrote:
> > I have reviewed all the changes to
> > lowlevellock.h since then and I do not see any change that would affect this. In
> > fact, your test case should show the same hang there.
> 
> The difference is that glibc compiled with LinuxThreads only uses
> locking if the program is linked against libpthread.
> 
> glibc compiled with NPTL always uses locking (__libc_lock_lock always
> calls lll_lock).
> 
> Uli, Jakub, is this really necessary? Wouldn't it be better to add the
> one extra compare?

This is done on purpose.  We are not going to sacrifice speed for the sake
of a few broken apps that work with LinuxThreads by pure luck.
The reasons are actually both speed and the ability to dlopen libpthread.so.
When syslog enters its critical section with __libc_lock_lock (syslog_lock),
it really assumes that it is alone there.  If it knew it could handle the
critical section being entered recursively, it could use
__libc_lock_lock_recursive instead (at which point your testcase would
probably work, unless there are other problem places).
But the code is clearly not prepared for that.
So when the locks are no-ops in LinuxThreads, openlog might, e.g., be done
twice, or the outer syslog might believe it is connected while the inner
syslog has already closed the connection, and similar problems.

	Jakub
