public inbox for glibc-bugs@sourceware.org
* [Bug nptl/9804] New: pthread_exit from main thread: poor semantics, potential tty session lockup.
@ 2009-01-31  3:18 kkylheku at gmail dot com
  2009-01-31  3:29 ` [Bug nptl/9804] " kkylheku at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: kkylheku at gmail dot com @ 2009-01-31  3:18 UTC (permalink / raw)
  To: glibc-bugs

(I'm using glibc 2.9; the Version field only goes up to 2.8. The following 
behavior occurs as far back as 2.5).

The kernel is 2.6.26. I'm running i686 and mips; it doesn't matter. I have not 
tried a newer kernel than this.

The issue is that if the main thread of a process terminates with 
pthread_exit, while other threads are still running, the kernel task 
associated with the main thread becomes defunct in some kind of sleep that 
cannot be interrupted by an ordinary signal. As such, the kernel will not 
allow job control to be performed on that process. The process cannot be 
suspended with Ctrl-Z from the shell, nor can it be killed with Ctrl-C.

Moreover, attempts to manipulate it in these ways can cause it to permanently 
hang in that sleep, and no longer terminate even when the other threads do. In 
that situation, the user cannot regain control over the tty session, except by 
logging in via another terminal and killing the defunct process with the 
SIGKILL signal.

The expected behavior is that the process should continue to exist as a normal 
process as long as at least one thread remains running. Normal POSIX job 
control should be possible over that process: switching its process group from 
the foreground to the background, killing it with Ctrl-C.

This may require either a kernel patch, or a change in glibc so that 
pthread_exit from the main thread does not in fact perform a task exit at the 
OS level, but merely a user-space synchronization on the other threads.
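For illustration, here is a minimal sketch of the same idea applied at the
application level rather than inside glibc: the main thread blocks in
pthread_join instead of calling pthread_exit, so its kernel task stays alive
while the workers run. The names worker and run_and_join_workers, and the
worker count, are assumptions made up for this sketch; it is not the proposed
glibc change.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 3   /* hypothetical worker count for this sketch */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int finished;    /* number of workers that ran to completion */

/* Hypothetical worker: records that it ran, then returns. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    finished++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Start the workers and block in pthread_join until all of them are done.
 * The calling thread remains an ordinary live task the whole time, so it
 * never becomes a zombie while its siblings are still running.
 * Returns the number of workers that finished, or -1 if one failed to start.
 */
int run_and_join_workers(void)
{
    pthread_t thr[NUM_WORKERS];
    int i, started = 0;

    finished = 0;
    for (i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&thr[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }

    /* Join only the threads that were actually created. */
    for (i = 0; i < started; i++)
        pthread_join(thr[i], NULL);

    /* Every joined worker has incremented the counter exactly once. */
    assert(finished == started);
    return started == NUM_WORKERS ? finished : -1;
}
```

A program using this approach would call run_and_join_workers() from main and
then return normally, which sidesteps the defunct-leader state at the cost of
keeping the main thread's stack around.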

(I will explore this from a kernel angle and add any findings or patches here, 
using glibc 2.9 on 2.6.26 as my reference base).

-- 
           Summary: pthread_exit from main thread: poor semantics, potential
                    tty session lockup.
           Product: glibc
           Version: 2.8
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: kkylheku at gmail dot com
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=9804

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-01-31  3:29 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-01-31 03:29 -------
(In reply to comment #0)
> The issue is that if the main thread of a process terminates with 
> pthread_exit, while other threads are still running, the kernel task 
> associated with the main thread becomes defunct in some kind of sleep that 
> cannot be interrupted by an ordinary signal. As such, the kernel will not 
> allow job control to be performed on that process. The process cannot be 
> suspended with Ctrl-Z from the shell, nor can it be killed with Ctrl-C.

A little more precision is required. In fact, it's possible to kill the 
process which is in this state with Ctrl-C. The trouble begins if a Ctrl-Z 
suspend is attempted first. This is when it hangs.

The test case is simple: spawn a few threads which terminate after a timeout 
of, say, 10 seconds, and have the main thread call pthread_exit(NULL). If you 
run this process and do nothing, control returns to the shell in about ten 
seconds. If you kill it with Ctrl-C, control is regained immediately. If, 
instead, you try to suspend it with Ctrl-Z, it hangs. The tasks of the other 
threads terminate in ten seconds, but the main one must be killed by SIGKILL, 
and control of the tty cannot be wrested away from it.



* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-01-31  7:02 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-01-31 07:01 -------
(In reply to comment #1)
> The tasks of the other threads terminate in ten seconds, but
> the main one must be killed by SIGKILL, and control of the tty
> cannot be wrested away from it.

This statement is wrong. In fact, the threads other than main do not 
terminate. They also stick around. Somehow, the Ctrl-Z signal prevents their 
termination.

The main thread runs all the way through the do_exit routine in the kernel, 
and makes the final schedule call, turning into a defunct process. (I added 
traces to do_exit).

The Ctrl-Z hang seems to be causing the other threads to be stuck in the 
kernel function do_signal_stop, according to what is reported 
in /proc/<pid>/wchan.
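
To check each thread rather than the process as a whole, the per-thread file
/proc/<pid>/task/<tid>/wchan can be read in the same way. The following is a
minimal sketch of doing that from C; the helper names wchan_path and
read_first_line are assumptions invented for this example, and what the file
contains depends on the kernel configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/proc/<pid>/task/<tid>/wchan" in buf.
 * Returns 0 on success, -1 if buf is too small to hold the path.
 */
int wchan_path(long pid, long tid, char *buf, size_t size)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "/proc/%ld/task/%ld/wchan", pid, tid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Read the first line of `path` into `out`, without the trailing newline.
 * Returns the length of the line, or -1 if the file cannot be opened.
 */
int read_first_line(const char *path, char *out, size_t size)
{
    FILE *f;

    assert(out != NULL && size > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(out, (int)size, f) == NULL)
        out[0] = '\0';              /* empty file: report an empty line */
    fclose(f);
    out[strcspn(out, "\n")] = '\0'; /* drop the newline, if any */
    return (int)strlen(out);
}
```

Calling wchan_path for each tid listed under /proc/<pid>/task and passing the
result to read_first_line gives one wait-channel name per thread, which is how
a stuck do_signal_stop would show up.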






The repro program is this:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define NUM_THREADS 5

/* Worker: stay alive for ten seconds, then exit the thread. */
void *task(void *a)
{
    (void)a;
    sleep(10);
    pthread_exit(NULL);
}

int
main(void)
{
    pthread_t thr[NUM_THREADS];
    int i;

    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&thr[i], NULL, task, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return EXIT_FAILURE;
        }
    }

    /* Exit only the main thread; the workers keep the process alive. */
    pthread_exit(NULL);
}





* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-01-31  7:43 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-01-31 07:43 -------
I have a better understanding of what is going on in the kernel. The main 
thread has terminated. The Ctrl-Z sends a SIGTSTP to the remaining threads in 
the normal way, and the threads are suspended. However, the defunct main 
thread is not available to handle the state transition properly. The expected 
behavior is that a status change occurs in the process, which the shell 
detects so that it can report that the job is in the background. This does 
not happen for the defunct process.

I don't think there is an easy way to fix this. The exit handling has to be 
changed in the kernel. If a thread which behaves like a main thread is 
terminating, it cannot simply run through do_exit and become a zombie. It 
should enter into some kind of wait function in the kernel to join the others. 
While in that function, it should be a regular sleep in which it can handle 
job-control-related signals.

A workaround could be hacked in glibc's __libc_start_main; a kind of 
synchronization on the termination of the other threads. The actual call to 
__exit_thread could be delayed until the other threads are known to have 
terminated (all the way to the kernel; I see there is a notification mechanism 
for a tid being cleared, and it acts as a futex too).








* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-02-01 21:14 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-02-01 21:14 -------
I now have a kernel patch that solves this for me.

I modified the sys_exit system call to cause the main thread to wait for all 
the other threads to die. If, while waiting, the main thread gets a non-fatal 
signal, it returns -ERESTARTSYS; that is, it returns from sys_exit, which 
normally does not happen. This allows the signal handling to be done and 
sys_exit to be restarted. If a fatal signal happens, then the exit goes 
through.

This appears to be working perfectly for me. I can now suspend and resume the 
process even if the main thread has already terminated with pthread_exit. All 
of the threads suspend and resume properly. There is no hang.

I am marking this bug WONTFIX, because it's positively, definitely a kernel 
problem that requires no glibc changes. The kernel is simply not doing its 
part in supporting the semantics of pthread_exit from the primary thread.

Cheers.







-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX



* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-02-01 22:10 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-02-01 22:10 -------
Created an attachment (id=3702)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=3702&action=view)
Kernel patch to fix main thread pthread_exit problem.

Causes the single-thread exit system call (sys_exit) to detect that it's the
main thread of a thread group, and perform a wait for the other threads to die
before proceeding with its own exit. During this wait, signals are handled
properly by returning from sys_exit with ERESTARTSYS (not seen in user space).
Consequently, POSIX job control now works properly on a process whose main
thread has called pthread_exit, but which still has other threads running.


* [Bug nptl/9804] pthread_exit from main thread: poor semantics, potential tty session lockup.
From: kkylheku at gmail dot com @ 2009-02-03  2:30 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From kkylheku at gmail dot com  2009-02-03 02:30 -------
Created an attachment (id=3705)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=3705&action=view)
Follow-up patch to previous patch to implement additional requirement.

Ulrich presented a firm requirement that once the main thread enters the
sys_exit system call, it must not respond to signals.

Yet, on the other hand, the process should still respond externally to
operations such as being suspended with SIGTSTP (POSIX job control) and having
GDB attach to it. Simply blocking all the signals in that thread does not
work. What does work is setting all of the signal actions to SIG_IGN; however,
the signal handler array is shared among all the threads.
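
The sharing is easy to observe from user space. The following minimal sketch
sets SIGUSR1 to SIG_IGN in one thread and then queries the action from
another; the helper names set_usr1_action, ignore_usr1 and
disposition_is_shared are assumptions made up for this example, and the code
only demonstrates the shared state, not the kernel-side unsharing done by the
patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Install `handler` (SIG_IGN or SIG_DFL here) as the action for SIGUSR1. */
static int set_usr1_action(void (*handler)(int))
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Thread body: change the SIGUSR1 action from a thread other than the caller. */
static void *ignore_usr1(void *arg)
{
    int rc = set_usr1_action(SIG_IGN);

    (void)arg;
    assert(rc == 0);
    (void)rc;
    return NULL;
}

/*
 * Returns 1 if an action installed by another thread is visible to the
 * caller, 0 if it is not, and -1 on error.
 */
int disposition_is_shared(void)
{
    pthread_t t;
    struct sigaction cur;

    /* Start from a known state in the calling thread. */
    if (set_usr1_action(SIG_DFL) != 0)
        return -1;
    if (pthread_create(&t, NULL, ignore_usr1, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    /* Query (without changing) the action as seen by this thread. */
    if (sigaction(SIGUSR1, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_IGN;
}
```

Because the change made by the helper thread is seen by the caller, setting
every action to SIG_IGN from the exiting main thread would silence signals for
the whole process, which is why a private handler table is needed.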

This patch unshares the thread's signal handler array. For this, I completed
the signal handler unsharing support in the sys_unshare system call. I also
made that system call available for direct calling within the kernel, as a
do_unshare function. Calling do_unshare(CLONE_SIGHAND) gives the calling task
its own private signal handlers.

