Date: Mon, 17 May 2004 14:06:00 -0000
From: Igor Pechtchanski
Reply-To: cygwin@cygwin.com
To: rouilj@ieee.org
Cc: cygwin@cygwin.com
Subject: Re: Problem with pty allocation code, race condition?

On Mon, 17 May 2004, John P. Rouillard wrote:

> Hello:
>
> I have noticed a problem when I start X Windows. As part of my
> startup, I fire up three xterms, but only one of them actually
> completes and displays a prompt.
>
> I believe there may be a race condition in the pty allocation code, as
> the three bash processes all share the same tty.
> "ps -ef" shows:
>
>      UID     PID    PPID  TTY     STIME COMMAND
> jrouilla    2332    2252    1  09:04:19 /usr/bin/bash
> jrouilla    2340    2248    1  09:04:19 /usr/bin/bash
> jrouilla    2348    2168    1  09:04:19 /usr/bin/bash
> jrouilla    2632    2332    1  09:05:01 /usr/bin/ps
>
> I believe the one with PID 2332 was the first to start, based on the
> PID, but I also remember that PIDs are not monotonically increasing
> under Cygwin, so YMMV. However, PID 2332 is the one (verified using
> echo $$) that I can interact with. The other two are frozen with no
> output or input (I entered a ^D, which should have exited the shell).
>
> This failure usually occurs when I first log in to Windows and run all
> my startup scripts. It is less likely to occur if I start X after all
> the rest of the login processes have run, although I can provoke it
> then as well, with a lower frequency.
>
> A proper startup with three running bash/xterms looks like:
>
>      UID     PID    PPID  TTY     STIME COMMAND
> jrouilla    2400    2216    1  09:11:05 /usr/bin/bash
> jrouilla    2680    2204    3  09:11:05 /usr/bin/bash
> jrouilla    2732    2188    4  09:11:06 /usr/bin/bash
> jrouilla    2712    2400    1  09:11:09 /usr/bin/ps
>
> Does the latest Cygwin snapshot contain any code changes in the pty
> allocation area? If so, I can try it and see if it helps. I am already
> running a snapshot from 20040412-23:00:24, but both 1.5.9 and this
> snapshot have the same issue AFAICT. It's an intermittent problem for
> me, but I will be happy to provide any info I can.
>
> I have attached the cygcheck output, lightly edited to hide IP
> addresses and internal groups. If you need that info to debug the
> problem, I will send the unedited output on request.
>
> -- rouilj

FWIW, I can confirm that this problem has existed for a while (as long as
I can remember) -- if you fire up two xterms in quick succession,
especially under heavy load, there is a good chance that they will share
a pty.
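The shared-pty symptom is visible in the TTY column of "ps -ef": in the failing listing above, three bash processes all report TTY 1. A check like that can be scripted so it runs right after the xterms are launched. The following is a minimal sketch, assuming the whitespace-separated column layout (UID PID PPID TTY STIME COMMAND) of the Cygwin "ps -ef" output quoted above; the helper name `shared_bash_ttys` is hypothetical.

```python
from collections import defaultdict


def shared_bash_ttys(ps_output: str) -> dict[str, list[int]]:
    """Return {tty: [pids]} for every TTY claimed by more than one bash.

    Assumes the column layout of the Cygwin `ps -ef` listings quoted in
    this thread: UID PID PPID TTY STIME COMMAND, whitespace-separated,
    with the command path as the sixth field.
    """
    by_tty: dict[str, list[int]] = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 6 or fields[0] == "UID":
            continue
        pid, tty, command = fields[1], fields[3], fields[5]
        # Match "/usr/bin/bash" as well as a bare "bash".
        if command.rsplit("/", 1)[-1] == "bash":
            by_tty[tty].append(int(pid))
    # Keep only the TTYs that more than one bash process is attached to.
    return {tty: pids for tty, pids in by_tty.items() if len(pids) > 1}
```

On a live system the input could come from something like `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`. A non-empty result flags the shared-pty case, and an empty dict matches the "proper startup" listing.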
The output of regular "ps" will show that the "bash" processes are in
the suspended state ("S"), and sending SIGCONT doesn't wake them. The
first xterm (judging by the window position) is always the one that gets
the suspended bash. Also, it seems to happen more often when the shortcut
that starts the xterm hasn't been used in a while (evicted from the disk
cache?), which makes it hard to reproduce the problem twice in a row.

I believe I've reported this before, but I couldn't come up with a small
reproducible testcase (although I just managed to reproduce it on my
machine -- Win2kPro SP3, Cygwin 1.5.9 -- again, using the above recipe).
It's annoying enough that I'd like to try debugging it. Of course, as
with most races, running the xterms under strace makes it go away...
Attaching to a hung bash is, IMO, useless, as all of the pty assignments
have already happened by that point. Any pointers on how to catch this
in the act are appreciated.

        Igor
--
Igor Pechtchanski, Ph.D.
http://cs.nyu.edu/~pechtcha/
pechtcha@cs.nyu.edu
igor@watson.ibm.com