From: Christopher Faylor
To: cygwin@cygwin.com
Reply-To: cygwin@cygwin.com
Date: Sat, 24 Sep 2005 01:11:00 -0000
Subject: Re: Funny hang with snapshop 20050920
Message-ID: <20050924001110.GA1390@trixie.casa.cgf.cx>
In-Reply-To: <43348E75.7080309@scytek.de>
References: <4333660B.7060305@scytek.de> <20050923022619.GB21253@trixie.casa.cgf.cx> <43348E75.7080309@scytek.de>
X-SW-Source: 2005-09/txt/msg00843.txt.bz2

On Fri, Sep 23, 2005 at 07:23:33PM -0400, Volker Quetschke wrote:
>Christopher Faylor wrote:
>>On Thu, Sep 22, 2005 at 10:18:51PM -0400, Volker Quetschke wrote:
>>
>>>My favorite testcase (building OOo) started hanging again.
>>>...
>>>But now the *really* strange part begins: You can break the hang by doing
>>>"ls /proc/3176/fd" !?
>>>and the build continues (until the next hang).
>>>
>>>Sorry, we're unable to create a reduced testcase but we thought the
>>>strange symptoms might help pinpoint the problem.
>>>
>>>Attached you also find the cygcheck output of that system.
>>
>>Does sending a 'kill -CONT 3176' also unstick things?  Both situations
>>send a signal to the process.
>
>Sorry, this question got lost, but ...
>
>>How about attaching to the hung process with strace?  You didn't mention
>>that.
>
>he tried to attach and strace was standing there without output.
>A "ls /proc//fd" produced then the first four lines of the
>attached strace log but tcsh still hung.

You know, I noticed yesterday that there was some information missing
from the strace output in the open_shared function and, of course, I
didn't fix it.  Oh well.  That means that I don't get much from this
strace output.

>Several "ls /proc//fd" later it continued and produced the
>rest of that logfile.
>
>Did you notice that the WINPID of the hanging tcsh is the same as
>the PID? This is always the case if it hangs.

That just means that the process has forked but hasn't execed anything.
I don't think that's significant.

>Additional info: Both tcsh processes exist with the respective
>WINPID in taskmgr.

I'd expect that they did or you wouldn't be able to attach to them.

There is a new snapshot up there now.  I think I've given up on the
technique that I was trying to use to fix the Windows 98 bug.  I've
yanked out a lot of the code and simplified things but I hope I haven't
caused the bug to reemerge.

Could you try the 2005-09-23 snapshot?  Same rules.  I'd still like to
know if sending a CONT to the hung process fixes it as well as
ls /proc/nnn/fd and I'd still like to see the strace output if the
process hangs again.

Also could anyone who could duplicate the Windows 98 error popup dialog
confirm or deny if it is still fixed?

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/