From: Mark Geisert
To: cygwin@cygwin.com
Subject: Re: Problem with zombie processes
Date: Mon, 20 Feb 2017 22:54:00 -0000
Message-ID: <58AB73B5.6040104@maxrnd.com>
References: <58A3598F.2020405@maxrnd.com> <58A773C9.1080905@maxrnd.com> <58AACADF.6080101@maxrnd.com>
X-SW-Source: 2017-02/txt/msg00249.txt.bz2

Erik Bray wrote:
> On Mon, Feb 20, 2017 at 11:54 AM, Mark Geisert wrote:
>>> So my guess was that Cygwin might try to hold on to a handle to a
>>> child process at least until it's been explicitly wait()ed. But that
>>> does not seem to be the case after all.
>>
>>
>> You might have missed a subtlety in what I said above. The Python
>> interpreter itself is calling wait4() to reap your child process. Cygwin
>> has told Python one of its children has died. You won't get the chance to
>> wait() for it yourself. Cygwin *does* have a handle to the process, but it
>> gets closed as part of Python calling wait4().
>
> To be clear, wait4() is not called from Python until the script
> explicitly calls p.wait().
> In other words, when run this step by step (e.g. in gdb) I don't see a
> wait4() call until the point where the script explicitly waits(). I
> don't see any reason Python would do this behind the scenes.

You're right. I missed the wait in your script and ASSumed too much of the
Python interpreter :-( .

>>> Anyways, I think it would be nicer if /proc returned at least partial
>>> information on zombie processes, rather than an error. I have a patch
>>> to this effect for /proc/<pid>/stat, and will add a few more as well.
>>> To me /proc/<pid>/stat was the most important because that's the
>>> easiest way to check the process's state in the first place! Now I
>>> also have to catch EINVAL as well and assume that means a zombie
>>> process.
>>
>>
>> The file /proc/<pid>/stat is there until Cygwin finishes cleanup of the
>> child due to Python having wait()ed for it. When you run your test script,
>> pay attention to the process state character in those cases where you
>> successfully read the stat file. It's often S (stopped, I think) or R
>> (running) but I also see Z (zombie) sometimes. Your script is in a race
>> with Cygwin, and you cannot guarantee you'll see a killed process's state
>> before Cygwin cleans it up.
>>
>> One way around this *might* be to install a SIGCHLD handler in your Python
>> script. If that's possible, that should tell you when your child exits.
>
> Perhaps the Python script is a red herring. I just wrote it to
> demonstrate the problem.
> The difference between where I send stdout
> to is strange, but you're likely right that it just comes down to
> subtle timing differences. Here's a C program that demonstrates the
> same issue more reliably. Interestingly, it works when I run it in
> strace (probably just because of the strace overhead) but not when I
> run it normally.
>
> My point in all this is I'm confused why Cygwin would give up its
> handles to the Windows process before wait() has been called.
>
> (In fact, it's pretty confusing to have fopen returning EINVAL which
> according to [1] it should only be doing if the mode string were
> invalid.)
>
> Thanks,
> Erik
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html

O.K., you may be on to something amiss in the Cygwin DLL. Thanks for the STC
in C; that'll help somebody looking further at this. I'm out of ideas. It
might be possible to reduce strace overhead somewhat by selecting a smaller
set of trace options than the default.

..mark

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
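
For anyone picking this up later, the SIGCHLD idea floated earlier in the
thread could look roughly like the sketch below. It is not Erik's test script
or his C program (neither is reproduced in this message). The names
read_state and observe_child are made up for illustration, and the Cygwin
behaviour noted in the comments is only what the thread reports (the open
failing with EINVAL), not something the sketch verifies. The handler only
records that a SIGCHLD arrived; the script then tries to read the stat file
before calling wait().

```python
import signal
import subprocess
import sys
import time

# Signal numbers recorded by the handler; a plain list append is safe here.
_seen = []


def _on_sigchld(signum, frame):
    _seen.append(signum)


def read_state(pid):
    """Return the state character from /proc/<pid>/stat, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        # Layout is "pid (comm) state ..."; comm may contain spaces or ')',
        # so take the first field after the last ')'.
        return data.rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        # OSError covers a missing entry after reaping, and also the EINVAL
        # that this thread reports on Cygwin for an exited child.
        return None


def observe_child(timeout=10.0):
    """Spawn a short-lived child; return (state before wait, exit code, state after wait)."""
    _seen.clear()
    # Handlers can only be installed from the main thread.
    previous = signal.signal(signal.SIGCHLD, _on_sigchld)
    try:
        child = subprocess.Popen([sys.executable, "-c", "pass"])
        deadline = time.monotonic() + timeout
        # Poll until the handler has recorded a SIGCHLD or the timeout expires.
        while not _seen and time.monotonic() < deadline:
            time.sleep(0.01)
        # If SIGCHLD arrived, the child has exited but has not been waited for.
        before = read_state(child.pid)
        code = child.wait()            # the child is reaped here
        after = read_state(child.pid)
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)
    return before, code, after


if __name__ == "__main__":
    print(observe_child())
```

On Linux the first element is normally Z. Going by the reports in this thread,
on Cygwin the read may fail instead, which shows up here as None. Either way
the handler does not remove the race described above; it only narrows the
window between the child exiting and the script looking at /proc.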