Date: Fri, 21 Dec 2012 10:33:00 -0000
From: Corinna Vinschen
To: cygwin@cygwin.com
Subject: Re: Intermittent failures retrieving process exit codes
Message-ID: <20121221103241.GD18188@calimero.vinschen.de>
References: <50C2498C.2000003@coverity.com> <50C276AC.9090301@mailme.ath.cx> <50D401EF.9040705@coverity.com>
In-Reply-To: <50D401EF.9040705@coverity.com>

On Dec 21 01:30, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I
> can't rule out other third party kernel mode software possibly
> contributing to the issue. A simple change to Cygwin works around
> the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes
> running on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent
> calls to TerminateProcess() and ExitThread().
> The example code below minimally mimics the threads created and exit
> process/thread calls that are performed when running Cygwin's
> false.exe. The primary thread exits the process via TerminateProcess()
> ala pinfo::exit() in winsup/cygwin/pinfo.cc. The secondary thread exits
> itself via ExitThread() ala Cygwin's signal processing thread
> function, wait_sig(), in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary
> thread's call to ExitThread(). I can only speculate at this point,
> but my guess is that the TerminateProcess() code disassociates the
> calling thread from the process before other threads are stopped
> such that ExitThread(), concurrently running in another thread, may
> determine that the calling thread is the last thread of the process
> and overwrite the process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses
> TerminateProcess() because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled
> the code using Microsoft's Visual Studio 2010 x86 compiler with the
> command 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:

Wow. Thanks for this testcase. I was not able to reproduce the issue
on a single-CPU, single-core setup, but on a dual-core system I could
reproduce it almost immediately, twice in a row, in under 5 seconds.

> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to
> ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc. Since this
> thread (probably) doesn't exit until the process is exiting anyway,
> the call to Sleep() does not adversely affect shutdown.
> The thread just gets terminated while in the call to Sleep() instead
> of exiting before the process is terminated or getting terminated
> while still in the call to ExitThread(). A better solution might be
> to avoid the thread exiting at all (so long as it can't get terminated
> while holding critical resources), or to have the process exiting
> thread wait on it. Neither of these is ideal. Orderly shutdown of
> multi-threaded processes is really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used,
> having the wait_sig() thread (and any other threads that could
> potentially concurrently exit with another thread) exit with a
> special status value such as STATUS_THREAD_IS_TERMINATING
> (0xC000004BL) would enable diagnosis of this issue as any process
> exit code matching this would be a likely indicator that this issue
> was encountered.

Maybe the signal thread really shouldn't exit by itself at all, but
should just wait until TerminateThread is called. Chris?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
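A toy model may make the suspected race easier to reason about. It is purely hypothetical and does not describe how Windows actually implements TerminateProcess() or ExitThread(). It only encodes the guess quoted above: the terminating thread is detached before the remaining threads are stopped, and a thread that then believes it is the last one writes the process exit code. It also shows how exiting the signal thread with STATUS_THREAD_IS_TERMINATING (0xC000004B), as proposed above, would make a clobbered exit code recognizable. ToyProcess, Interleaving, final_exit_code() and the other names are made up for illustration.

```cpp
// Toy model of the exit-code race *speculated* in the quoted report.
// Hypothetical throughout: it does not describe Windows internals.
#include <cstdint>

// STATUS_THREAD_IS_TERMINATING, the sentinel proposed for wait_sig().
constexpr std::uint32_t kStatusThreadIsTerminating = 0xC000004Bu;

// A process with two threads: the primary thread and the signal thread.
struct ToyProcess {
    int live_threads = 2;
    bool signal_thread_alive = true;
    std::uint32_t exit_code = 0;
};

// Guessed TerminateProcess() step 1: store the exit code and detach the
// calling (primary) thread before anything else is stopped.
constexpr void terminate_detach_caller(ToyProcess& p, std::uint32_t code) {
    p.exit_code = code;
    --p.live_threads;
}

// Guessed TerminateProcess() step 2: stop the remaining threads without
// touching the exit code.
constexpr void terminate_kill_rest(ToyProcess& p) {
    if (p.signal_thread_alive) {
        p.signal_thread_alive = false;
        --p.live_threads;
    }
}

// ExitThread() in the signal thread: a thread that sees itself as the
// last one also writes the process exit code.
constexpr void signal_thread_exit(ToyProcess& p, std::uint32_t code) {
    if (!p.signal_thread_alive) {
        return;  // already stopped by terminate_kill_rest()
    }
    p.signal_thread_alive = false;
    if (--p.live_threads == 0) {
        p.exit_code = code;  // overwrites the TerminateProcess() code
    }
}

// Where the signal thread's ExitThread() lands relative to the two steps.
enum class Interleaving { ExitBeforeTerminate, ExitInsideTerminate, ExitAfterTerminate };

// Replays one interleaving and returns the exit code a parent would see.
constexpr std::uint32_t final_exit_code(Interleaving order,
                                        std::uint32_t process_code,
                                        std::uint32_t thread_code) {
    ToyProcess p{};
    switch (order) {
    case Interleaving::ExitBeforeTerminate:
        signal_thread_exit(p, thread_code);
        terminate_detach_caller(p, process_code);
        terminate_kill_rest(p);
        break;
    case Interleaving::ExitInsideTerminate:  // the race window
        terminate_detach_caller(p, process_code);
        signal_thread_exit(p, thread_code);
        terminate_kill_rest(p);
        break;
    case Interleaving::ExitAfterTerminate:
        terminate_detach_caller(p, process_code);
        terminate_kill_rest(p);
        signal_thread_exit(p, thread_code);
        break;
    }
    return p.exit_code;
}

// The proposed diagnosis: if only the signal thread ever exits with the
// sentinel, seeing it as a process exit code points at this race.
constexpr bool exit_code_was_clobbered(std::uint32_t process_exit_code) {
    return process_exit_code == kStatusThreadIsTerminating;
}
```

Consider a process code of 1 (an arbitrary value) with thread_code set to kStatusThreadIsTerminating. Only Interleaving::ExitInsideTerminate yields 0xC000004B; the other two orders yield 1. In this model, the Sleep(1000) workaround pushes the signal thread into ExitAfterTerminate. Never letting the signal thread call ExitThread() at all means signal_thread_exit() is never reached, so only the TerminateProcess() code can survive.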