From: Tom Honermann <thonermann@coverity.com>
To: <cygwin@cygwin.com>
Subject: Re: Intermittent failures retrieving process exit codes
Date: Thu, 14 Nov 2013 04:02:00 -0000
Message-ID: <52844B2E.5050902@coverity.com>
In-Reply-To: <50D401EF.9040705@coverity.com>
On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I can't
> rule out other third party kernel mode software possibly contributing to
> the issue. A simple change to Cygwin works around the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent calls to
> TerminateProcess() and ExitThread(). The example code below minimally
> mimics the threads created and exit process/thread calls that are
> performed when running Cygwin's false.exe. The primary thread exits the
> process via TerminateProcess(), as pinfo::exit() does in
> winsup/cygwin/pinfo.cc. The secondary thread exits itself via
> ExitThread(), as Cygwin's signal processing thread function, wait_sig(),
> does in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary thread's
> call to ExitThread(). I can only speculate at this point, but my guess
> is that the TerminateProcess() code disassociates the calling thread
> from the process before other threads are stopped such that
> ExitThread(), concurrently running in another thread, may determine that
> the calling thread is the last thread of the process and overwrite the
> process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses TerminateProcess()
> because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled the
> code using Microsoft's Visual Studio 2010 x86 compiler with the command
> 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:
>
> #include <windows.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> DWORD WINAPI SecondaryThread(
>     LPVOID lpParameter)
> {
>     Sleep(1);
>     ExitThread(2);
> }
>
> int main() {
>     HANDLE hSecondaryThread = CreateThread(
>         NULL,               // lpThreadAttributes
>         0,                  // dwStackSize
>         SecondaryThread,    // lpStartAddress
>         (LPVOID)0,          // lpParameter
>         0,                  // dwCreationFlags
>         NULL);              // lpThreadId
>     if (!hSecondaryThread) {
>         fprintf(stderr, "CreateThread failed. GLE=%lu\n",
>                 (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     Sleep(1);
>
>     if (!TerminateProcess(GetCurrentProcess(), 1)) {
>         fprintf(stderr, "TerminateProcess failed. GLE=%lu\n",
>                 (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     return 0;
> }
>
>
> To run the test, a simple .bat file is used:
>
> test.bat:
>
> @echo off
> setlocal
>
> :loop
> echo test...
> test-exit-code.exe
> if %ERRORLEVEL% NEQ 1 (
>     echo test-exit-code.exe returned %ERRORLEVEL%
>     exit /B 1
> )
> goto loop
>
>
> If the race never occurred, test.bat would run indefinitely. The amount
> of time it takes to fail on my machine (64-bit Windows 7 running in a
> VMware Workstation 8 VM under Kubuntu 12.04 on a Lenovo T420 Intel
> i7-2640M 2 processor laptop) varies considerably. I had one run fail in
> fewer than 10 iterations, but most of the time it has taken upwards of
> 5 minutes to get a failure.
>
> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to ExitThread()
> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
> doesn't exit until the process is exiting anyway, the call to Sleep()
> does not adversely affect shutdown. The thread just gets terminated
> while in the call to Sleep() instead of exiting before the process is
> terminated or getting terminated while still in the call to
> ExitThread(). A better solution might be to avoid the thread exiting at
> all (so long as it can't get terminated while holding critical
> resources), or to have the process exiting thread wait on it. Neither
> of these is ideal. Orderly shutdown of multi-threaded processes is
> really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used, having
> the wait_sig() thread (and any other threads that could potentially
> concurrently exit with another thread) exit with a special status value
> such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable
> diagnosis of this issue as any process exit code matching this would be
> a likely indicator that this issue was encountered.
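
As a rough illustration of the sentinel idea above, the check a parent
process might apply to a child's exit code could look like the following
minimal sketch. The names classify_exit_code, ExitVerdict and
kStatusThreadIsTerminating are hypothetical and are not part of Cygwin;
only the value 0xC000004B comes from the text above, and it is defined
locally so the sketch builds without the Windows headers.

```cpp
#include <cstdint>

// Value of STATUS_THREAD_IS_TERMINATING as quoted in the message above.
// Defined locally (assumption: the caller does not include <ntstatus.h>).
constexpr std::uint32_t kStatusThreadIsTerminating = 0xC000004Bu;

// Hypothetical result type for this sketch.
enum class ExitVerdict {
    Expected,        // the process returned the exit code it was asked to
    LikelyExitRace,  // the process exit code equals the thread sentinel
    Unexpected       // some other exit code; not explained by the sentinel
};

// Compare the exit code observed by the parent against the expected one.
// The expected code is checked first, so a process that legitimately exits
// with the sentinel value is still reported as Expected.
constexpr ExitVerdict classify_exit_code(std::uint32_t observed,
                                         std::uint32_t expected) {
    return observed == expected
               ? ExitVerdict::Expected
               : (observed == kStatusThreadIsTerminating
                      ? ExitVerdict::LikelyExitRace
                      : ExitVerdict::Unexpected);
}

// Human-readable label for logging.
inline const char* to_string(ExitVerdict v) {
    switch (v) {
        case ExitVerdict::Expected:       return "expected";
        case ExitVerdict::LikelyExitRace: return "likely exit race";
        case ExitVerdict::Unexpected:     return "unexpected";
    }
    return "unknown";
}
```

For the false.exe case, classify_exit_code(observed, 1) would report
LikelyExitRace only if wait_sig() had been changed to exit with the
sentinel and the process exit code was then overwritten by it. With the
thread exiting with status 0, as it does today, the same failure would
show up as Unexpected.
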
>
> As is, when this race condition results in the undesirable outcome,
> since the signal processing thread exits with a status of 0, the exit
> status of the process is 0. This explains why false.exe works so well
> to reproduce the issue. It would be impossible to produce a negative
> test using true.exe.
>
> Tom.

Time passes...

I worked with some former colleagues to report this issue to Microsoft.
Windows 8.1 and Windows Server 2012 R2 contain a fix that addresses
the test case above. A hotfix has been made available for Windows 7 SP1
and Windows Server 2008 R2. Should anyone desire a hotfix for other
versions of Windows, it will be necessary to open a case with Microsoft
to request it.

http://support.microsoft.com/kb/2875501

Tom.