* Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2012-12-07 19:55 UTC
To: cygwin

I've witnessed intermittent failures in multiple build systems, at multiple companies, that use Cygwin bash and make as part of the build system but use non-Cygwin compilers and other tools. The intermittent failures occur when a process appears to complete successfully, but the process retrieving its exit code receives an unexpected value. This has been seen on many different Cygwin versions across several years. Several reports of similar-sounding issues can be found online:

- http://cygwin.1069669.n5.nabble.com/Cygwin-1-7-x-on-Windows-7-Exit-statuses-of-Win32-executables-are-sometimes-wrong-td20186.html
- http://stackoverflow.com/questions/9769256/intermittent-failures-under-cygwin-possibly-related-to-candle-and-or-make

I recently was able to produce a very small test case that reproduces this issue reliably on some machines:

$ cat test.sh
#!/bin/sh
while [ 1 ]; do
  echo "test..."
  if cmd /c "false"; then
    echo "exiting..."
    exit 1
  fi
done

An invocation of test.sh should run indefinitely, but fails very quickly on one of my machines:

$ ./test.sh
test...
test...
exiting...
$ ./test.sh
test...
test...
test...
test...
exiting...
$ ./test.sh
test...
exiting...

There are several high-level possibilities for what is going wrong:

1) cmd.exe is failing to retrieve the correct exit code for the invocation of false.exe (a Cygwin process).
2) cmd.exe is failing to return the (correct) exit code it received for the invocation of false.exe.
3) bash.exe (a Cygwin process) is failing to retrieve the correct exit code for the invocation of cmd.exe.
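The same stress loop can be sketched in C with the POSIX fork()/execvp()/waitpid() calls, which Cygwin also provides. This is a hypothetical harness (run_until_mismatch is not part of Cygwin or bash): it reruns a command and reports the first iteration whose exit status differs from the expected one. Under Cygwin, an argv of {"cmd", "/c", "false"} with an expected status of 1 would roughly mirror test.sh; elsewhere it only exercises the harness itself.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv (looked up in PATH) up to max_iter times.
 * Returns the 1-based iteration whose exit status first differs from
 * `expected`, 0 if every run matched, or -1 if fork()/waitpid() failed. */
long run_until_mismatch(char *const argv[], int expected, long max_iter)
{
    assert(argv != NULL && argv[0] != NULL);

    for (long i = 1; i <= max_iter; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec failed */
        }

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }

        /* Death by signal or a different exit status both count as a mismatch. */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != expected)
            return i;
    }
    return 0;
}
```

For example, `char *cmd_argv[] = {"cmd", "/c", "false", NULL};` followed by `run_until_mismatch(cmd_argv, 1, 100000)` would keep going until the first unexpected status. Note that a POSIX wait status only carries the low 8 bits of a Windows exit code.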

It is possible that other software installed on the machines I've witnessed this on is contributing to the problem (see http://cygwin.com/faq/faq.using.html#faq.using.bloda). If so, such software would be a contributing factor to one of the explanations above, but that would not necessarily mean that there is no defect in Cygwin (or in CreateProcess, WaitForSingleObject, or GetExitCodeProcess). I have not yet seen a similar case that does not involve Cygwin, so at present I suspect a defect in Cygwin, though possibly one that produces no negative symptoms in isolation.

I've reproduced this issue with both the 32-bit and 64-bit versions of cmd.exe. I've also reproduced it by replacing cmd.exe with a small C program that itself calls CreateProcess to run Cygwin's false.exe. The issue reproduces whether that program is compiled with Cygwin gcc, MinGW gcc (32-bit and 64-bit), or MSVC (32-bit and 64-bit). So, substitute what you like for 'cmd.exe' in the above.

Likewise, I've reproduced this issue by replacing false.exe in the test above with a custom false.exe (a C program that just returns 1). The issue reproduces whether myfalse.exe is compiled with Cygwin gcc, MinGW gcc (32-bit and 64-bit), or MSVC (32-bit and 64-bit). So, substitute what you like for 'false.exe' in the above.

I am not able to reproduce the problem if I elide the invocation of false.exe (i.e., if the cmd.exe invocation is 'cmd /c "exit /B 1"' or if my replacement for cmd.exe just returns 1).

The problem feels like a race condition in retrieving process exit codes. Further, it seems that it may only occur when two related processes exit in quick succession.

I've been granted several weeks in the near future to work exclusively on this issue. Before I start working on it, though, I'd like to hear from other community members who have experienced this and tried to debug it: what is and is not known about the issue, and what workarounds have been tried (especially any that were found to be successful).
Are there specific parts of the Cygwin (or bash) code that you recommend starting with?

The machine that I've been running the above script on is 64-bit Windows 7 Professional SP1 running under VMware Workstation 8, which is running on Kubuntu 12.04. Relevant parts of 'cygcheck -s' are:

Windows 7 Professional N Ver 6.1 Build 7601 Service Pack 1

Running under WOW64 on AMD64

    Cygwin DLL version info:
        DLL version: 1.7.16
        DLL epoch: 19
        DLL old termios: 5
        DLL malloc env: 28
        Cygwin conv: 181
        API major: 0
        API minor: 262
        Shared data: 5
        DLL identifier: cygwin1
        Mount registry: 3
        Cygwin registry name: Cygwin
        Program options name: Program Options
        Installations name: Installations
        Cygdrive default prefix:
        Build date:
        Shared id: cygwin1S5

Potential app conflicts:

ByteMobile laptop optimization client.

No Cygwin services found.

Cygwin Package Information
Package              Version            Status
bash                 4.1.10-4           OK
cygwin               1.7.16-1           OK

Tom.

-- 
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2012-12-07 21:54 UTC
To: cygwin

On 12/07/2012 02:54 PM, Tom Honermann wrote:
> Likewise, I've reproduced this issue by replacing false.exe in the test
> above with a custom false.exe (A C program that just returns 1). The
> issue reproduces whether myfalse.exe is compiled with Cygwin gcc, MinGW
> gcc (32-bit and 64-bit), and with MSVC (32-bit and 64-bit). So,
> substitute what you like for 'false.exe' in the above.

The above is not correct; I erred in my testing. I am able to reproduce the issue when replacing false.exe in the test case with a custom false.exe compiled with Cygwin gcc. I am *not* able to reproduce the issue when replacing it with one compiled with MinGW gcc (32-bit or 64-bit) or with MSVC (32-bit or 64-bit).

Tom.

* Re: Intermittent failures retrieving process exit codes
From: bartels @ 2012-12-07 23:07 UTC
To: cygwin

On 12/07/2012 08:54 PM, Tom Honermann wrote:
>
> I recently was able to produce a very small test case that reproduces this issue reliably on some machines:

Your suspicion about a race condition may very well be correct: I can easily confirm the problem on both bare-metal and virtual SMP machines, but not on a single-core virtual machine. I have had two instances of your test case running for half an hour on the same core without any problem: 30k cycles without a hiccup.

Apart from the immediate effect exposed by your script, I have reason to believe that the root cause also affects other running (SMP) processes.

bartels

* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2012-12-21 6:30 UTC
To: cygwin

I spent most of the week debugging this issue. This appears to be a defect in Windows: I can reproduce the issue without Cygwin. I can't rule out other third-party kernel-mode software possibly contributing to the issue. A simple change to Cygwin works around the problem for me.

I don't know which Windows releases are affected by this. I've only reproduced the problem (outside of Cygwin) with WOW64 processes running on 64-bit Windows 7. I haven't yet tried elsewhere.

The problem appears to be a race condition involving concurrent calls to TerminateProcess() and ExitThread(). The example code below minimally mimics the threads created and the exit process/thread calls that are performed when running Cygwin's false.exe. The primary thread exits the process via TerminateProcess(), à la pinfo::exit() in winsup/cygwin/pinfo.cc. The secondary thread exits itself via ExitThread(), à la Cygwin's signal processing thread function, wait_sig(), in winsup/cygwin/sigproc.cc.

When the race condition results in the undesirable outcome, the exit code for the process is set to the exit code of the secondary thread's call to ExitThread(). I can only speculate at this point, but my guess is that the TerminateProcess() code disassociates the calling thread from the process before other threads are stopped, such that ExitThread(), running concurrently in another thread, may determine that its own calling thread is the last thread of the process and overwrite the process exit code.

The issue also reproduces if ExitProcess() is called in place of TerminateProcess(). The test case below only uses TerminateProcess() because that is what Cygwin does.

Source code to reproduce the issue follows.
Again, Cygwin is not required to reproduce the problem. For my own testing, I compiled the code using Microsoft's Visual Studio 2010 x86 compiler with the command 'cl /Fetest-exit-code.exe test-exit-code.cpp'.

test-exit-code.cpp:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// Mimics Cygwin's signal thread, wait_sig(), which exits via ExitThread().
DWORD WINAPI SecondaryThread(
    LPVOID lpParameter)
{
    Sleep(1);
    ExitThread(2);
}

int main()
{
    HANDLE hSecondaryThread = CreateThread(
        NULL,             // lpThreadAttributes
        0,                // dwStackSize
        SecondaryThread,  // lpStartAddress
        (LPVOID)0,        // lpParameter
        0,                // dwCreationFlags
        NULL);            // lpThreadId
    if (!hSecondaryThread) {
        fprintf(stderr, "CreateThread failed. GLE=%lu\n",
            (unsigned long)GetLastError());
        exit(127);
    }

    Sleep(1);

    // Mimics pinfo::exit(), which ends the process via TerminateProcess().
    if (!TerminateProcess(GetCurrentProcess(), 1)) {
        fprintf(stderr, "TerminateProcess failed. GLE=%lu\n",
            (unsigned long)GetLastError());
        exit(127);
    }

    return 0;
}

To run the test, a simple .bat file is used:

test.bat:

@echo off
setlocal
:loop
echo test...
test-exit-code.exe
if %ERRORLEVEL% NEQ 1 (
    echo test-exit-code.exe returned %ERRORLEVEL%
    exit /B 1
)
goto loop

test.bat should run indefinitely. The amount of time it takes to fail on my machine (64-bit Windows 7 running in a VMware Workstation 8 VM under Kubuntu 12.04 on a Lenovo T420 laptop with an Intel i7-2640M, 2 processors) varies considerably. I had one run fail in less than 10 iterations, but most of the time it has taken upwards of 5 minutes to get a failure.

The workaround I implemented within Cygwin was simple and sloppy. I added a call to Sleep(1000) immediately before the call to ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably) doesn't exit until the process is exiting anyway, the call to Sleep() does not adversely affect shutdown. The thread just gets terminated while in the call to Sleep(), instead of exiting before the process is terminated or getting terminated while still in the call to ExitThread().
A better solution might be to avoid the thread exiting at all (so long as it can't get terminated while holding critical resources), or to have the process-exiting thread wait on it. Neither of these is ideal. Orderly shutdown of multi-threaded processes is really hard to do correctly on Windows.

Since the exit code for the signal processing thread is not used, having the wait_sig() thread (and any other threads that could potentially exit concurrently with another thread) exit with a special status value such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable diagnosis of this issue, as any process exit code matching this value would be a likely indicator that this issue was encountered.

As is, when this race condition results in the undesirable outcome, the exit status of the process is 0, because the signal processing thread exits with a status of 0. This explains why false.exe works so well to reproduce the issue; it would be impossible to produce a negative test using true.exe.

Tom.
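The diagnostic idea above can be sketched as a small check that a build wrapper might run on a collected exit code. This assumes the helper threads really were changed to exit with STATUS_THREAD_IS_TERMINATING (0xC000004B), which is only a proposal at this point in the thread; the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* NTSTATUS STATUS_THREAD_IS_TERMINATING, proposed as the exit status of
 * helper threads so that a leaked thread status becomes recognizable. */
#define HELPER_THREAD_EXIT_STATUS UINT32_C(0xC000004B)

enum exit_verdict {
    EXIT_AS_EXPECTED,      /* observed == expected */
    EXIT_LIKELY_EXIT_RACE, /* a helper thread's status became the process status */
    EXIT_UNEXPECTED        /* any other mismatch */
};

/* Hypothetical helper: classify a full 32-bit exit code, such as the DWORD
 * from GetExitCodeProcess(). An 8-bit shell status like $? cannot carry
 * the sentinel, so the check needs the untruncated value. */
enum exit_verdict classify_exit_code(uint32_t observed, uint32_t expected)
{
    /* The check is only meaningful if the sentinel is never a legitimate result. */
    assert(expected != HELPER_THREAD_EXIT_STATUS);

    if (observed == expected)
        return EXIT_AS_EXPECTED;
    if (observed == HELPER_THREAD_EXIT_STATUS)
        return EXIT_LIKELY_EXIT_RACE;
    return EXIT_UNEXPECTED;
}
```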

* Re: Intermittent failures retrieving process exit codes
From: Corinna Vinschen @ 2012-12-21 10:33 UTC
To: cygwin

On Dec 21 01:30, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I
> can't rule out other third party kernel mode software possibly
> contributing to the issue. A simple change to Cygwin works around
> the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes
> running on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent
> calls to TerminateProcess() and ExitThread(). The example code
> below minimally mimics the threads created and exit process/thread
> calls that are performed when running Cygwin's false.exe. The
> primary thread exits the process via TerminateProcess() à la
> pinfo::exit() in winsup/cygwin/pinfo.cc. The secondary thread exits
> itself via ExitThread() à la Cygwin's signal processing thread
> function, wait_sig(), in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary
> thread's call to ExitThread(). I can only speculate at this point,
> but my guess is that the TerminateProcess() code disassociates the
> calling thread from the process before other threads are stopped
> such that ExitThread(), concurrently running in another thread, may
> determine that the calling thread is the last thread of the process
> and overwrite the process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses
> TerminateProcess() because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled
> the code using Microsoft's Visual Studio 2010 x86 compiler with the
> command 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:

Wow. Thanks for this testcase. I tried to reproduce the issue, and I was not able to reproduce it on a single-CPU, single-core setup, but I could reproduce it almost immediately on a dual-core system, twice in a row in under 5 seconds.

> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to
> ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc. Since this
> thread (probably) doesn't exit until the process is exiting anyway,
> the call to Sleep() does not adversely affect shutdown. The thread
> just gets terminated while in the call to Sleep() instead of exiting
> before the process is terminated or getting terminated while still
> in the call to ExitThread(). A better solution might be to avoid
> the thread exiting at all (so long as it can't get terminated while
> holding critical resources), or to have the process exiting thread
> wait on it. Neither of these is ideal. Orderly shutdown of
> multi-threaded processes is really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used,
> having the wait_sig() thread (and any other threads that could
> potentially concurrently exit with another thread) exit with a
> special status value such as STATUS_THREAD_IS_TERMINATING
> (0xC000004BL) would enable diagnosis of this issue as any process
> exit code matching this would be a likely indicator that this issue
> was encountered.

Maybe the signal thread should really not exit by itself, but just wait until the TerminateThread is called. Chris?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: Intermittent failures retrieving process exit codes
From: Nick Lowe @ 2012-12-21 12:15 UTC
To: Andrey Repin

Briefly casting my eye at the test case, as a general point, remember that these termination APIs all complete asynchronously, and I do not believe it has ever been safe or correct to call another while one is still pending; you are in undefined, edge-case behaviour territory here.

Win32's TerminateThread/ExitThread, which in turn call the native NtTerminateThread, only request cancellation of a thread and return immediately. One has to wait on a handle to the thread to know that termination has completed, for which the SYNCHRONIZE standard access right is required. The same is true of Win32's TerminateProcess/ExitProcess, which in turn call NtTerminateProcess, where one waits instead on a handle to the process.

Regards,

Nick
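The distinction Nick draws, between requesting termination and knowing that it has completed, has a close POSIX analogue, sketched here purely as an illustration (this is not the Win32 API, and the function names are hypothetical). pthread_cancel() only queues a cancellation request and returns; the caller learns that the thread is gone only by waiting on it with pthread_join(). On Windows, the corresponding wait would be on the thread or process handle (for example with WaitForSingleObject), which needs the SYNCHRONIZE access right.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Worker that never finishes on its own. pause() is a POSIX cancellation
 * point, so a pending cancellation request is acted on there. */
static void *idle_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL; /* not reached */
}

/* Hypothetical helper: request cancellation, then wait for completion.
 * Returns true if the join confirmed that the worker ended by cancellation. */
bool cancel_then_wait(void)
{
    pthread_t worker;
    void *result = NULL;

    if (pthread_create(&worker, NULL, idle_worker, NULL) != 0)
        return false;

    /* Step 1: request termination. This returns immediately; the worker
     * may still be running when the call comes back. */
    if (pthread_cancel(worker) != 0)
        return false;

    /* Step 2: block until the worker has actually terminated. */
    if (pthread_join(worker, &result) != 0)
        return false;

    return result == PTHREAD_CANCELED;
}
```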

* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2012-12-21 19:45 UTC
To: cygwin

On 12/21/2012 07:15 AM, Nick Lowe wrote:
> Briefly casting my eye at the test case, as a general point, remember
> that these termination APIs all complete asynchronously and I do not
> believe it has ever been safe or correct to call another while one is
> still pending - you are in undefined, edge case behaviour territory
> here.

These comments do not match my understanding of these APIs. MSDN documentation contradicts some of this as well.

> Win32's TerminateThread/ExitThread, that in turn calls the native
> NtTerminateThread, only requests cancellation of a thread and returns
> immediately.
> One has to wait on a handle to the thread know that termination has
> completed, for which the synchronise standard access right is
> required.
> The same is true of Win32's TerminateProcess/ExitProcess, in turn
> NtTerminateProcess, where one waits instead on a handle to the
> process.

TerminateProcess() is documented to perform error checking and then to schedule asynchronous termination of the specified process. I would not be surprised if the asynchronous termination applies even when GetCurrentProcess() is used to specify the process to terminate, but I would likewise not be surprised if TerminateProcess() has special handling for this. I agree that calls to TerminateProcess() might return before the calling thread/process is terminated. I have not tried to verify this behavior, though.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714%28v=vs.85%29.aspx

The MSDN documentation for TerminateThread() does not state that the termination is carried out asynchronously, but I would not be surprised if that is the case.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx

I would be *very* surprised if it were possible for ExitProcess() and ExitThread() to return (unless the thread is being suspended and its context manipulated by another process/thread). The MSDN docs for these do not mention any possibility of returning. In addition, the ExitThread() documentation explicitly states that Windows serializes calls to ExitProcess() and ExitThread():

<quote>
The ExitProcess, ExitThread, CreateThread, CreateRemoteThread functions, and a process that is starting (as the result of a CreateProcess call) are serialized between each other within a process. Only one of these events can happen in an address space at a time.
</quote>

http://msdn.microsoft.com/en-us/library/windows/desktop/ms682659%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682658%28v=vs.85%29.aspx

I read that quote as supporting my assertion that the observed behavior is a defect in Windows. It appears Windows is failing to serialize the calls appropriately.

Tom.
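To make Tom's reading concrete, here is a deliberately simplified, hypothetical model in portable C. It is not how Windows is implemented and every name in it is invented: the last thread to leave a process records its status as the process exit code, and TerminateProcess() is split into "detach the caller" and "record the code and kill the rest". The wrong result (2 instead of 1) appears only when ExitThread() can run between those two halves, which is what the serialization guarantee quoted above should rule out. The model does not distinguish ExitProcess() from TerminateProcess(), since the issue reportedly reproduces with either.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; not actual Windows kernel behavior. */
struct toy_process {
    int attached_threads; /* threads still counted as part of the process */
    bool code_recorded;   /* has a process exit code been recorded yet? */
    unsigned exit_code;
};

/* Model of ExitThread(status): if the leaving thread is the last one and no
 * code has been recorded yet, its status becomes the process exit code. */
static void toy_exit_thread(struct toy_process *p, unsigned status)
{
    assert(p->attached_threads > 0);
    if (--p->attached_threads == 0 && !p->code_recorded) {
        p->code_recorded = true;
        p->exit_code = status;
    }
}

/* Replays the test case: the main thread calls TerminateProcess(self, 1)
 * while a secondary thread calls ExitThread(2).
 *   slot 0: ExitThread completes before TerminateProcess starts
 *   slot 1: ExitThread runs between the two halves of TerminateProcess
 *   slot 2: ExitThread would run afterwards (the thread is already gone)
 * With serialized == true, slot 1 cannot occur. */
unsigned toy_race(int slot, bool serialized)
{
    struct toy_process p = { .attached_threads = 2, .code_recorded = false, .exit_code = 0 };

    assert(slot >= 0 && slot <= 2);
    if (serialized && slot == 1)
        slot = 2; /* either serial order yields the same result here */

    if (slot == 0)
        toy_exit_thread(&p, 2);

    p.attached_threads--;       /* half 1: the calling thread detaches */

    if (slot == 1)
        toy_exit_thread(&p, 2); /* the secondary thread now looks like the last one */

    if (!p.code_recorded) {     /* half 2: record the requested code... */
        p.code_recorded = true;
        p.exit_code = 1;
    }
    p.attached_threads = 0;     /* ...and kill any remaining threads */

    return p.exit_code;
}
```

In the Cygwin case the signal thread exits with 0 rather than 2, so the model's wrong value would correspond to the spurious success seen with false.exe.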

* Re: Intermittent failures retrieving process exit codes
From: Nick Lowe @ 2012-12-22 3:09 UTC
To: Andrey Repin

The documentation in MSDN is incorrect/incomplete with regard to TerminateThread/TerminateProcess; both are definitely asynchronous.

I am not clear/confident on the behaviour of ExitProcess and ExitThread, but will investigate with IDA and a test case later. I suspect any locking/serialisation will pertain to these functions only.

* Re: Intermittent failures retrieving process exit codes
From: Christopher Faylor @ 2012-12-21 16:10 UTC
To: cygwin

On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>Maybe the signal thread should really not exit by itself, but just
>wait until the TerminateThread is called. Chris?

If the analysis is correct, doesn't that just fix one symptom? There are potentially many threads running in any Cygwin program, and it sounds like any one of them could trigger this.

cgf

* Re: Intermittent failures retrieving process exit codes
From: Corinna Vinschen @ 2012-12-21 17:02 UTC
To: cygwin

On Dec 21 11:10, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
> >Maybe the signal thread should really not exit by itself, but just
> >wait until the TerminateThread is called. Chris?
>
> If the analysis is correct, that just fixes one symptom doesn't it?
> There are potentially many threads running in any Cygwin program
> and it sounds like any one of them could trigger this.

Right. I guess the question is how to synchronize things so that the thread calling TerminateProcess is actually the last one, making sure its exit code is used.

Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some help. Or we have to keep the thread IDs of all self-started threads available to terminate them explicitly at process exit.


Corinna
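The second option Corinna mentions, keeping track of every self-started thread so that the exiting thread can deal with them explicitly, might look roughly like this. It is a hypothetical sketch that uses POSIX threads as a stand-in: the helpers are asked to stop and are joined before the caller goes on to exit, so the caller really is the last thread. In Cygwin the equivalent step would presumably wait on Win32 thread handles (for example with WaitForMultipleObjects) before calling TerminateProcess(); that mapping is an assumption, and it only covers threads registered this way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED_THREADS 16

/* Hypothetical registry of threads started by the runtime itself. */
struct thread_registry {
    pthread_mutex_t lock;
    pthread_cond_t stop_requested;
    bool stopping;
    size_t count;
    pthread_t ids[MAX_TRACKED_THREADS];
};

void registry_init(struct thread_registry *reg)
{
    pthread_mutex_init(&reg->lock, NULL);
    pthread_cond_init(&reg->stop_requested, NULL);
    reg->stopping = false;
    reg->count = 0;
}

/* Stand-in for a long-lived helper such as a signal thread: it parks until
 * shutdown is requested instead of exiting at a moment of its own choosing. */
static void *helper_main(void *arg)
{
    struct thread_registry *reg = arg;
    pthread_mutex_lock(&reg->lock);
    while (!reg->stopping)
        pthread_cond_wait(&reg->stop_requested, &reg->lock);
    pthread_mutex_unlock(&reg->lock);
    return NULL;
}

/* Start a helper and remember its ID. Returns 0 on success, -1 if the
 * registry is full or shutting down, or a pthread_create() error code. */
int registry_spawn(struct thread_registry *reg)
{
    int rc = -1;
    pthread_mutex_lock(&reg->lock);
    if (!reg->stopping && reg->count < MAX_TRACKED_THREADS) {
        rc = pthread_create(&reg->ids[reg->count], NULL, helper_main, reg);
        if (rc == 0)
            reg->count++;
    }
    pthread_mutex_unlock(&reg->lock);
    return rc;
}

/* Called by the exiting thread: ask every helper to stop, then wait for each
 * one, so the caller is the last thread left. Returns the number joined. */
size_t registry_shutdown(struct thread_registry *reg)
{
    pthread_mutex_lock(&reg->lock);
    reg->stopping = true;
    pthread_cond_broadcast(&reg->stop_requested);
    size_t n = reg->count; /* no new spawns are possible once stopping is set */
    pthread_mutex_unlock(&reg->lock);

    for (size_t i = 0; i < n; i++) {
        int rc = pthread_join(reg->ids[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return n;
}
```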

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
From: Christopher Faylor @ 2012-12-21 19:36 UTC
To: cygwin

On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>On Dec 21 11:10, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>> >Maybe the signal thread should really not exit by itself, but just
>> >wait until the TerminateThread is called. Chris?
>>
>> If the analysis is correct, that just fixes one symptom doesn't it?
>> There are potentially many threads running in any Cygwin program
>> and it sounds like any one of them could trigger this.
>
>Right. I guess the question is how to synchronize things so that the
>thread calling TerminateProcess is actually the last one, making sure
>its return value is used.
>
>Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>help. Or we have to keep all thread IDs of the self-started threads
>available to terminate them explicitly at process exit.

I checked in a complicated fix for this problem which only affected Cygwin-created threads. But, then, I thought about another riskier but simpler fix. That version is now in CVS, and I'm generating a new snapshot with it.

I tested this lightly on Windows 7 and 32-bit XP, but it would be nice to hear whether multi-threaded things like X work on other platforms too.

If you test a snapshot, note that I'm still tracking down Ken Brown's reported emacs regression in recent snapshots, so that will still be broken.

cgf

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
From: Daniel Colascione @ 2012-12-21 20:37 UTC
To: cygwin

On 12/21/2012 11:36 AM, Christopher Faylor wrote:
> I checked in a complicated fix for this problem which only affected
> Cygwin-created threads. But, then, I thought about another riskier but
> simpler fix.

Your second approach scares me. There's no global order imposed on the loader lock and the Cygwin process lock, and Windows can take the loader lock at virtually any time, since LoadLibrary can be used internally to implement any API.
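Daniel's objection is about lock ordering. The usual discipline, sketched here with POSIX mutexes as a generic illustration (not Cygwin code; all names are hypothetical), is to give the locks one global order and always acquire them in that order, approximated below by address. Two threads that name the same pair of locks in opposite orders then cannot deadlock. His point is that this discipline cannot be imposed on the Windows loader lock, because Windows itself may take that lock inside almost any API call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Lock two distinct mutexes in one global order (here: by address), so
 * every thread takes them in the same order however the caller names them. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

struct pair_worker_arg {
    pthread_mutex_t *first, *second; /* order in which this worker names the locks */
    long *counter;                   /* only modified while both locks are held */
    int iterations;
};

static void *pair_worker(void *p)
{
    struct pair_worker_arg *arg = p;
    for (int i = 0; i < arg->iterations; i++) {
        lock_pair(arg->first, arg->second);
        (*arg->counter)++;
        unlock_pair(arg->first, arg->second);
    }
    return NULL;
}

/* Hypothetical demo: two threads name the same two locks in opposite orders,
 * the classic ABBA pattern. Because lock_pair() imposes one order, both
 * finish. Returns the total increments, or -1 if a thread could not start. */
long run_opposite_order_demo(int iterations)
{
    pthread_mutex_t m1, m2;
    long counter = 0;
    long result = -1;
    pthread_t ta, tb;

    pthread_mutex_init(&m1, NULL);
    pthread_mutex_init(&m2, NULL);
    struct pair_worker_arg a = { &m1, &m2, &counter, iterations };
    struct pair_worker_arg b = { &m2, &m1, &counter, iterations };

    if (pthread_create(&ta, NULL, pair_worker, &a) == 0) {
        if (pthread_create(&tb, NULL, pair_worker, &b) == 0) {
            pthread_join(tb, NULL);
            result = 0; /* both threads ran */
        }
        pthread_join(ta, NULL);
        if (result == 0)
            result = counter;
    }

    pthread_mutex_destroy(&m1);
    pthread_mutex_destroy(&m2);
    return result;
}
```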

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
From: marco atzeri @ 2012-12-21 22:23 UTC
To: cygwin

On 12/21/2012 8:36 PM, Christopher Faylor wrote:
> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
> hear if multi-threaded things like X work on other platforms too.

I think the X server doesn't like it.
on 20121221 it freezes on start on W7/64 no issue on 20121218 Regards Marco -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-21 22:23 ` marco atzeri @ 2012-12-21 23:09 ` Tom Honermann 2012-12-22 2:53 ` Christopher Faylor 2012-12-22 2:49 ` Christopher Faylor 1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2012-12-21 23:09 UTC
To: cygwin

On 12/21/2012 05:23 PM, marco atzeri wrote:
> On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>> hear if multi-threaded things like X work on other platforms too.
>>
>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>> reported emacs regression in recent snapshots so that will still be
>> broken.
>>
>> cgf
>>
>
> I think the Xserver doesn't like it.
> on 20121221 it freezes on start on W7/64
> no issue on 20121218

I was worried about this possibility after looking at the code changes. But I haven't had a chance to test adequately yet. I would expect that indefinite blocking in dll_entry() could prevent DLLs from being unloaded. For example, calls to dll_entry() for DLL_PROCESS_DETACH may get blocked.

It looks to me like the changes made are insufficient to prevent the race. For example, this won't address the case where an exiting thread releases the process lock acquired in dll_entry() before the thread exiting the process acquires it in pinfo::exit(). Both threads could still end up in an ExitThread() vs. ExitProcess()/TerminateProcess() race. However, this is only true for threads whose exits are not predicated upon an action taken by the process-exiting thread after it has acquired the process lock in pinfo::exit(). And since the exiting thread must be the last thread of the process in order to hit the issue, this may not be a concern.

I'm not sure that a general workaround for this issue is feasible for all possible threads. At least, not without hooking the Terminate* and Exit* Win32 APIs. My gut tells me that a general solution requires waiting for thread handles to be signaled, but I haven't thought it completely through yet.

It looks like Chris reverted the change and checked in a new update. I haven't looked at those changes yet.

Tom.
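Tom's closing idea, together with Corinna's earlier suggestion of keeping the IDs of self-started threads, amounts to: record every helper thread you start, and wait for all of them to finish before the final exit status is produced. Below is a minimal POSIX-threads analogy of that idea. It is not the change cgf checked in. `start_helper()`, `finish_process()` and the fixed `MAX_HELPERS` limit are hypothetical. The sketch assumes both functions are called from a single thread and that every helper terminates on its own (or has been told to stop), because pthread_join() otherwise blocks forever.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_HELPERS 8   /* hypothetical fixed limit for the sketch */

static pthread_t helpers[MAX_HELPERS];
static size_t n_helpers;

/* Start a helper thread and remember it so it can be waited for later.
   Returns 0 on success, an error number from pthread_create(), or -1 if
   the table is full. Not safe to call from several threads at once. */
static int start_helper(void *(*fn)(void *), void *arg)
{
    if (n_helpers == MAX_HELPERS)
        return -1;
    int rc = pthread_create(&helpers[n_helpers], NULL, fn, arg);
    if (rc == 0)
        n_helpers++;
    return rc;
}

/* Wait for every recorded helper to finish, then hand back the exit code
   the caller chose. Once this returns, none of the recorded threads is
   still running, so none of them can race with the final exit. */
static int finish_process(int exit_code)
{
    for (size_t i = 0; i < n_helpers; i++)
        pthread_join(helpers[i], NULL);
    n_helpers = 0;
    return exit_code;
}
```

A program would end with `return finish_process(1);` from main(). The limitation is the one cgf raised: only threads the program itself started are in the table, while "there are potentially many threads running in any Cygwin program", including ones Windows creates.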
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-21 23:09 ` Tom Honermann @ 2012-12-22 2:53 ` Christopher Faylor 2012-12-22 2:57 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22 2:53 UTC
To: cygwin

On Fri, Dec 21, 2012 at 06:08:46PM -0500, Tom Honermann wrote:
>On 12/21/2012 05:23 PM, marco atzeri wrote:
>> On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>>> hear if multi-threaded things like X work on other platforms too.
>>>
>>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>>> reported emacs regression in recent snapshots so that will still be
>>> broken.
>>>
>>> cgf
>>>
>>
>> I think the Xserver doesn't like it.
>> on 20121221 it freezes on start on W7/64
>> no issue on 20121218
>
>I was worried about this possibility after looking at the code changes.
> But I haven't had a chance to test adequately yet. I would expect
>that indefinite blocking in dll_entry() could prevent DLLs from being
>unloaded. For example, calls to dll_entry() for DLL_PROCESS_DETACH may
>get blocked.

You're looking at the wrong changes.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-22 2:53 ` Christopher Faylor @ 2012-12-22 2:57 ` Tom Honermann 0 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-22 2:57 UTC
To: cygwin

On 12/21/2012 09:52 PM, Christopher Faylor wrote:
> You're looking at the wrong changes.

I wasn't at the time that I wrote that :) I noticed that you had reverted those changes. I haven't looked at the new changes yet.

Tom.
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-21 22:23 ` marco atzeri 2012-12-21 23:09 ` Tom Honermann @ 2012-12-22 2:49 ` Christopher Faylor 2012-12-22 3:14 ` Christopher Faylor 1 sibling, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22 2:49 UTC
To: cygwin

On Fri, Dec 21, 2012 at 11:23:00PM +0100, marco atzeri wrote:
>On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>>> On Dec 21 11:10, Christopher Faylor wrote:
>>>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>>>>> Maybe the signal thread should really not exit by itself, but just
>>>>> wait until the TerminateThread is called. Chris?
>>>>
>>>> If the analysis is correct, that just fixes one symptom, doesn't it?
>>>> There are potentially many threads running in any Cygwin program
>>>> and it sounds like any one of them could trigger this.
>>>
>>> Right. I guess the question is how to synchronize things so that the
>>> thread calling TerminateProcess is actually the last one, making sure
>>> its return value is used.
>>>
>>> Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>>> help. Or we have to keep all thread IDs of the self-started threads
>>> available to terminate them explicitly at process exit.
>>
>> I checked in a complicated fix for this problem which only affected
>> Cygwin-created threads. But, then, I thought about another riskier but
>> simpler fix. That version is now in CVS and I'm generating a new
>> snapshot with it.
>>
>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>> hear if multi-threaded things like X work on other platforms too.
>>
>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>> reported emacs regression in recent snapshots so that will still be
>> broken.
>>
>> cgf
>>
>
>I think the Xserver doesn't like it.
>on 20121221 it freezes on start on W7/64
>no issue on 20121218

I actually tried Xserver before submitting my change so it certainly isn't
a consistent problem.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-22 2:49 ` Christopher Faylor @ 2012-12-22 3:14 ` Christopher Faylor 2012-12-22 9:06 ` marco atzeri 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22 3:14 UTC
To: cygwin

On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>I actually tried Xserver before submitting my change so it certainly isn't
>a consistent problem.

Sorry, I take that back. I tried Xserver before backing out parts of the
other change and never retried it. Marco is right. It's definitely broken.
I've checked in a new change and am regenerating a snapshot.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-22 3:14 ` Christopher Faylor @ 2012-12-22 9:06 ` marco atzeri 2012-12-22 17:50 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: marco atzeri @ 2012-12-22 9:06 UTC
To: cygwin

On 12/22/2012 4:14 AM, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>> I actually tried Xserver before submitting my change so it certainly isn't
>> a consistent problem.
>
> Sorry, I take that back. I tried Xserver before backing out parts of the
> other change and never retried it. Marco is right. It's definitely broken.
> I've checked in a new change and am regenerating a snapshot.
>
> cgf
>

glad to be useful

20121222 : Xserver works fine and the false loop does not stop.

However lftp is still broken

$ lftp
lftp :~> open -u xxxxxxx matzeri.altervista.org
      1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects failed, Win32 error 6

(I have the impression it worked after your last select changes, but I
am unable to replicate)

Regards
Marco
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-22 9:06 ` marco atzeri @ 2012-12-22 17:50 ` Christopher Faylor 2012-12-23 16:56 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22 17:50 UTC
To: cygwin

On Sat, Dec 22, 2012 at 10:06:32AM +0100, marco atzeri wrote:
>On 12/22/2012 4:14 AM, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>>> I actually tried Xserver before submitting my change so it certainly isn't
>>> a consistent problem.
>>
>> Sorry, I take that back. I tried Xserver before backing out parts of the
>> other change and never retried it. Marco is right. It's definitely broken.
>> I've checked in a new change and am regenerating a snapshot.
>>
>> cgf
>>
>
>glad to be useful
>
>20121222 : Xserver works fine and the false loop does not stop.
>
>However lftp is still broken
>
>$ lftp
>lftp :~> open -u xxxxxxx matzeri.altervista.org
>      1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects
>failed, Win32 error 6
>
>
>(I have the impression it worked after your last select changes, but I
>am unable to replicate)

The snapshot is intended to work around the race between ExitThread and
ExitProcess. Nothing else.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-22 17:50 ` Christopher Faylor @ 2012-12-23 16:56 ` Christopher Faylor 2012-12-23 18:54 ` marco atzeri 2012-12-27 20:50 ` Tom Honermann 0 siblings, 2 replies; 65+ messages in thread
From: Christopher Faylor @ 2012-12-23 16:56 UTC
To: cygwin

On Sat, Dec 22, 2012 at 12:50:41PM -0500, Christopher Faylor wrote:
>On Sat, Dec 22, 2012 at 10:06:32AM +0100, marco atzeri wrote:
>>On 12/22/2012 4:14 AM, Christopher Faylor wrote:
>>> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>>>> I actually tried Xserver before submitting my change so it certainly isn't
>>>> a consistent problem.
>>>
>>> Sorry, I take that back. I tried Xserver before backing out parts of the
>>> other change and never retried it. Marco is right. It's definitely broken.
>>> I've checked in a new change and am regenerating a snapshot.
>>
>>glad to be useful
>>
>>20121222 : Xserver works fine and the false loop does not stop.
>>
>>However lftp is still broken
>>
>>$ lftp
>>lftp :~> open -u xxxxxxx matzeri.altervista.org
>>      1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects
>>failed, Win32 error 6
>>
>>
>>(I have the impression it worked after your last select changes, but I
>>am unable to replicate)
>
>The snapshot is intended to work around the race between ExitThread and
>ExitProcess. Nothing else.

The latest snapshot seems to fix this problem.

FYI
cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-23 16:56 ` Christopher Faylor @ 2012-12-23 18:54 ` marco atzeri 2012-12-27 20:50 ` Tom Honermann 1 sibling, 0 replies; 65+ messages in thread
From: marco atzeri @ 2012-12-23 18:54 UTC
To: cygwin

On 12/23/2012 5:56 PM, Christopher Faylor wrote:
>>> However lftp is still broken
>>>
>>> $ lftp
>>> lftp :~> open -u xxxxxxx matzeri.altervista.org
>>>       1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects
>>> failed, Win32 error 6
>>>
>>>
>>> (I have the impression it worked after your last select changes, but I
>>> am unable to replicate)
>>
>> The snapshot is intended to work around the race between ExitThread and
>> ExitProcess. Nothing else.
>
> The latest snapshot seems to fix this problem.
>
> FYI
> cgf
>

confirmed.
20121222 19:36:23 is fine

Thanks
Marco
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-23 16:56 ` Christopher Faylor 2012-12-23 18:54 ` marco atzeri @ 2012-12-27 20:50 ` Tom Honermann 2012-12-29 21:57 ` Christopher Faylor 1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2012-12-27 20:50 UTC
To: cygwin

I've been doing some testing with the latest source (pulled updates about 30 minutes ago). I'm no longer able to reproduce any problems with incorrect exit codes (Yay! Thanks for the quick turnaround on that!), but I am seeing some new errors when terminating the infinite loop via ctrl-c using the test case below.

This is a test case I was using previously to help isolate the original problem; I had added special_printf() calls in a few places and was using strace -m special to trigger them. All of my changes have been reverted and I'm back to using vanilla source code. This test is run with a newly built strace.exe and cygwin1.dll (the false.exe is an old one).

c:\>type test-strace.bat
@echo off
setlocal
set PATH=%CD%;%PATH%
:loop
echo test...
strace -m special false
if not errorlevel 1 (
  echo exiting...
  exit /B 1
)
goto loop

When interrupting the test run, I'll often (but not always) get the following error:

c:\>test-strace.bat
test...
test...
test...
test...
--- Process 8092, exception 40010005 at 75E26D67
Terminate batch job (Y/N)? y

Additionally, some of the Cygwin gcc built utilities that I've built for testing now occasionally hang upon interruption by ctrl-c. Basic diagnostics courtesy of gdb follow. This utility was one used in place of strace in the test case above. It does a fork() and execlp() of its first parameter, then calls waitpid() on the child and asserts that the exit code received is 1.

If anyone knows of a way to get accurate stack traces when both gcc- and Microsoft-compiled modules are present, I'll be happy to regenerate the stack traces below.
$ gdb --pid=6908
GNU gdb (GDB) 7.5.50.20120815-cvs (cygwin-special)
...
Reading symbols from /home/thonermann/cygwin/test-install/bin/expect-false-execve-cygwin32.exe...done.
...
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x77461000  0x775c5d1c  Yes (*)     /cygdrive/c/Windows/SysWOW64/ntdll.dll
0x75d71000  0x75e6bd58  Yes (*)     /cygdrive/c/Windows/syswow64/kernel32.dll
0x74ba1000  0x74be5a08  Yes (*)     /cygdrive/c/Windows/syswow64/KERNELBASE.dll
0x61001000  0x61490000  Yes         /home/thonermann/cygwin/test-install/bin/cygwin1.dll
0x76271000  0x76354198  Yes (*)     /cygdrive/c/Windows/system32/user32.dll
0x74f11000  0x74f8292c  Yes (*)     /cygdrive/c/Windows/syswow64/GDI32.dll
0x76181000  0x761892f8  Yes (*)     /cygdrive/c/Windows/syswow64/LPK.dll
0x74d71000  0x74e0c9fc  Yes (*)     /cygdrive/c/Windows/syswow64/USP10.dll
0x75bf1000  0x75c9b2c4  Yes (*)     /cygdrive/c/Windows/syswow64/msvcrt.dll
0x75eb1000  0x75f4f04c  Yes (*)     /cygdrive/c/Windows/syswow64/ADVAPI32.dll
0x74ed1000  0x74ee8ed8  Yes (*)     /cygdrive/c/Windows/SysWOW64/sechost.dll
0x76371000  0x76445e04  Yes (*)     /cygdrive/c/Windows/syswow64/RPCRT4.dll
0x74b41000  0x74b821f0  Yes (*)     /cygdrive/c/Windows/syswow64/SspiCli.dll
0x74b31000  0x74b3b474  Yes (*)     /cygdrive/c/Windows/syswow64/CRYPTBASE.dll
0x76581000  0x765c1ce0  Yes (*)     /cygdrive/c/Windows/system32/IMM32.DLL
0x75ca1000  0x75d6bebc  Yes (*)     /cygdrive/c/Windows/syswow64/MSCTF.dll
0x70e41000  0x70e8b464  Yes (*)     /cygdrive/c/Windows/system32/apphelp.dll
(*): Shared library is missing debugging information.
(gdb) info thread
  Id   Target Id          Frame
* 4    Thread 6908.0x1950 0x7747000d in ntdll!LdrFindResource_U ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
  3    Thread 6908.0x1d8c 0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
  2    Thread 6908.0x1d34 0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
  1    Thread 6908.0x1344 0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) thread 1
[Switching to thread 1 (Thread 6908.0x1344)]
#0  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bb0bdd in WaitForMultipleObjectsEx ()
   from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000002 in ?? ()
#4  0x00000001 in ?? ()
#5  0x00000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 6908.0x1d34)]
#0  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bb0a91 in WaitForSingleObjectEx ()
   from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000034 in ?? ()
#4  0x00000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (Thread 6908.0x1d8c)]
#0  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bad348 in ReadFile ()
   from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000118 in ?? ()
#4  0x00000000 in ?? ()
(gdb) thread 4
[Switching to thread 4 (Thread 6908.0x1950)]
#0  0x7747000d in ntdll!LdrFindResource_U ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747000d in ntdll!LdrFindResource_U ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x774ff896 in ntdll!RtlQueryTimeZoneInformation ()
   from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x5dfded78 in ?? ()
#3  0x00000000 in ?? ()

Tom.
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-27 20:50 ` Tom Honermann @ 2012-12-29 21:57 ` Christopher Faylor 2013-01-01 1:45 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-29 21:57 UTC
To: cygwin

On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>When interrupting the test run, I'll often (but not always) get the
>following error:
>
>c:\>test-strace.bat
>test...
>test...
>test...
>test...
>--- Process 8092, exception 40010005 at 75E26D67

That is coming from strace and it's:

/usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)

i.e., it's expected.

>Additionally, some of the Cygwin gcc built utilities that I've built for
>testing now occasionally hang upon interruption by ctrl-c.  Basic
>diagnostics courtesy of gdb follow.

The hang should be fixed in the latest snapshot.

cgf
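cgf's answer can be cross-checked by splitting the value into the standard NTSTATUS fields: bits 31-30 severity, bit 29 customer flag, bit 28 reserved, bits 27-16 facility, bits 15-0 code. The sketch below is plain, portable C that only does the bit arithmetic; `struct ntstatus_fields` and `decode_ntstatus()` are hypothetical names, not part of the w32api headers. For 0x40010005 it yields severity 1 (informational), facility 1 (FACILITY_DEBUGGER in ntstatus.h) and code 5, i.e. a debugger notification rather than an error.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a 32-bit NTSTATUS value. */
struct ntstatus_fields {
    unsigned severity;  /* 0 success, 1 informational, 2 warning, 3 error */
    unsigned customer;  /* 1 for customer-defined codes */
    unsigned facility;  /* 12-bit facility number */
    unsigned code;      /* 16-bit facility-specific code */
};

static struct ntstatus_fields decode_ntstatus(uint32_t status)
{
    struct ntstatus_fields f;
    f.severity = (status >> 30) & 0x3u;
    f.customer = (status >> 29) & 0x1u;
    f.facility = (status >> 16) & 0xFFFu;   /* bit 28 is reserved and skipped */
    f.code     = status & 0xFFFFu;
    return f;
}
```

By contrast, 0xC0000005 (STATUS_ACCESS_VIOLATION) decodes to severity 3 (error), facility 0, code 5.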
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2012-12-29 21:57 ` Christopher Faylor @ 2013-01-01 1:45 ` Tom Honermann 2013-01-01 5:36 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-01 1:45 UTC
To: cygwin

On 12/29/2012 04:57 PM, Christopher Faylor wrote:
> On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>> When interrupting the test run, I'll often (but not always) get the
>> following error:
>>
>> c:\>test-strace.bat
>> test...
>> test...
>> test...
>> test...
>> --- Process 8092, exception 40010005 at 75E26D67
>
> That is coming from strace and it's:
>
> /usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)
>
> i.e., it's expected.

Ah, sorry, I should have researched that further before reporting it.
Thanks for the explanation.

>> Additionally, some of the Cygwin gcc built utilities that I've built for
>> testing now occasionally hang upon interruption by ctrl-c. Basic
>> diagnostics courtesy of gdb follow.
>
> The hang should be fixed in the latest snapshot.

I'm still seeing hangs in the latest code from CVS. The stack traces below are from WinDbg. I manually resolved the symbol references within the cygwin1 module using the linker-generated .map file. Since the .map file does not include static functions, some of these may be incorrect; I didn't try to verify or correct for this.
 # ChildEBP RetAddr
00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
04 00288cb8 6118e54d cygwin1!strtosigno+0x357
         __ZN4muto7acquireEm
         muto::acquire(unsigned long)
05 00288cc8 610f17b2 cygwin1!alloca+0xbbc9
         __ZN6dtable4lockEv
         dtable::lock()
06 00288d28 610eb717 cygwin1!strtosigno+0x5b6
         __Z15close_all_filesb@4
         close_all_files(bool)
07 00289a48 610eb92b cygwin1!sigfillset+0x7f3e
         __ZN16child_info_spawn6workerEPKcPKS1_S3_iii
         child_info_spawn::worker(char const*, char const* const*, char const* const*, int, int, int)
08 00289a88 6103af97 cygwin1!sigfillset+0x8152
         _spawnve
09 0028ac28 61007b38 cygwin1!getenv+0x5293
         _execlp
0a 0028ac48 61007ad5 cygwin1!setprogname+0x597d
0b 00000000 00000000 cygwin1!setprogname+0x591a

 # ChildEBP RetAddr
00 0071aafc 758cd348 ntdll!ZwReadFile+0x15
01 0071ab60 76c13ef7 KERNELBASE!ReadFile+0x118
02 0071aba8 610e7910 kernel32!ReadFileImplementation+0xf0
03 0071aca8 61003ec2 cygwin1!sigfillset+0x4137
         __ZN15pending_signals4nextEv
         pending_signals::next()
04 0071ace8 61004057 cygwin1!setprogname+0x1d07
         __ZN9cygthread8callfuncEb
         cygthread::callfunc(bool)
05 0071ad28 61004f61 cygwin1!setprogname+0x1e9c
         __ZN9cygthread4stubEPv@4
         cygthread::stub(void*)
06 0071cd98 61004dbc cygwin1!setprogname+0x2da6
         __ZN7_cygtls5call2EPFmPvS0_ES0_S0_A
         _cygtls::call2(unsigned long (*)(void*, void*), void*, void*)
07 0071ff68 61087074 cygwin1!setprogname+0x2c01
         __ZN7_cygtls4callEPFmPvS0_ES0_A
         _cygtls::call(unsigned long (*)(void*, void*), void*)
08 0071ff88 76c1339a cygwin1!setgrent+0x283c
09 0071ff94 779a9ef2 kernel32!BaseThreadInitThunk+0xe
0a 0071ffd4 779a9ec5 ntdll!__RtlUserThreadStart+0x70
0b 0071ffec 00000000 ntdll!_RtlUserThreadStart+0x1b

Tom.
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-01 1:45 ` Tom Honermann @ 2013-01-01 5:36 ` Christopher Faylor 2013-01-02 19:15 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-01 5:36 UTC
To: cygwin

On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>On 12/29/2012 04:57 PM, Christopher Faylor wrote:
>> On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>>> When interrupting the test run, I'll often (but not always) get the
>>> following error:
>>>
>>> c:\>test-strace.bat
>>> test...
>>> test...
>>> test...
>>> test...
>>> --- Process 8092, exception 40010005 at 75E26D67
>>
>> That is coming from strace and it's:
>>
>> /usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)
>>
>> i.e., it's expected.
>
>Ah, sorry, I should have researched that further before reporting it.
>Thanks for the explanation.
>
>>> Additionally, some of the Cygwin gcc built utilities that I've built for
>>> testing now occasionally hang upon interruption by ctrl-c.  Basic
>>> diagnostics courtesy of gdb follow.
>>
>> The hang should be fixed in the latest snapshot.
>
>I'm still seeing hangs in the latest code from CVS.  The stack traces
>below are from WinDbg.

I'm not asking you to build this yourself. I have no way to know how
you are building this. Please just use the snapshots at

http://cygwin.com/snapshots/

>I manually resolved the symbol references within
>the cygwin1 module using the linker generated .map file.  Since the .map
>file does not include static functions, some of these may be incorrect -
>I didn't try and verify or correct for this.

Thanks for trying, but the output below is garbled and not really
useful. If you are not going to dive in and attempt to fix code
yourself then all we normally need is a simple test case. WinDbg
is not really appropriate for debugging Cygwin applications.
cgf

> # ChildEBP RetAddr
>00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>         __ZN4muto7acquireEm
>         muto::acquire(unsigned long)
>[snip]
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-01 5:36 ` Christopher Faylor @ 2013-01-02 19:15 ` Tom Honermann 2013-01-02 20:48 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-02 19:15 UTC
To: cygwin

On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS. The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself. I have no way to know how
> you are building this. Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to specify debug source paths and such. I believe my builds are not unconventional. I used options that disabled frame pointer omission (plain -g with no -O level) so that the resulting binaries could be debugged with non-gcc debuggers.

$ mkdir build
$ cd build
$ ../src/configure \
    CFLAGS="-g" \
    CXXFLAGS="-g" \
    CFLAGS_FOR_TARGET="-g" \
    CXXFLAGS_FOR_TARGET="-g" \
    --enable-debugging \
    --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file. Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really
> useful. If you are not going to dive in and attempt to fix code
> yourself then all we normally need is a simple test case. WinDbg
> is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough. Lines with frame numbers come directly from WinDbg. Since WinDbg is unable to resolve symbols from gcc-generated debug info, the symbol references within the cygwin1 module are incorrect.
In those cases, I manually resolved the instruction pointer address using the RetAddr value from the prior frame and searching the linker-generated cygwin1.map file. I then pasted the mangled name on a line following the WinDbg line (with the incorrect symbol name) and, if the symbol is a C++ one, the unmangled name on an additional line. For the stack fragment below, address 610f1553 == strtosigno+0x357 == __ZN4muto7acquireEm == muto::acquire(unsigned long). I did not translate offsets for the functions as I resolved them, nor did I try to verify they are correct (i.e., that the return address is not for a static function that is not represented in the .map file).

>> # ChildEBP RetAddr
>> 00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>> 01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>> 02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>> 03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>> 04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>>          __ZN4muto7acquireEm
>>          muto::acquire(unsigned long)
>> [snip]

The reason for using WinDbg is that, from what I understand, gdb is unable to produce accurate stack traces when the call stack includes frames for functions that omit the frame pointer and do not have debug info that gdb can process. I believe many Microsoft-provided functions in ntdll, kernel32, kernelbase, etc. do omit the frame pointer and only provide debug info in the PDB format, which gdb is unable to use. Compiling Cygwin without frame pointer omission and using WinDbg therefore provides the most accurate stack trace. If I am incorrect about any of this, I would very much appreciate a correction and/or explanation.

I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able to reproduce several issues, which are described below. All of these issues occur when using ctrl-c to interrupt the infinite loop in the test case(s) I've been using to debug inconsistent exit codes.
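The manual step Tom describes, taking an address and finding the .map entry that contains it, is a search for the last symbol whose start address is not greater than the address. A minimal sketch follows, assuming the .map entries have already been parsed into an array sorted by start address; `struct map_symbol` and `resolve_pc()` are hypothetical, and the table in the usage note is made up. As Tom cautions, a symbol that is missing from the map (such as a static function) makes this return the nearest preceding listed symbol with a misleading offset.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed .map entry (hypothetical layout). */
struct map_symbol {
    uint32_t addr;        /* start address as listed in the .map file */
    const char *name;     /* mangled name as listed in the .map file */
};

/* Binary search over a table sorted by addr. Returns the entry with the
   greatest addr <= pc and stores pc - addr in *offset, or returns NULL
   if pc lies below every listed symbol. */
static const struct map_symbol *
resolve_pc(const struct map_symbol *syms, size_t n, uint32_t pc,
           uint32_t *offset)
{
    size_t lo = 0, hi = n;              /* invariant: answer index < lo */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = pc - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

For a made-up table {0x1000 "func_a", 0x2000 "func_b", 0x3000 "func_c"}, resolving 0x2010 gives func_b at offset 0x10, and 0x0fff gives NULL.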
When ctrl-c is pressed, I've observed the following:

1) Programs are (generally) terminated as expected. cmd.exe prompts to "Terminate batch job" as expected.
2) An access violation occurs and a processor context is dumped to the console. I do not yet have stack traces for these cases.
3) One of the processes hangs.

Access violations occur in ~20% of test runs. Hangs occur in ~5% of test runs.

I did not provide a test case previously because I don't have an automated reproducer at present. All sources needed to reproduce the issues are below. The test case uses a .bat file rather than bash so as to isolate the problem with as few dependencies as possible.

To reproduce the issues, copy test.bat, false-cygwin32.exe, and expect-false-execve-cygwin32.exe to a Cygwin bin directory and run test.bat from a cmd.exe console. Press ctrl-c to interrupt the test. Repeat until problems are observed. I have not been able to reproduce these symptoms when running the test via a MinTTY console.

I have been unable to get useful stack traces from hung processes using gdb. gdb reports that the debug information in cygwin1-20130102.dbg.bz2 does not match (CRC mismatch) the cygwin1.dll module in cygwin-inst-20130102.tar.bz2.

$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t child_pid, wait_pid;
    int result, child_status;

    if (argc != 2) {
        fprintf(stderr, "expect-false: Missing or too many arguments\n");
        return 127;
    }

    child_pid = fork();
    if (child_pid == -1) {
        fprintf(stderr, "expect-false: fork failed. errno=%d\n", errno);
        return 127;
    } else if (child_pid == 0) {
        /* execlp only returns on failure; the variadic list ends with a char* NULL */
        result = execlp(argv[1], argv[1], (char *)NULL);
        if (result == -1) {
            fprintf(stderr, "expect-false: execlp failed. errno=%d\n", errno);
        }
        _exit(127);
    }

    /* Retry on EINTR and keep waiting until the child has exited or been killed */
    do {
        wait_pid = waitpid(child_pid, &child_status, 0);
    } while ( (wait_pid == -1 && errno == EINTR) ||
              (wait_pid == child_pid &&
               !(WIFEXITED(child_status) || WIFSIGNALED(child_status))) );

    if (wait_pid == -1) {
        fprintf(stderr, "expect-false: waitpid failed. errno=%d\n", errno);
        return 127;
    }
    if (!WIFEXITED(child_status)) {
        fprintf(stderr, "expect-false: child process did not exit normally\n");
        return 127;
    }
    if (WEXITSTATUS(child_status) != 1) {
        /* Report the decoded exit code, not the raw wait status */
        fprintf(stderr, "expect-false: unexpected exit code: %d\n",
                WEXITSTATUS(child_status));
    }
    return WEXITSTATUS(child_status);
}

$ cat false.c
#include <stdio.h>

int main() {
    printf("myfalse\n");
    return 1;
}

$ cat test.bat
@echo off
setlocal
set PATH=%CD%;%PATH%
:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop

$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

From a cmd.exe console (press ctrl-c once the test is running):

C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...

Tom.

--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
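The manual lookup described in the message above takes a return address and finds its enclosing symbol in cygwin1.map. In effect, that means finding the map entry with the greatest start address that is less than or equal to the address. The C sketch below assumes the map has already been parsed into an array sorted by address. The struct, the function name, and the addresses in the usage note are hypothetical and are not taken from a real cygwin1.map. The sketch shares the caveat Tom mentions: a static function that is missing from the map is silently attributed to the preceding symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed linker-map entry: symbol start address and (mangled) name. */
struct map_symbol {
    uintptr_t addr;
    const char *name;
};

/* Return the entry with the greatest start address <= addr, or NULL if
   addr lies before every entry. syms must be sorted by ascending addr.
   On success, *offset receives addr minus the symbol's start address. */
static const struct map_symbol *
resolve_address(const struct map_symbol *syms, size_t count,
                uintptr_t addr, uintptr_t *offset)
{
    size_t lo = 0, hi = count;          /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the index of the first entry whose addr is > the target */
    if (lo == 0)
        return NULL;
    *offset = addr - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

Consider a hypothetical table with __ZN4muto7acquireEm at 0x610f1500 and the next symbol at 0x610f1600. A lookup of 0x610f1553 would return the muto::acquire entry with an offset of 0x53.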
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-02 19:15 ` Tom Honermann @ 2013-01-02 20:48 ` Christopher Faylor 2013-01-02 20:53 ` Daniel Colascione 2013-01-02 21:25 ` Tom Honermann 0 siblings, 2 replies; 65+ messages in thread
From: Christopher Faylor @ 2013-01-02 20:48 UTC
To: cygwin

I managed to duplicate a hang by really stressing ctrl-c in a loop. It uncovers some rather amazing Windows behavior which I have to think about. Apparently ExitThread can be called recursively within the thread that Windows creates to handle CTRL-C.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-02 20:48 ` Christopher Faylor @ 2013-01-02 20:53 ` Daniel Colascione 2013-01-02 21:41 ` Christopher Faylor 2013-01-02 21:25 ` Tom Honermann 1 sibling, 1 reply; 65+ messages in thread
From: Daniel Colascione @ 2013-01-02 20:53 UTC
To: cygwin

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On 1/2/13 12:48 PM, Christopher Faylor wrote:
> I managed to duplicate a hang by really stressing ctrl-c in a loop. It
> uncovers some rather amazing Windows behavior which I have to think
> about. Apparently ExitThread can be called recursively within the
> thread that Windows creates to handle CTRL-C.

What do you mean? ExitThread should never return, and I can't imagine anything on the thread termination path calling ExitThread again, especially not once the thread jumps to kernel mode.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 235 bytes --]
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-02 20:53 ` Daniel Colascione @ 2013-01-02 21:41 ` Christopher Faylor 0 siblings, 0 replies; 65+ messages in thread
From: Christopher Faylor @ 2013-01-02 21:41 UTC
To: cygwin

On Wed, Jan 02, 2013 at 12:53:11PM -0800, Daniel Colascione wrote:
>On 1/2/13 12:48 PM, Christopher Faylor wrote:
>> I managed to duplicate a hang by really stressing ctrl-c in a loop. It
>> uncovers some rather amazing Windows behavior which I have to think
>> about. Apparently ExitThread can be called recursively within the
>> thread that Windows creates to handle CTRL-C.
>
>What do you mean? ExitThread should never return, and I can't
>imagine anything on the thread termination path calling ExitThread
>again, especially not once the thread jumps to kernel mode.

Sorry, I was just speculating about what it looked like. I'm still debugging the problem.

cgf
* Re: Intermittent failures retrieving process exit codes - snapshot test requested 2013-01-02 20:48 ` Christopher Faylor 2013-01-02 20:53 ` Daniel Colascione @ 2013-01-02 21:25 ` Tom Honermann 2013-01-15 22:17 ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann 1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-02 21:25 UTC
To: cygwin

On 01/02/2013 03:48 PM, Christopher Faylor wrote:
> I managed to duplicate a hang by really stressing ctrl-c in a loop. It
> uncovers some rather amazing Windows behavior which I have to think
> about. Apparently ExitThread can be called recursively within the
> thread that Windows creates to handle CTRL-C.

I'm glad you could reproduce. Based on your description, this sounds like a separate issue and not a regression introduced by the workarounds you put in place for the ExitProcess / ExitThread race. Correct?

I wonder if this is the same issue I'm experiencing though. I'm only pressing ctrl-c once and it sounds like you might be delivering a ctrl-c to the same process multiple times. That may not be relevant to the root cause however.

Tom.
* Re: Intermittent failures with ctrl-c (was: retrieving process exit codes) 2013-01-02 21:25 ` Tom Honermann @ 2013-01-15 22:17 ` Tom Honermann 2013-01-16 2:04 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-15 22:17 UTC
To: cygwin

On 01/02/2013 04:24 PM, Tom Honermann wrote:
> On 01/02/2013 03:48 PM, Christopher Faylor wrote:
>> I managed to duplicate a hang by really stressing ctrl-c in a loop. It
>> uncovers some rather amazing Windows behavior which I have to think
>> about. Apparently ExitThread can be called recursively within the
>> thread that Windows creates to handle CTRL-C.
>
> I'm glad you could reproduce. Based on your description, this sounds
> like a separate issue and not a regression introduced by the workarounds
> you put in place for the ExitProcess / ExitThread race. Correct?
>
> I wonder if this is the same issue I'm experiencing though. I'm only
> pressing ctrl-c once and it sounds like you might be delivering a ctrl-c
> to the same process multiple times. That may not be relevant to the
> root cause however.

I noticed that some changes were checked in related to signal handling and process termination recently, so I downloaded the most recent snapshot (20130114) and tested again. I was still able to produce hanging processes (including hangs of strace.exe) by hitting ctrl-c in a mintty window while Cygwin processes ran in an infinite loop inside of a .bat file. I was able to produce a hang ~1 out of 20 times.

If you are still working on this, then I apologize for the noise. Either way, if I can provide anything further that would be helpful, please let me know.

Tom.
* Re: Intermittent failures with ctrl-c (was: retrieving process exit codes) 2013-01-15 22:17 ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann @ 2013-01-16 2:04 ` Christopher Faylor 2013-01-16 16:38 ` Intermittent failures with ctrl-c Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 2:04 UTC
To: cygwin

On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>I noticed that some changes were checked in related to signal handling
>and process termination recently, so I downloaded the most recent
>snapshot (20130114) and tested again. I was still able to produce
>hanging processes (including hangs of strace.exe) by hitting ctrl-c in a
>mintty window while Cygwin processes ran in an infinite loop inside of a
>.bat file. I was able to produce a hang ~1 out of 20 times.

How does one run a .bat file inside mintty which handles CTRL-C? AFAIK, a CTRL-C will just cause the .bat file to exit when run under bash.

cgf
* Re: Intermittent failures with ctrl-c 2013-01-16 2:04 ` Christopher Faylor @ 2013-01-16 16:38 ` Tom Honermann 2013-01-16 16:53 ` marco atzeri 2013-01-16 19:14 ` Christopher Faylor 0 siblings, 2 replies; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 16:38 UTC
To: cygwin

On 01/15/2013 09:04 PM, Christopher Faylor wrote:
> On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>> I noticed that some changes were checked in related to signal handling
>> and process termination recently, so I downloaded the most recent
>> snapshot (20130114) and tested again. I was still able to produce
>> hanging processes (including hangs of strace.exe) by hitting ctrl-c in a
>> mintty window while Cygwin processes ran in an infinite loop inside of a
>> .bat file. I was able to produce a hang ~1 out of 20 times.
>
> How does one run a .bat file inside mintty which handles CTRL-C? AFAIK,
> a CTRL-C will just cause the .bat file to exit when run under bash.

Here is the test case:

1) Install the latest snapshot.

2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin install into the usr/bin directory of the snapshot. For me this consisted of:

bash.exe
cygintl-8.dll
cygiconv-2.dll
cygreadline7.dll
cygncurses-10.dll
cygncursesw-10.dll
cyggcc_s-1.dll
false.exe

3) Create 'test.bat' in the usr/bin directory of the snapshot with the following contents:

@echo off
setlocal
set PATH=%CD%;%PATH%
:loop
echo test...
bash -c false
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop

4) Launch mintty using an existing Cygwin installation. Naturally, this will run a shell from the existing Cygwin install.

5) Change directories to the usr/bin directory of the snapshot.

6) Start Task Manager or some other process monitoring tool and keep it running. Run ./test.bat from the Cygwin shell running within mintty and interrupt it with ctrl-c. Repeat until you see a new bash.exe or false.exe process persisting following the interrupt.
You'll likely have multiple bash processes running. If you are able to reproduce, you should see one with a command line of 'bash -c false'. Alternatively, if your process monitoring tool shows the path to the executable, you'll be able to identify it as the one from the usr/bin directory of the snapshot.

I rather doubt that the use of a .bat file is necessary to reproduce this hang, but I haven't tried producing a test case that doesn't use a .bat file. This is a test case I was using when debugging the intermittent incorrect exit code issue.

Tom.
* Re: Intermittent failures with ctrl-c 2013-01-16 16:38 ` Intermittent failures with ctrl-c Tom Honermann @ 2013-01-16 16:53 ` marco atzeri 2013-01-16 17:42 ` Tom Honermann 2013-01-16 19:14 ` Christopher Faylor 1 sibling, 1 reply; 65+ messages in thread
From: marco atzeri @ 2013-01-16 16:53 UTC
To: cygwin

On 1/16/2013 5:37 PM, Tom Honermann wrote:
>
> 4) Launch mintty using an existing Cygwin installation. Naturally, this
> will run a shell from the existing Cygwin install.
>
> 5) Change directories to the usr/bin directory of the snapshot.
>

This will cause a cygwin1.dll collision between the two versions.
Nothing is guaranteed to work fine.

>
> Tom.
>

Marco
* Re: Intermittent failures with ctrl-c 2013-01-16 16:53 ` marco atzeri @ 2013-01-16 17:42 ` Tom Honermann 2013-01-16 18:05 ` Earnie Boyd 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 17:42 UTC
To: cygwin

On 01/16/2013 11:53 AM, marco atzeri wrote:
> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>
>>
>> 4) Launch mintty using an existing Cygwin installation. Naturally, this
>> will run a shell from the existing Cygwin install.
>>
>> 5) Change directories to the usr/bin directory of the snapshot.
>>
>
> This will cause a cygwin1.dll collision between the two versions.
> Nothing is guaranteed to work fine.

Can you elaborate? Cygwin supports multiple installations just fine these days. Use of a .bat file (an intervening cmd.exe process) should isolate the environments for this test.

Regardless, I was also able to produce a hang in bash running the same .bat file from a cmd.exe prompt using only the snapshot install and the copied bash.exe, false.exe, and dependent binaries - no mintty. The hung bash.exe process eventually timed out with an error message:

5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed, signal 6, rc 258, Win32 error 0

Tom.
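A note on decoding the timeout message above. rc 258 is 0x102, the value of the Win32 WAIT_TIMEOUT return code. If rc is the raw return value of a Win32 wait function (an assumption, not something confirmed from the Cygwin source), the message reports that a wait on the sig_complete event timed out rather than failed outright. "Win32 error 0" would be consistent with that, because a timeout is not an error. The sketch below maps the standard single-object wait return values to their names. The constants are restated locally with a hypothetical EX_ prefix so the code builds without windows.h.

```c
#include <stdint.h>

/* Win32 wait-function return values, restated so this builds without
   <windows.h>. The EX_ prefix is only to avoid clashing with the real
   macros if windows.h is also included. */
#define EX_WAIT_OBJECT_0      0x00000000u
#define EX_WAIT_ABANDONED     0x00000080u
#define EX_WAIT_IO_COMPLETION 0x000000C0u
#define EX_WAIT_TIMEOUT       0x00000102u  /* 258 decimal */
#define EX_WAIT_FAILED        0xFFFFFFFFu

/* Name the result of a single-object wait such as WaitForSingleObject. */
static const char *wait_result_name(uint32_t rc)
{
    switch (rc) {
    case EX_WAIT_OBJECT_0:      return "WAIT_OBJECT_0";
    case EX_WAIT_ABANDONED:     return "WAIT_ABANDONED";
    case EX_WAIT_IO_COMPLETION: return "WAIT_IO_COMPLETION";
    case EX_WAIT_TIMEOUT:       return "WAIT_TIMEOUT";
    case EX_WAIT_FAILED:        return "WAIT_FAILED";
    default:                    return "unknown";
    }
}
```

Calling wait_result_name(258) returns "WAIT_TIMEOUT".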
* Re: Intermittent failures with ctrl-c 2013-01-16 17:42 ` Tom Honermann @ 2013-01-16 18:05 ` Earnie Boyd 2013-01-16 18:51 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Earnie Boyd @ 2013-01-16 18:05 UTC
To: cygwin

On Wed, Jan 16, 2013 at 12:42 PM, Tom Honermann <thonermann@coverity.com> wrote:
> On 01/16/2013 11:53 AM, marco atzeri wrote:
>>
>> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>>
>>>
>>> 4) Launch mintty using an existing Cygwin installation. Naturally, this
>>> will run a shell from the existing Cygwin install.
>>>
>>> 5) Change directories to the usr/bin directory of the snapshot.
>>>
>>
>> This will cause a cygwin1.dll collision between the two versions.
>> Nothing is guaranteed to work fine.
>
>
> Can you elaborate? Cygwin supports multiple installations just fine these
> days. Use of a .bat file (an intervening cmd.exe process) should isolate
> the environments for this test.
>

While you can have multiple installations, you cannot mix the environments. You did not copy mintty, so you started it in one instance and then went to another instance, which will cause a clash of resources.

> Regardless, I was also able to produce a hang in bash running the same .bat
> file from a cmd.exe prompt using only the snapshot install and the copied
> bash.exe, false.exe, and dependent binaries - no mintty. The hung bash.exe
> process eventually timed out with an error message:
>
> 5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed,
> signal 6, rc 258, Win32 error 0

Looking at the list of DLLs you copied, you may still be seeing a conflict over which DLL is in use. Do you see a hang if you remain in usr/bin and do not change directories to your copied files?
--
Earnie
-- https://sites.google.com/site/earnieboyd
* Re: Intermittent failures with ctrl-c 2013-01-16 18:05 ` Earnie Boyd @ 2013-01-16 18:51 ` Tom Honermann 2013-01-16 18:59 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 18:51 UTC
To: cygwin

On 01/16/2013 01:05 PM, Earnie Boyd wrote:
> On Wed, Jan 16, 2013 at 12:42 PM, Tom Honermann <thonermann@coverity.com> wrote:
>> On 01/16/2013 11:53 AM, marco atzeri wrote:
>>>
>>> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>>>
>>>>
>>>> 4) Launch mintty using an existing Cygwin installation. Naturally, this
>>>> will run a shell from the existing Cygwin install.
>>>>
>>>> 5) Change directories to the usr/bin directory of the snapshot.
>>>>
>>>
>>> This will cause a cygwin1.dll collision between the two versions.
>>> Nothing is guaranteed to work fine.
>>
>>
>> Can you elaborate? Cygwin supports multiple installations just fine these
>> days. Use of a .bat file (an intervening cmd.exe process) should isolate
>> the environments for this test.
>>
>
> While you can have multiple installations, you cannot mix the environments.
> You did not copy mintty, so you started it in one instance and then
> went to another instance, which will cause a clash of resources.

Can you elaborate on what resources you are referring to? I fail to see how the Cygwin binaries run via the .bat file could conflict with mintty (or the top level bash process) since the intervening cmd.exe execution would have blocked inheritance of Cygwin related resources, primarily since fork() isn't used to create these child processes.

My understanding is that shared Cygwin resources are keyed off of the location of the cygwin1.dll loaded into the Cygwin process. If two Cygwin processes run with different cygwin1.dll instances, they should not share resources.

I can see a case for there being a problem if a Cygwin process creates another Cygwin process via fork() and that child process is run with a different cygwin1.dll instance, but that isn't the case here.
The only other case I can think of would require Cygwin to look at the process tree (stepping up through non-Cygwin processes) to get at resources. That would be quite expensive on Windows.

>> Regardless, I was also able to produce a hang in bash running the same .bat
>> file from a cmd.exe prompt using only the snapshot install and the copied
>> bash.exe, false.exe, and dependent binaries - no mintty. The hung bash.exe
>> process eventually timed out with an error message:
>>
>> 5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed,
>> signal 6, rc 258, Win32 error 0
>
> Looking at the list of DLLs you copied, you may still be seeing a
> conflict over which DLL is in use.

I don't see how that would be the case. If it were, then it would not be possible (in general) to have multiple Cygwin installations with unrelated processes running concurrently from each installation.

> Do you see a hang if you remain in
> usr/bin and do not change directories to your copied files?

I believe that would be equivalent to testing in my (non-snapshot) Cygwin installation. The goal is to test the snapshot.

Tom.
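Tom's argument rests on his understanding that Cygwin names its shared resources using a value derived from where cygwin1.dll lives, so that two installations never open the same objects. The sketch below only illustrates that general idea. It hashes a directory path with 64-bit FNV-1a and embeds the hash in an object name. The "example-shmem-" prefix, the choice of FNV-1a, and both function names are hypothetical; they are not Cygwin's actual scheme. A real implementation on Windows would also need to normalize path case, since Windows paths are case-insensitive.

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Write a hypothetical per-installation object name such as
   "example-shmem-<16 hex digits>" into buf, derived from the directory
   holding the DLL. Returns 0 on success, -1 if buf is too small. */
static int make_object_name(char *buf, size_t len, const char *dll_dir)
{
    int n = snprintf(buf, len, "example-shmem-%016llx",
                     (unsigned long long)fnv1a64(dll_dir));
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Under this kind of scheme, a production tree and a snapshot tree produce different names for the same logical object, so their processes would not open each other's objects. The same directory always produces the same name.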
* Re: Intermittent failures with ctrl-c 2013-01-16 18:51 ` Tom Honermann @ 2013-01-16 18:59 ` Christopher Faylor 2013-01-16 20:19 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 18:59 UTC
To: cygwin

On Wed, Jan 16, 2013 at 01:51:11PM -0500, Tom Honermann wrote:
>Can you elaborate on what resources you are referring to? I fail to
>see how the Cygwin binaries run via the .bat file could conflict with
>mintty (or the top level bash process) since the intervening cmd.exe
>execution would have blocked inheritance of Cygwin related resources,
>primarily since fork() isn't used to create these child processes.

Here is a very basic issue: If you are going to be submitting a bug report you should be making things as simple and as clear as possible.

The fact that there are two cygwin DLLs in play here adds additional confusion and complication. If we now have to enter into a theoretical discussion about what should be allowed, we have needlessly strayed from the initial problem.

Given the number of historical problems we have had with mixing two versions of Cygwin and given that our consistent guidance is to only have one on your computer, there is no reason to get into a discussion about what is allowed. Just use one version. You can easily switch back and forth using windows tools.

cgf
* Re: Intermittent failures with ctrl-c 2013-01-16 18:59 ` Christopher Faylor @ 2013-01-16 20:19 ` Tom Honermann 2013-01-16 22:23 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 20:19 UTC
To: cygwin

On 01/16/2013 01:59 PM, Christopher Faylor wrote:
> On Wed, Jan 16, 2013 at 01:51:11PM -0500, Tom Honermann wrote:
>> Can you elaborate on what resources you are referring to? I fail to
>> see how the Cygwin binaries run via the .bat file could conflict with
>> mintty (or the top level bash process) since the intervening cmd.exe
>> execution would have blocked inheritance of Cygwin related resources,
>> primarily since fork() isn't used to create these child processes.
>
> Here is a very basic issue: If you are going to be submitting a bug
> report you should be making things as simple and as clear as possible.

I'm trying. What you are suggesting implies that all testing of snapshots be done either from a cmd.exe prompt (after copying enough of another Cygwin installation into the snapshot) or by updating the host Cygwin installation. My host installation is used for production purposes and I don't have spare machines available for other testing. I'm not messing with it.

I am aware of the snapshot guidance: http://cygwin.com/faq-nochunks.html#faq.setup.snapshots

> The fact that there are two cygwin DLLs in play here adds additional
> confusion and complication. If we now have to enter into a theoretical
> discussion about what should be allowed, we have needlessly strayed from
> the initial problem.
>
> Given the number of historical problems we have had with mixing two
> versions of Cygwin and given that our consistent guidance is to
> only have one on your computer, there is no reason to get into a
> discussion about what is allowed. Just use one version. You
> can easily switch back and forth using windows tools.

I previously mentioned that problems can be duplicated without mintty.
Here are detailed steps for how to reproduce without mintty.

1) Install the latest snapshot.

2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin install into the usr/bin directory of the snapshot. For me this consisted of:

bash.exe
cygintl-8.dll
cygiconv-2.dll
cygreadline7.dll
cygncurses-10.dll
cygncursesw-10.dll
cyggcc_s-1.dll
false.exe

3) Shut down all other Cygwin processes.

4) Create 'test.bat' in the usr/bin directory of the snapshot with the following contents:

@echo off
setlocal
set PATH=%CD%;%PATH%
:loop
echo test...
bash -c false
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop

5) Start a cmd.exe prompt.

6) Change directories to the usr/bin directory of the snapshot.

7) Start Task Manager or some other process monitoring tool and keep it running. Run test.bat from the cmd.exe prompt and interrupt it with ctrl-c. Repeat until you see a new bash.exe or false.exe process persisting after the interrupt.

It took me 20 or so tries re-running test.bat and interrupting it before I was able to produce a hanging/abandoned process.

I don't know how to make things any simpler or clearer than this.

Tom.
* Re: Intermittent failures with ctrl-c 2013-01-16 20:19 ` Tom Honermann @ 2013-01-16 22:23 ` Christopher Faylor 2013-01-18 20:12 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 22:23 UTC
To: cygwin

On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>I previously mentioned that problems can be duplicated without mintty.
>Here are detailed steps for how to reproduce without mintty.

I was responding to your latest bug report, which mentioned mintty.

I managed to duplicate a hang by changing your .bat file to use "sleep 2" rather than false. I'm investigating now.

cgf
* Re: Intermittent failures with ctrl-c 2013-01-16 22:23 ` Christopher Faylor @ 2013-01-18 20:12 ` Tom Honermann 2013-01-19 5:58 ` Christopher Faylor 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-18 20:12 UTC
To: cygwin

On 01/16/2013 05:23 PM, Christopher Faylor wrote:
> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
> I managed to duplicate a hang by changing your .bat file to use "sleep
> 2" rather than false. I'm investigating now.

I noticed that you checked in some additional changes on the 16th that look related to this, so I tested again with today's snapshot (20130118).

I was still able to produce hangs using the same test case. The symptoms are slightly different than I had seen previously. bash hung 2 out of the ~60 times I interrupted the test. No error messages were displayed this time. Upon pressing ctrl-c, bash hung for 60 seconds. I was then greeted with the "Terminate batch job" prompt and responding 'Y' terminated the process tree as expected. Pressing ctrl-c while bash was hung for those 60 seconds appeared to have no effect.

My apologies for this distraction if you don't yet expect this to be fixed.

Tom.
* Re: Intermittent failures with ctrl-c 2013-01-18 20:12 ` Tom Honermann @ 2013-01-19 5:58 ` Christopher Faylor 2013-01-20 22:09 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-19 5:58 UTC
To: cygwin

On Fri, Jan 18, 2013 at 03:11:03PM -0500, Tom Honermann wrote:
>On 01/16/2013 05:23 PM, Christopher Faylor wrote:
>> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>> I managed to duplicate a hang by changing your .bat file to use "sleep
>> 2" rather than false. I'm investigating now.
>
>I noticed that you checked in some additional changes on the 16th that
>look related to this, so I tested again with today's snapshot (20130118).

I thought I sent a "try a snapshot" but I must have been hallucinating again.

>I was still able to produce hangs using the same test case. The
>symptoms are slightly different than I had seen previously. bash hung 2
>out of the ~60 times I interrupted the test. No error messages were
>displayed this time. Upon pressing ctrl-c, bash hung for 60 seconds. I
>was then greeted with the "Terminate batch job" prompt and responding
>'Y' terminated the process tree as expected. Pressing ctrl-c while bash
>was hung for those 60 seconds appeared to have no effect.

The hang should be fixed in the upcoming snapshot.

cgf
* Re: Intermittent failures with ctrl-c 2013-01-19 5:58 ` Christopher Faylor @ 2013-01-20 22:09 ` Tom Honermann 2013-01-23 3:20 ` Tom Honermann 0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-20 22:09 UTC
To: cygwin

On 01/19/2013 12:58 AM, Christopher Faylor wrote:
> On Fri, Jan 18, 2013 at 03:11:03PM -0500, Tom Honermann wrote:
>> On 01/16/2013 05:23 PM, Christopher Faylor wrote:
>>> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>>> I managed to duplicate a hang by changing your .bat file to use "sleep
>>> 2" rather than false. I'm investigating now.
>>
>> I noticed that you checked in some additional changes on the 16th that
>> look related to this, so I tested again with today's snapshot (20130118).
>
> I thought I sent a "try a snapshot" but I must have been hallucinating
> again.
>
>> I was still able to produce hangs using the same test case. The
>> symptoms are slightly different than I had seen previously. bash hung 2
>> out of the ~60 times I interrupted the test. No error messages were
>> displayed this time. Upon pressing ctrl-c, bash hung for 60 seconds. I
>> was then greeted with the "Terminate batch job" prompt and responding
>> 'Y' terminated the process tree as expected. Pressing ctrl-c while bash
>> was hung for those 60 seconds appeared to have no effect.
>
> The hang should be fixed in the upcoming snapshot.

Snapshot 20130119 appears to have addressed most of the cases I've witnessed.

However, I was still able to reproduce another case. As before, one of the processes is being left running when the rest are terminated. The "abandoned" process appears to be in a live-lock state with two threads (threads 1 and 2) running at 100%. Of particular interest is that each time I press ctrl-c in the cmd.exe console this process was spawned from, a new thread appears in the process even though this program is no longer a foreground process and all other Cygwin processes have terminated.
The new threads never exit.

Same test case as before. However, since reproducing this may be challenging, I dug in to try to get some details that might help with reproducing it.

It looks like thread 1 was interrupted while in a call to free(). Both threads 1 and 2 appear to be stuck looping on calls to yield(). Thread 3 appears to be stuck in a call to WriteFile. I suspect thread 3 was created by the initial ctrl-c event, but I'm not able to get an accurate stack trace for this thread to prove that. Threads 4 and up correspond to new threads created for new ctrl-c events.

The following stack traces correspond to the above-mentioned snapshot with cygwin1.dbg (from cygwin1-20130119.dbg.bz2) in place.

(gdb) thread 1
[Switching to thread 1 (Thread 5344.0x1878)]
#0 0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0 0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1 0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2 0x76792ed6 in KERNELBASE!GetThreadUILanguage () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3 0x61087581 in yield () at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/miscfuncs.cc:243
#4 0x610d6d9c in _sigfe () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#5 0x61083180 in free () at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/malloc_wrapper.cc:43
#6 0x00000010 in ?? ()
#7 0x00000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 5344.0x1ac8)]
#0 0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0 0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1 0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2 0x76793a5e in SetThreadPriority () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3 0x6108759b in yield () at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/miscfuncs.cc:244
#4 0x610d6eb4 in _cygtls::lock() () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#5 0x610302ee in sigpacket::setup_handler (this=0x95ac04, handler=0x6102fdc0 <signal_exit(int, siginfo_t*)>, siga=..., tls=0x28ce64) at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/exceptions.cc:796
#6 0x610319d8 in sigpacket::process (this=0x95ac04) at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/exceptions.cc:1266
#7 0x610dd2ac in wait_sig () at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/sigproc.cc:1389
#8 0x61003ea5 in cygthread::callfunc (this=0x6118b400, issimplestub=<optimized out>) at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygthread.cc:51
#9 0x6100442f in cygthread::stub (arg=0x6118b400) at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygthread.cc:93
#10 0x6100538d in _cygtls::call2 (this=<optimized out>, func=0x610043e0 <cygthread::stub(void*)>, arg=0x6118b400, buf=0x6100551b <_cygtls::call(unsigned long (*)(void*, void*), void*)+91>) at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygtls.cc:99
#11 0x0095ff88 in ?? ()
#12 0x76a8339a in KERNEL32!BaseCleanupAppcompatCacheSupport () from /cygdrive/c/Windows/syswow64/kernel32.dll
#13 0x6118b400 in cygthread::exiting () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#14 0x0095ffd4 in ?? ()
#15 0x77699ef2 in ntdll!RtlpNtSetValueKey () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#16 0x6118b400 in cygthread::exiting () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#17 0x4449ca2d in ?? ()
#18 0x00000000 in ?? ()

(gdb) thread 3
[Switching to thread 3 (Thread 5344.0x1c2c)]
#0 0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0 0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1 0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2 0x7678d4b5 in WriteFile () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3 0x0000009c in ?? ()
#4 0x00000000 in ?? ()

(gdb) thread 4
[Switching to thread 4 (Thread 5344.0x718)]
#0 0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0 0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1 0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2 0x76790a91 in WaitForSingleObjectEx () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3 0x00000034 in ?? ()
#4 0x00000000 in ?? ()

Tom.
* Re: Intermittent failures with ctrl-c
From: Tom Honermann @ 2013-01-23 3:20 UTC
To: cygwin

On 01/20/2013 05:08 PM, Tom Honermann wrote:
> However, I was still able to reproduce another case. As before, one of
> the processes is being left running when the rest are terminated. The
> "abandoned" process appears to be in a live-lock state with two threads
> (threads 1 and 2) running at 100%. Of particular interest is that each
> time I press ctrl-c in the cmd.exe console this process was spawned
> from, a new thread appears in the process even though this program is no
> longer a foreground process and all other Cygwin processes have
> terminated. The new threads never exit.

I noticed that more changes were checked in that looked like they might address this, so I tested again with the latest snapshot (20130123).

I wasn't able to reproduce any of the symptoms I previously reported. Yay!

However, just as I was about to give up testing, I hit one more new issue. One of the ctrl-c events sent bash into what appeared to be an infinite loop emitting error messages like these:

11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)

While this was going on, hitting ctrl-c had no discernible effect. I resorted to killing the process via task manager.

This only occurred once, I wasn't able to get it to happen again.

Tom.
* Re: Intermittent failures with ctrl-c
From: Christopher Faylor @ 2013-01-23 5:27 UTC
To: cygwin

On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>On 01/20/2013 05:08 PM, Tom Honermann wrote:
>> However, I was still able to reproduce another case. As before, one of
>> the processes is being left running when the rest are terminated. The
>> "abandoned" process appears to be in a live-lock state with two threads
>> (threads 1 and 2) running at 100%. Of particular interest is that each
>> time I press ctrl-c in the cmd.exe console this process was spawned
>> from, a new thread appears in the process even though this program is no
>> longer a foreground process and all other Cygwin processes have
>> terminated. The new threads never exit.
>
>I noticed that more changes were checked in that looked like they might
>address this, so I tested again with the latest snapshot (20130123).
>
>I wasn't able to reproduce any of the symptoms I previously reported. Yay!
>
>However, just as I was about to give up testing, I hit one more new
>issue. One of the ctrl-c events sent bash into what appeared to be an
>infinite loop emitting error messages like these:
>
>11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while
>dumping state (probably corrupted stack)
>11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while
>dumping state (probably corrupted stack)
>
>While this was going on, hitting ctrl-c had no discernible effect. I
>resorted to killing the process via task manager.
>
>This only occurred once, I wasn't able to get it to happen again.

Was there a stackdump?
cgf
* Re: Intermittent failures with ctrl-c
From: Tom Honermann @ 2013-01-23 18:18 UTC
To: cygwin

On 01/23/2013 12:26 AM, Christopher Faylor wrote:
> On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>> However, just as I was about to give up testing, I hit one more new
>> issue. One of the ctrl-c events sent bash into what appeared to be an
>> infinite loop emitting error messages like these:
>>
>> 11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while
>> dumping state (probably corrupted stack)
>> 11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while
>> dumping state (probably corrupted stack)
>>
>> While this was going on, hitting ctrl-c had no discernible effect. I
>> resorted to killing the process via task manager.
>>
>> This only occurred once, I wasn't able to get it to happen again.
>
> Was there a stackdump?

Unfortunately no. And I should have grabbed a stack trace, but I didn't.

I tried to reproduce again today using the same snapshot (20130123), but didn't have any luck.

I see you checked in a change to detect the infinite recursion. I'd call that good enough.

I didn't encounter any further anomalies that I can positively attribute to Cygwin. I did encounter a few that I suspect are cmd.exe issues, which I'll report below. I'm only reporting these for the curious; I am not requesting that any action be taken with regard to them.

1) Sometimes a ctrl-C was ignored. I would see ^C echoed to the console, but the test case would keep running without prompting to "Terminate batch job".

2) Sometimes cmd.exe would issue an error message about a syntax error in the .bat file after I pressed ctrl-C, and all processes would exit without prompting to "Terminate batch job".

Thank you for your prompt attention to all of these issues Chris!
I find it very impressive how responsive the Cygwin maintainers are to reports like these!

Tom.
* Re: Intermittent failures with ctrl-c
From: Christopher Faylor @ 2013-01-23 18:35 UTC
To: cygwin

On Wed, Jan 23, 2013 at 01:17:45PM -0500, Tom Honermann wrote:
>On 01/23/2013 12:26 AM, Christopher Faylor wrote:
>> On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>>> However, just as I was about to give up testing, I hit one more new
>>> issue. One of the ctrl-c events sent bash into what appeared to be an
>>> infinite loop emitting error messages like these:
>>>
>>> 11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while
>>> dumping state (probably corrupted stack)
>>> 11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while
>>> dumping state (probably corrupted stack)
>>>
>>> While this was going on, hitting ctrl-c had no discernible effect. I
>>> resorted to killing the process via task manager.
>>>
>>> This only occurred once, I wasn't able to get it to happen again.
>>
>> Was there a stackdump?
>
>Unfortunately no. And I should have grabbed a stack trace, but I didn't.
>
>I tried to reproduce again today using the same snapshot (20130123), but
>didn't have any luck.
>
>I see you checked in a change to detect the infinite recursion. I'd
>call that good enough.

That probably is relatively ok given that you're trying to terminate the process anyway but it would be nice to know why the stackdump was happening.

>Thank you for your prompt attention to all of these issues Chris! I
>find it very impressive how responsive the Cygwin maintainers are to
>reports like these!

You're very welcome. Thanks for hanging in there throughout this process.

FYI, as it turns out, working around the thread exit problem uncovered a whole host of issues with locking/signals/exit that have been lurking in the code for a while.
So, this exercise should have made a better Cygwin in the long run. It may even have made Cygwin a little faster.

cgf
* Re: Intermittent failures with ctrl-c
From: Tom Honermann @ 2013-01-24 4:12 UTC
To: cygwin

On 01/23/2013 01:35 PM, Christopher Faylor wrote:
> On Wed, Jan 23, 2013 at 01:17:45PM -0500, Tom Honermann wrote:
>> I see you checked in a change to detect the infinite recursion. I'd
>> call that good enough.
>
> That probably is relatively ok given that you're trying to terminate the
> process anyway but it would be nice to know why the stackdump was
> happening.

Agreed. I'll investigate and report any future cases I encounter.

Tom.
* Re: Intermittent failures with ctrl-c
From: Christopher Faylor @ 2013-01-16 19:14 UTC
To: cygwin

On Wed, Jan 16, 2013 at 11:37:43AM -0500, Tom Honermann wrote:
>On 01/15/2013 09:04 PM, Christopher Faylor wrote:
>> On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>>> I noticed that some changes were checked in related to signal handling
>>> and process termination recently, so I downloaded the most recent
>>> snapshot (20130114) and tested again. I was still able to produce
>>> hanging processes (including hangs of strace.exe) by hitting ctrl-c in a
>>> mintty window while Cygwin processes ran in an infinite loop inside of a
>>> .bat file. I was able to produce a hang ~1 out of 20 times.
>>
>> How does one run a .bat file inside mintty which handles CTRL-C? AFAIK,
>> a CTRL-C will just cause the .bat file to exit when run under bash.
>
>Here is the test case:
>
>1) Install the latest snapshot
>
>2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin
>install into the usr/bin directory of the snapshot. For me this
>consisted of:
>    bash.exe
>    cygintl-8.dll
>    cygiconv-2.dll
>    cygreadline7.dll
>    cygncurses-10.dll
>    cygncursesw-10.dll
>    cyggcc_s-1.dll
>    false.exe
>
>3) Create 'test.bat' in the usr/bin directory of the snapshot with the
>following contents:
>
>@echo off
>setlocal
>
>set PATH=%CD%;%PATH%
>
>:loop
>echo test...
>bash -c false
>if not errorlevel 1 (
>    echo exiting...
>    exit /B 1
>)
>goto loop
>
>4) Launch mintty using an existing Cygwin installation. Naturally, this
>will run a shell from the existing Cygwin install.
>
>5) Change directories to the usr/bin directory of the snapshot.
>
>6) Start task manager or some other process monitoring tool and keep it
>running. Run ./test.bat from the Cygwin shell running within mintty and
>interrupt it with ctrl-c. Repeat until you see a new bash.exe or
>false.exe process persisting following the interrupt. You'll likely
>have multiple bash processes running. If you are able to reproduce, you
>should see one with a command line of 'bash -c false'. Alternatively,
>if your process monitoring tool shows the path to the executable, you'll
>be able to identify it as the one from the usr/bin directory of the
>snapshot.

Again, if I hit CTRL-C while running ./test.bat in mintty then test.bat exits immediately, as expected. Hitting ctrl-c repeatedly after that point gives me a new bash prompt.

Non-exiting behavior was a symptom of a previous snapshot which was mentioned here:

http://cygwin.com/ml/cygwin/2013-01/msg00164.html

>I rather doubt that the use of a .bat file is necessary to reproduce
>this hang, but I haven't tried producing a test case that doesn't use a
>.bat file. This is a test case I was using when debugging the
>intermittent incorrect exit code issue.

Btw, an incorrect exit code is still a possibility if you're running from a cmd shell since it is possible to interrupt a cygwin process before cygwin is entirely set up. That will cause a normal windows CTRL-C exit.

cgf
* Re: Intermittent failures with ctrl-c
From: Tom Honermann @ 2013-01-16 20:24 UTC
To: cygwin

On 01/16/2013 02:14 PM, Christopher Faylor wrote:
> Again, if I hit CTRL-C while running ./test.bat in mintty then test.bat
> exits immediately, as expected. Hitting ctrl-c repeatedly after that
> point gives me a new bash prompt.

Yes, that is what is expected to happen. What I am reporting is that interrupting test.bat sometimes leaves hung processes still running after control is returned to the shell.

> Non-exiting behavior was a symptom of a previous snapshot which was
> mentioned here:
>
> http://cygwin.com/ml/cygwin/2013-01/msg00164.html

I'm testing a newer snapshot than that one. I've been testing with 20130114, which Thomas reported as no longer having that problem here:

http://cygwin.com/ml/cygwin/2013-01/msg00196.html

>> I rather doubt that the use of a .bat file is necessary to reproduce
>> this hang, but I haven't tried producing a test case that doesn't use a
>> .bat file. This is a test case I was using when debugging the
>> intermittent incorrect exit code issue.
>
> Btw, an incorrect exit code is still a possibility if you're running
> from a cmd shell since it is possible to interrupt a cygwin process
> before cygwin is entirely set up. That will cause a normal windows
> CTRL-C exit.

Yup, that is understood and expected.

Tom.
* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2012-12-21 20:01 UTC
To: cygwin

On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7. I haven't yet tried elsewhere.

I was able to reproduce the issue with a 64-bit executable built from the test case in the parent email using Microsoft's Visual Studio 2010 x64 compiler. This issue does not appear to be specific to support for running 32-bit processes on 64-bit Windows via Wow64.

I have not yet tried to reproduce this on any release of Windows other than 64-bit Windows 7 SP1. I am curious about what other Windows releases are affected. Please reply if you try the test case and are able to reproduce the problem on other Windows releases.

So far, I'm only aware of the issue being reproduced on multi-processor systems. I suspect the problem can occur on single-processor systems as well, but is much less likely to.

Tom.
* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2013-11-14 4:02 UTC
To: cygwin

On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I can't
> rule out other third party kernel mode software possibly contributing to
> the issue. A simple change to Cygwin works around the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent calls to
> TerminateProcess() and ExitThread(). The example code below minimally
> mimics the threads created and exit process/thread calls that are
> performed when running Cygwin's false.exe. The primary thread exits the
> process via TerminateProcess() ala pinfo::exit() in
> winsup/cygwin/pinfo.cc. The secondary thread exits itself via
> ExitThread() ala Cygwin's signal processing thread function, wait_sig(),
> in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary thread's
> call to ExitThread().
> I can only speculate at this point, but my guess
> is that the TerminateProcess() code disassociates the calling thread
> from the process before other threads are stopped such that
> ExitThread(), concurrently running in another thread, may determine that
> the calling thread is the last thread of the process and overwrite the
> process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses TerminateProcess()
> because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled the
> code using Microsoft's Visual Studio 2010 x86 compiler with the command
> 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:
>
> #include <windows.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> DWORD WINAPI SecondaryThread(
>     LPVOID lpParameter)
> {
>     Sleep(1);
>     ExitThread(2);
> }
>
> int main() {
>     HANDLE hSecondaryThread = CreateThread(
>         NULL,              // lpThreadAttributes
>         0,                 // dwStackSize
>         SecondaryThread,   // lpStartAddress
>         (LPVOID)0,         // lpParameter
>         0,                 // dwCreationFlags
>         NULL);             // lpThreadId
>     if (!hSecondaryThread) {
>         fprintf(stderr, "CreateThread failed. GLE=%lu\n",
>             (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     Sleep(1);
>
>     if (!TerminateProcess(GetCurrentProcess(), 1)) {
>         fprintf(stderr, "TerminateProcess failed. GLE=%lu\n",
>             (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     return 0;
> }
>
>
> To run the test, a simple .bat file is used:
>
> test.bat:
>
> @echo off
> setlocal
>
> :loop
> echo test...
> test-exit-code.exe
> if %ERRORLEVEL% NEQ 1 (
>     echo test-exit-code.exe returned %ERRORLEVEL%
>     exit /B 1
> )
> goto loop
>
>
> test.bat should run indefinitely. The amount of time it takes to fail
> on my machine (64-bit Windows 7 running in a VMware Workstation 8 VM
> under Kubuntu 12.04 on a Lenovo T420 Intel i7-2640M 2 processor laptop)
> varies considerably.
> I had one run fail in less than 10 iterations, but
> most of the time it has taken upwards of 5 minutes to get a failure.
>
> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to ExitThread()
> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
> doesn't exit until the process is exiting anyway, the call to Sleep()
> does not adversely affect shutdown. The thread just gets terminated
> while in the call to Sleep() instead of exiting before the process is
> terminated or getting terminated while still in the call to
> ExitThread(). A better solution might be to avoid the thread exiting at
> all (so long as it can't get terminated while holding critical
> resources), or to have the process exiting thread wait on it. Neither
> of these is ideal. Orderly shutdown of multi-threaded processes is
> really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used, having
> the wait_sig() thread (and any other threads that could potentially
> concurrently exit with another thread) exit with a special status value
> such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable
> diagnosis of this issue as any process exit code matching this would be
> a likely indicator that this issue was encountered.
>
> As is, when this race condition results in the undesirable outcome,
> since the signal processing thread exits with a status of 0, the exit
> status of the process is 0. This explains why false.exe works so well
> to reproduce the issue. It would be impossible to produce a negative
> test using true.exe.
>
> Tom.

Time passes...

I worked with some former colleagues to report this issue to Microsoft. Windows 8.1 and Windows Server 2012 R2 contain a fix that addresses the test case above. A hotfix has been made available for Windows 7 SP1 and Windows Server 2008 R2.
Should anyone desire a hotfix for other versions of Windows, it will be necessary to open a case with Microsoft to request it.

http://support.microsoft.com/kb/2875501

Tom.
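Tom's guess about the mechanism can be illustrated with a toy, single-threaded replay of the possible interleavings. This is a minimal C sketch that models only his speculation, not Windows' actual implementation: it assumes TerminateProcess() first detaches the calling thread and records the requested code (step D) and only later kills the remaining threads (step K), while ExitThread() (step E) lets whichever thread believes it is the last one stamp its own code on the process. The struct, the step letters and replay() are hypothetical; the codes 1 and 2 mirror test-exit-code.cpp.

```c
/* Toy model of the hypothesized race; NOT actual Windows internals. */
struct toy_process {
    int live_threads;        /* threads still counted as part of the process */
    int secondary_alive;     /* 0 once the secondary thread is gone */
    unsigned long exit_code; /* what a waiting parent would later read */
};

/* Replay one interleaving of three steps:
 *   'D' - TerminateProcess(), part 1 (hypothesized): detach the calling
 *         thread and record the requested exit code, 1.
 *   'K' - TerminateProcess(), part 2: kill any remaining threads.
 *   'E' - the secondary thread calls ExitThread(2).
 * Returns the exit code a parent would observe. */
static unsigned long replay(const char *steps)
{
    /* 0x103 is STILL_ACTIVE, the value reported for a running process. */
    struct toy_process p = { 2, 1, 0x103 };

    for (; *steps != '\0'; steps++) {
        switch (*steps) {
        case 'D':
            p.live_threads--;
            p.exit_code = 1;
            break;
        case 'K':
            if (p.secondary_alive) {
                p.secondary_alive = 0;
                p.live_threads--;
            }
            break;
        case 'E':
            if (!p.secondary_alive)
                break; /* a killed thread never runs again */
            p.secondary_alive = 0;
            /* The thread that believes it is the last one stamps its
               own exit code on the process. */
            if (--p.live_threads == 0)
                p.exit_code = 2;
            break;
        default:
            break; /* ignore unknown step letters */
        }
    }
    return p.exit_code;
}
```

Under this model, replay("EDK") and replay("DKE") both return 1, but replay("DEK") returns 2: only when ExitThread() runs in the window between the two halves of TerminateProcess() is the requested code lost. With Cygwin's wait_sig() thread exiting with 0, the same interleaving would surface as exit code 0, which matches what Tom describes for false.exe.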
* Re: Intermittent failures retrieving process exit codes
From: Corinna Vinschen @ 2013-11-14 9:20 UTC
To: cygwin

Hi Tom,

On Nov 13 23:01, Tom Honermann wrote:
> On 12/21/2012 01:30 AM, Tom Honermann wrote:
> >[...]
> >When the race condition results in the undesirable outcome, the exit
> >code for the process is set to the exit code for the secondary thread's
> >call to ExitThread(). I can only speculate at this point, but my guess
> >is that the TerminateProcess() code disassociates the calling thread
> >from the process before other threads are stopped such that
> >ExitThread(), concurrently running in another thread, may determine that
> >the calling thread is the last thread of the process and overwrite the
> >process exit code.
> >[...]
>
> Time passes...
>
> I worked with some former colleagues to report this issue to
> Microsoft. Windows 8.1 and Windows Server 2012 R2 contain a fix
> that addresses the test case above. A hotfix has been made
> available for Windows 7 SP1 and Windows Server 2008 R2. Should
> anyone desire a hotfix for other versions of Windows, it will be
> necessary to open a case with Microsoft to request it.
>
> http://support.microsoft.com/kb/2875501
>
> Tom.

thanks for letting us know!

I'm very glad to read that this is an OS bug and a fix is available.

At least partially. I'm a bit confused. As far as I understand it this is the situation now:

  Vista/2008 and earlier: no fix available.
  W7/2008R2: only hotfix for manual installation
  W8/2012: no fix available.
  W8.1/2012R2: fixed.

Did I get that right? That sounds a bit weird...
Thanks again,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2013-11-14 15:21 UTC
To: cygwin

On 11/14/2013 04:19 AM, Corinna Vinschen wrote:
> thanks for letting us know!

You're welcome :)

> I'm very glad to read that this is an OS bug and a fix is available.
>
> At least partially. I'm a bit confused. As far as I understand it this
> is the situation now:
>
>   Vista/2008 and earlier: no fix available.
>   W7/2008R2: only hotfix for manual installation
>   W8/2012: no fix available.
>   W8.1/2012R2: fixed.
>
> Did I get that right? That sounds a bit weird...

That is how I understand it. Microsoft requires a Premier Support agreement in order to request hotfixes, and I am not a party to any such agreement. So, I worked with former colleagues at another company that does have a Premier Support agreement and that I knew were also experiencing the issue. They only requested a hotfix for Windows 7 SP1 and Windows Server 2008 R2, as those are the only Windows releases they were concerned about having a fix for. The result: it is fixed in currently shipping versions and a hotfix is available for those specific releases, but other releases remain vulnerable. Addressing those releases will presumably require someone with access to a Premier Support agreement to request additional hotfix releases.

Tom.
* Re: Intermittent failures retrieving process exit codes
From: Denis Excoffier @ 2013-11-15 18:53 UTC
To: Tom Honermann, Cygwin Mailing List; +Cc: lasse.collin

On 2013-11-14 05:01, Tom Honermann wrote:
> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>
>> The workaround I implemented within Cygwin was simple and sloppy. I
>> added a call to Sleep(1000) immediately before the call to ExitThread()
>> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
>> doesn't exit until the process is exiting anyway, the call to Sleep()
>> does not adversely affect shutdown. The thread just gets terminated
>> while in the call to Sleep() instead of exiting before the process is
>> terminated or getting terminated while still in the call to
>> ExitThread(). A better solution might be to avoid the thread exiting at
>> all (so long as it can't get terminated while holding critical
>> resources), or to have the process exiting thread wait on it. Neither
>> of these is ideal. Orderly shutdown of multi-threaded processes is
>> really hard to do correctly on Windows.

I experience on Windows 7 (not on XP) some problems that may be related. I would like to test your workaround, but sigproc.cc has much changed since then, there is now an exit_thread function with the comment "Exit the current thread very carefully.". I tried to insert Sleep(1000) at the end of exit_thread, immediately before "ExitThread (0)", but this yielded no change at all.

Could someone be kind enough to update the workaround for modern sigproc.cc?
Very briefly, my problem is that when I run "tar xf --use-compress-program=xz", I get:

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

and the last file of the archive is truncated at a 512-byte block boundary. This occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with xz-5.1.2alpha or xz-5.0.5); never on most tar.xz files; almost always on some (rare) tar.xz files (one notable example is bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed). It depends on the .tar file itself, not on the option (like -9e, -0) used to create the .tar.xz; it never happens with "tar tf"; and it happens with all the tar versions I have tested. The return code of all the involved xz -d commands is always zero, though.

Perhaps after all, this is unrelated?

Thank you.

Regards,

Denis Excoffier.
* Re: Intermittent failures retrieving process exit codes
From: Christopher Faylor @ 2013-11-15 19:21 UTC
To: cygwin

On Fri, Nov 15, 2013 at 07:53:26PM +0100, Denis Excoffier wrote:
>On 2013-11-14 05:01, Tom Honermann wrote:
>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>>
>>> The workaround I implemented within Cygwin was simple and sloppy. I
>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>> does not adversely affect shutdown. The thread just gets terminated
>>> while in the call to Sleep() instead of exiting before the process is
>>> terminated or getting terminated while still in the call to
>>> ExitThread(). A better solution might be to avoid the thread exiting at
>>> all (so long as it can't get terminated while holding critical
>>> resources), or to have the process exiting thread wait on it. Neither
>>> of these is ideal. Orderly shutdown of multi-threaded processes is
>>> really hard to do correctly on Windows.
>
>I experience on Windows 7 (not on XP) some problems that may be related.
>I would like to test your workaround, but sigproc.cc has much changed since
>then, there is now an exit_thread function with the comment "Exit the current
>thread very carefully.". I tried to insert Sleep(1000) at the end of
>exit_thread, immediately before "ExitThread (0)", but this yielded no
>change at all.
>
>Could someone be kind enough to update the workaround for modern sigproc.cc?

You apparently are misunderstanding the whole point of the changes to sigproc.cc. They were to work around this very problem.
cgf
* Re: Intermittent failures retrieving process exit codes
From: Denis Excoffier @ 2013-11-17 13:30 UTC
To: Cygwin Mailing List

On 2013-11-15 20:21, Christopher Faylor wrote:
> On Fri, Nov 15, 2013 at 07:53:26PM +0100, Denis Excoffier wrote:
>> On 2013-11-14 05:01, Tom Honermann wrote:
>>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>>>
>>>> The workaround I implemented within Cygwin was simple and sloppy. I
>>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>>> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
>>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>>> does not adversely affect shutdown. The thread just gets terminated
>>>> while in the call to Sleep() instead of exiting before the process is
>>>> terminated or getting terminated while still in the call to
>>>> ExitThread(). A better solution might be to avoid the thread exiting at
>>>> all (so long as it can't get terminated while holding critical
>>>> resources), or to have the process exiting thread wait on it. Neither
>>>> of these is ideal. Orderly shutdown of multi-threaded processes is
>>>> really hard to do correctly on Windows.
>>
>> I experience on Windows 7 (not on XP) some problems that may be related.
>> I would like to test your workaround, but sigproc.cc has much changed since
>> then, there is now an exit_thread function with the comment "Exit the current
>> thread very carefully.". I tried to insert Sleep(1000) at the end of
>> exit_thread, immediately before "ExitThread (0)", but this yielded no
>> change at all.
>>
>> Could someone be kind enough to update the workaround for modern sigproc.cc?
>
> You apparently are misunderstanding the whole point of the changes to
> sigproc.cc. They were to work around this very problem.

Oh, i didn’t remember that.
Then this must be the antivirus or something else i have to cope with.

Regards,

Denis Excoffier.
* Re: Intermittent failures retrieving process exit codes
From: Tom Honermann @ 2013-11-15 22:15 UTC
To: Denis Excoffier, Cygwin Mailing List
Cc: lasse.collin

On 11/15/2013 01:53 PM, Denis Excoffier wrote:
> On 2013-11-14 05:01, Tom Honermann wrote:
>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>>
>>> The workaround I implemented within Cygwin was simple and sloppy. I
>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>> does not adversely affect shutdown. The thread just gets terminated
>>> while in the call to Sleep() instead of exiting before the process is
>>> terminated or getting terminated while still in the call to
>>> ExitThread(). A better solution might be to avoid the thread exiting at
>>> all (so long as it can't get terminated while holding critical
>>> resources), or to have the process exiting thread wait on it. Neither
>>> of these is ideal. Orderly shutdown of multi-threaded processes is
>>> really hard to do correctly on Windows.
>
> I experience on Windows 7 (not on XP) some problems that may be related.
> I would like to test your workaround, but sigproc.cc has much changed since
> then, there is now an exit_thread function with the comment "Exit the current
> thread very carefully.". I tried to insert Sleep(1000) at the end of
> exit_thread, immediately before "ExitThread (0)", but this yielded no
> change at all.
>
> Could someone be kind enough to update the workaround for modern sigproc.cc?

Hi Denis.

Cygwin versions 1.7.18 and later contain a workaround for this issue. If you are running something older than that, I highly encourage you to upgrade.
Many stability-related fixes have been made in more recent versions.

> Very briefly, my problem is that when i "tar xf --use-compress-program=xz", i
> get:
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> and the last file of the archive is truncated at some 512-byte block. This
> occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with xz-5.1.2alpha or
> xz-5.0.5); never on most tar.xz files; almost always on some (rare) tar.xz files
> (one notable example is bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); depends
> on the .tar file itself, not on the option (like -9e, -0) used to create the
> .tar.xz; never with "tar tf"; and with all tars i have tested. The return code
> of all the involved xz -d commands is always zero though. Perhaps after all, this
> is unrelated?

This doesn't sound related to the intermittent incorrect exit code defect to me. I'm afraid I don't have other explanations for what you are experiencing though.

Tom.
* Re: Intermittent failures retrieving process exit codes
From: Lasse Collin @ 2013-11-25 19:59 UTC
To: Denis Excoffier
Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-15 Denis Excoffier wrote:
> Very briefly, my problem is that when i "tar xf
> --use-compress-program=xz", i get:
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> and the last file of the archive is truncated at some 512-byte block.
> This occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with
> xz-5.1.2alpha or xz-5.0.5); never on most tar.xz files; almost always
> on some (rare) tar.xz files (one notable example is
> bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); depends on the .tar
> file itself, not on the option (like -9e, -0) used to create
> the .tar.xz; never with "tar tf"; and with all tars i have tested.
> The return code of all the involved xz -d commands is always zero
> though. Perhaps after all, this is unrelated?

xz 5.1.3alpha has some new file I/O code that uses non-blocking file
descriptors, the self-pipe trick, and poll(). It's there to fix a race
condition in signal handling. Since you say it works with 5.1.2alpha, I
wonder whether there is a bug in the new I/O code in xz, or whether the
code in xz triggers a bug in Cygwin or Windows.

If you haven't already tried, please compile both 5.1.2alpha and
5.1.3alpha from source while keeping everything else unchanged, and see
if the bug really only occurs with 5.1.3alpha.
-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
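The self-pipe trick mentioned above is a general POSIX technique; the sketch below shows its usual shape and is not taken from xz's source. The names self_pipe_init() and wait_io_or_signal() are hypothetical. The signal handler does nothing but write one byte into a non-blocking pipe, and the main loop poll()s that pipe alongside the real file descriptor, so a signal that arrives just before or during poll() still wakes the loop instead of being missed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical self-pipe: [0] is polled by the main loop, [1] is written by the handler. */
static int self_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe calls (write() is one of them). */
static void on_signal(int sig)
{
    int saved_errno = errno;            /* don't clobber errno of the interrupted code */
    unsigned char byte = (unsigned char)sig;
    ssize_t r = write(self_pipe[1], &byte, 1);
    (void)r;                            /* if the pipe is full, a wakeup is already pending */
    errno = saved_errno;
}

/* Create the pipe with both ends non-blocking and install the handler for sig.
   Returns 0 on success, -1 on error. */
int self_pipe_init(int sig)
{
    assert(self_pipe[0] == -1 && "self_pipe_init() called twice");
    if (pipe(self_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(self_pipe[i], F_GETFL);
        if (flags == -1 || fcntl(self_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Wait until io_fd is ready for `events` or a signal has been caught.
   Returns 1 if io_fd is ready, 0 if a signal arrived, -1 on error or timeout. */
int wait_io_or_signal(int io_fd, short events, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = io_fd, .events = events },
        { .fd = self_pipe[0], .events = POLLIN },
    };
    int n;
    do {
        n = poll(pfd, 2, timeout_ms);
    } while (n == -1 && errno == EINTR);  /* the handler's byte is seen by the next poll() */
    if (n <= 0)
        return -1;
    if (pfd[1].revents & POLLIN) {
        unsigned char drain[64];
        while (read(self_pipe[0], drain, sizeof drain) > 0)
            ;                              /* empty the pipe so later signals are noticed */
        return 0;
    }
    return (pfd[0].revents != 0) ? 1 : -1;
}
```

A caller would run its I/O loop around wait_io_or_signal(), for example waiting for POLLIN on STDIN_FILENO, and take its "signal received" path whenever the function returns 0.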
* Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
From: Denis Excoffier @ 2013-11-25 23:12 UTC
To: Lasse Collin
Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-25 at 21:58 +02:00, Lasse Collin wrote:
> On 2013-11-15 Denis Excoffier wrote:
>> Very briefly, my problem is that when i "tar xf
>> --use-compress-program=xz", i get:
>> tar: Unexpected EOF in archive
>> tar: Unexpected EOF in archive
>> tar: Error is not recoverable: exiting now
>> and the last file of the archive is truncated at some 512-byte block.
>> This occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with
>> xz-5.1.2alpha or xz-5.0.5); never on most tar.xz files; almost always
>> on some (rare) tar.xz files (one notable example is
>> bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); depends on the .tar
>> file itself, not on the option (like -9e, -0) used to create
>> the .tar.xz; never with "tar tf"; and with all tars i have tested.
>> The return code of all the involved xz -d commands is always zero
>> though. Perhaps after all, this is unrelated?
>
> xz 5.1.3alpha has some new file I/O code that uses non-blocking file
> descriptors, the self-pipe trick, and poll(). It's there to fix a race
> condition in signal handling. Since you say it works with 5.1.2alpha, I
> wonder whether there is a bug in the new I/O code in xz, or whether the
> code in xz triggers a bug in Cygwin or Windows.
>
> If you haven't already tried, please compile both 5.1.2alpha and
> 5.1.3alpha from source while keeping everything else unchanged, and see
> if the bug really only occurs with 5.1.3alpha.

Already done. I did some strace-ing, and since i’m not so fluent with the result, i’ll send it there in a while (when i’m back on cygwin) if someone is interested.
But the bug (contrary to what i said before) also _sometimes_ occurs with 5.1.2alpha or 5.0.5 and this makes me think now that:

a) my antivirus-anti-intrusion-whatever-software (that i can’t remove of course) creates some kind of "background noise" where a certain percentage of such ‘tar xf --use-compress-program’ commands will always fail

b) nevertheless, xz-5.1.3alpha (with its new file I/O code etc.) triggers some atypical configuration inside the antivirus that drastically increases the percentage, making the failure almost certain for some files.

It is not surprising that i cannot observe the failure on XP since i do not have this particular antivirus on XP.

You might however want some more detail. The test plan is: run 'tar xf file.xz --use-compress-program=xz -bx', where x varies from 1 to 200. There are two kinds of results:

1) The usual situation, where you observe at most 1 or 2 failures (out of 200). If you run the same plan again, you still get at most 1 or 2 failures, usually not for the same values of x. Very often there is no failure at all. Very often -b20 (the default) does not fail.
-> This situation occurs with 5.1.2alpha or 5.0.5 with all input files, and with 5.1.3alpha with most input files.

2) The pathological situation, where you observe, say, 30 failures (out of 200). If you run the same plan again, you get nearly the same failures, i.e. mostly for the same values of x, with some minor variability similar to that observed in the usual situation above.
-> This situation occurs with 5.1.3alpha only, with some selected input files, e.g. bc-1.06.95.tar.xz (see above for how to create bc-1.06.95.tar.xz).

When it fails (usually or pathologically), the last file of the archive gets truncated (see above), and _this_ is strange behaviour for an antivirus. After all, perhaps some flush() or similar is missing inside 5.1.3alpha.

Denis Excoffier.
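The 1-to-200 test plan above is easy to automate. The following is a minimal C sketch assuming a POSIX environment such as Cygwin with tar and xz on PATH; build_cmd() and run_plan() are hypothetical names, and the archive name (file.xz in the post) is passed in as a parameter. A run is counted as failed when system() reports that the shell could not be started, tar was killed by a signal, or tar exited non-zero, which is how tar's "Error is not recoverable: exiting now" reaches the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Build the command for one run with blocking factor b (tar -b: records of
   b x 512 bytes). Returns 0, or -1 if the archive name contains a single
   quote (not handled by this simple quoting) or buf is too small. */
int build_cmd(char *buf, size_t len, const char *archive, int b)
{
    assert(b > 0);
    if (strchr(archive, '\'') != NULL)
        return -1;
    int n = snprintf(buf, len, "tar xf '%s' --use-compress-program=xz -b%d", archive, b);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Run the plan for b = 1..max_b and return the number of failed runs
   (shell not started, tar killed by a signal, or non-zero exit status),
   printing each failing -b value; -1 if a command could not be built. */
int run_plan(const char *archive, int max_b)
{
    char cmd[1024];
    int failures = 0;
    for (int b = 1; b <= max_b; b++) {
        if (build_cmd(cmd, sizeof cmd, archive, b) != 0)
            return -1;
        int status = system(cmd);
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            printf("-b%d failed\n", b);
            failures++;
        }
    }
    return failures;
}
```

For example, run_plan("bc-1.06.95.tar.xz", 200) performs the whole sweep. Since tar extracts into the current directory on every run, it is best run in a scratch directory. Given the subject of this thread, an exit-status check alone may not be fully trusted on an affected system, so comparing the size of the last extracted file across runs is a useful cross-check.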
* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
From: Denis Excoffier @ 2013-11-26 21:09 UTC
To: Lasse Collin
Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
> Already done. I did some strace-ing, and since i’m not so fluent with the
> result, i’ll send it there in a while (when i’m back on cygwin) if someone is
> interested. But the bug (contrary to what i said before) also _sometimes_
> occurs with 5.1.2alpha or 5.0.5 and this makes me think now that:

Here is the result of strace (with minor editing). I kept the whole strace (12000 lines),
because xz ends rather early (around line 10000).

2bc-1.06.95.tar.xz is a file built using bunzip2 | xz -c

Note the presence of Win32 error 109 (broken pipe).

Regards,

Denis Excoffier.

This is part 1. Part 2 follows in a few minutes.

[-- Attachment #2: typescript-part1.xz --]
[-- Type: application/octet-stream, Size: 55188 bytes --]
* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
From: Christopher Faylor @ 2013-11-26 23:36 UTC
To: cygwin

On Tue, Nov 26, 2013 at 10:09:19PM +0100, Denis Excoffier wrote:
>On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
>> Already done. I did some strace-ing, and since i’m not so fluent with the
>> result, i’ll send it there in a while (when i’m back on cygwin) if someone is
>> interested. But the bug (contrary to what i said before) also _sometimes_
>> occurs with 5.1.2alpha or 5.0.5 and this makes me think now that:
>Here is the result of strace (with minor editing). I kept the whole strace (12000 lines),
>because xz ends rather early (around line 10000).
>
>2bc-1.06.95.tar.xz is a file built using bunzip2 | xz -c
>
>Note the presence of Win32 error 109 (broken pipe).

Please don't post unsolicited straces to this list. No one is going to be
looking at them, and they just clog up the mailing list.

cgf
* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
From: Denis Excoffier @ 2013-11-26 21:09 UTC
To: Lasse Collin
Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
> Already done. I did some strace-ing, and since i’m not so fluent with the
> result, i’ll send it there in a while (when i’m back on cygwin) if someone is
> interested. But the bug (contrary to what i said before) also _sometimes_
> occurs with 5.1.2alpha or 5.0.5 and this makes me think now that:

This is part 2. Just concatenate the two parts: cat typescript-part1 typescript-part2.

[-- Attachment #2: typescript-part2.xz --]
[-- Type: application/octet-stream, Size: 53528 bytes --]
* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
From: Lasse Collin @ 2013-12-01 13:24 UTC
To: Denis Excoffier
Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-26 Denis Excoffier wrote:
> On 2013-11-25 at 21:58 +02:00, Lasse Collin wrote:
> > If you haven't already tried, please compile both 5.1.2alpha and
> > 5.1.3alpha from source while keeping everything else unchanged, and
> > see if the bug really only occurs with 5.1.3alpha.
> Already done. I did some strace-ing, and since i’m not so fluent with
> the result, i’ll send it there in a while (when i’m back on cygwin)
> if someone is interested. But the bug (contrary to what i said
> before) also _sometimes_ occurs with 5.1.2alpha or 5.0.5 and this
> makes me think now that:
>
> a) my antivirus-anti-intrusion-whatever-software (that i can’t remove
> of course) creates some kind of "background noise" where a certain
> percentage of such ‘tar xf --use-compress-program’ commands will
> always fail
>
> b) nevertheless, xz-5.1.3alpha (with its new file I/O code etc.)
> triggers some atypical configuration inside the antivirus that
> drastically increases the percentage, making the failure almost
> certain for some files.
>
> It is not surprising that i cannot observe the failure on XP since
> i do not have this particular antivirus on XP.

OK, so the new I/O code in xz probably isn't the problem, even if it may
affect how easily the actual problem gets triggered.

[...]

> When it fails (usually or pathologically), the last file of the
> archive gets truncated (see above), and _this_ is strange behaviour
> for an antivirus. After all, perhaps some flush() or similar is
> missing inside 5.1.3alpha.

xz uses write(), which takes a file descriptor rather than a buffered
stdio FILE, so there is nothing to flush separately. xz just has to
write() everything.

When used with tar, xz writes to standard output (STDOUT_FILENO), which
with tar is a pipe. When xz finishes, it closes its end (the writer end)
of the pipe.

With xz 5.1.3alpha, the O_NONBLOCK flag is set on STDIN_FILENO and
STDOUT_FILENO if it wasn't already set. If xz set the flag, it will
unset it before closing the file descriptor. The setting and unsetting
can be seen in the trace you sent, and it seems to work correctly. I
can't tell whether these fcntl() calls might cause the difference
between 5.1.3alpha and other versions, but it doesn't seem too
important, since the bug occurs in some form with all versions.

From the trace file it seems that the last write() from xz gets lost.
xz first makes 173 writes of 8192 bytes and then one 6144-byte write,
totalling 1,423,360 bytes. tar gets only 1,417,216 bytes from xz, that
is, 6144 bytes too few.

Since things go wrong with old xz versions that don't use non-blocking
I/O, I would expect you to see similar issues with other compressors
too. Maybe it would be worth testing gzip and bzip2 the same way you
tested xz 5.0.5.

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
end of thread

Thread overview: 65+ messages
2012-12-07 19:55 Intermittent failures retrieving process exit codes Tom Honermann
2012-12-07 21:54 ` Tom Honermann
2012-12-07 23:07 ` bartels
2012-12-21 6:30 ` Tom Honermann
2012-12-21 10:33 ` Corinna Vinschen
2012-12-21 12:15 ` Nick Lowe
2012-12-21 19:45 ` Tom Honermann
2012-12-22 3:09 ` Nick Lowe
2012-12-21 16:10 ` Christopher Faylor
2012-12-21 17:02 ` Corinna Vinschen
2012-12-21 19:36 ` Intermittent failures retrieving process exit codes - snapshot test requested Christopher Faylor
2012-12-21 20:37 ` Daniel Colascione
2012-12-21 22:23 ` marco atzeri
2012-12-21 23:09 ` Tom Honermann
2012-12-22 2:53 ` Christopher Faylor
2012-12-22 2:57 ` Tom Honermann
2012-12-22 2:49 ` Christopher Faylor
2012-12-22 3:14 ` Christopher Faylor
2012-12-22 9:06 ` marco atzeri
2012-12-22 17:50 ` Christopher Faylor
2012-12-23 16:56 ` Christopher Faylor
2012-12-23 18:54 ` marco atzeri
2012-12-27 20:50 ` Tom Honermann
2012-12-29 21:57 ` Christopher Faylor
2013-01-01 1:45 ` Tom Honermann
2013-01-01 5:36 ` Christopher Faylor
2013-01-02 19:15 ` Tom Honermann
2013-01-02 20:48 ` Christopher Faylor
2013-01-02 20:53 ` Daniel Colascione
2013-01-02 21:41 ` Christopher Faylor
2013-01-02 21:25 ` Tom Honermann
2013-01-15 22:17 ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann
2013-01-16 2:04 ` Christopher Faylor
2013-01-16 16:38 ` Intermittent failures with ctrl-c Tom Honermann
2013-01-16 16:53 ` marco atzeri
2013-01-16 17:42 ` Tom Honermann
2013-01-16 18:05 ` Earnie Boyd
2013-01-16 18:51 ` Tom Honermann
2013-01-16 18:59 ` Christopher Faylor
2013-01-16 20:19 ` Tom Honermann
2013-01-16 22:23 ` Christopher Faylor
2013-01-18 20:12 ` Tom Honermann
2013-01-19 5:58 ` Christopher Faylor
2013-01-20 22:09 ` Tom Honermann
2013-01-23 3:20 ` Tom Honermann
2013-01-23 5:27 ` Christopher Faylor
2013-01-23 18:18 ` Tom Honermann
2013-01-23 18:35 ` Christopher Faylor
2013-01-24 4:12 ` Tom Honermann
2013-01-16 19:14 ` Christopher Faylor
2013-01-16 20:24 ` Tom Honermann
2012-12-21 20:01 ` Intermittent failures retrieving process exit codes Tom Honermann
2013-11-14 4:02 ` Tom Honermann
2013-11-14 9:20 ` Corinna Vinschen
2013-11-14 15:21 ` Tom Honermann
2013-11-15 18:53 ` Denis Excoffier
2013-11-15 19:21 ` Christopher Faylor
2013-11-17 13:30 ` Denis Excoffier
2013-11-15 22:15 ` Tom Honermann
2013-11-25 19:59 ` Lasse Collin
2013-11-25 23:12 ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
2013-11-26 21:09 ` Denis Excoffier
2013-11-26 23:36 ` Christopher Faylor
2013-11-26 21:09 ` Denis Excoffier
2013-12-01 13:24 ` Lasse Collin