public inbox for cygwin@cygwin.com
* Intermittent failures retrieving process exit codes
@ 2012-12-07 19:55 Tom Honermann
  2012-12-07 21:54 ` Tom Honermann
  2012-12-07 23:07 ` bartels
  0 siblings, 2 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-07 19:55 UTC (permalink / raw)
  To: cygwin

I've witnessed intermittent failures in multiple build systems while 
working at multiple companies using Cygwin bash and make as part of the 
build system but using non-Cygwin compilers and other tools.  The 
intermittent failures occur when a process appears to complete 
successfully, but the process retrieving its exit code receives an 
unexpected value.  This has been seen on many different Cygwin versions 
across several years.

Several reports of similar sounding issues can be found online:
- http://cygwin.1069669.n5.nabble.com/Cygwin-1-7-x-on-Windows-7-Exit-statuses-of-Win32-executables-are-sometimes-wrong-td20186.html
- http://stackoverflow.com/questions/9769256/intermittent-failures-under-cygwin-possibly-related-to-candle-and-or-make

I recently was able to produce a very small test case that reproduces 
this issue reliably on some machines:

$ cat test.sh
#!/bin/sh

while [ 1 ]; do
   echo "test..."
   if cmd /c "false"; then
     echo "exiting..."
     exit 1
   fi
done

An invocation of test.sh should run indefinitely, but fails very quickly 
on one of my machines:

$ ./test.sh
test...
test...
exiting...

$ ./test.sh
test...
test...
test...
test...
exiting...

$ ./test.sh
test...
exiting...

There are several high-level possibilities for what is going wrong:

1) cmd.exe is failing to retrieve the correct exit code for the 
invocation of false.exe (A Cygwin process).

2) cmd.exe is failing to return the (correct) exit code it received for 
the invocation of false.exe.

3) bash.exe (A Cygwin process) is failing to retrieve the correct exit 
code for the invocation of cmd.exe.

It is possible that other software installed on the machines where I've 
witnessed this is contributing to the problem (ala 
http://cygwin.com/faq/faq.using.html#faq.using.bloda).  If so, such 
software would be a contributing factor to one of the explanations 
above, but does not necessarily mean that there is not a defect in 
Cygwin (or CreateProcess, WaitForSingleObject, or GetExitCodeProcess). 
I have not yet seen a similar case that does not involve Cygwin, so at 
present I suspect a defect in Cygwin, but possibly one that produces no 
negative symptoms in isolation.

I've reproduced this issue with both the 32-bit and 64-bit versions of 
cmd.exe.  I've also reproduced it by replacing cmd.exe with a C program 
that calls CreateProcess for Cygwin's false.exe on its own.  The issue 
reproduces whether that C program is compiled with Cygwin gcc, with MinGW 
gcc (32-bit and 64-bit), or with MSVC (32-bit and 64-bit).  So, substitute 
what you like for 'cmd.exe' in the above.

Likewise, I've reproduced this issue by replacing false.exe in the test 
above with a custom false.exe (myfalse.exe, a C program that just returns 
1).  The issue reproduces whether myfalse.exe is compiled with Cygwin gcc, 
with MinGW gcc (32-bit and 64-bit), or with MSVC (32-bit and 64-bit).  So, 
substitute what you like for 'false.exe' in the above.

I am not able to reproduce the problem if I elide the invocation of 
false.exe (i.e., if the cmd.exe invocation is 'cmd /c "exit /B 1"' or if 
my replacement for cmd.exe just returns 1).

The problem feels like a race condition in retrieving process exit 
codes.  Further, it seems that it may only occur when two related 
processes exit in quick succession.

I've been granted several weeks in the near future to work exclusively 
on this issue.  Before I start working on it, though, I'd like to hear 
from other community members who have experienced this and tried to 
debug it: what is and is not known about the issue, what workarounds 
have been tried (especially any that were found to be successful), and 
whether there are specific parts of the Cygwin (or bash) code that you 
recommend starting with.

The machine that I've been running the above script on is 64-bit Windows 
7 Professional SP1 running under VMware Workstation 8 which is running 
on Kubuntu 12.04.

Relevant parts of 'cygcheck -s' are:

Windows 7 Professional N Ver 6.1 Build 7601 Service Pack 1

Running under WOW64 on AMD64

     Cygwin DLL version info:
         DLL version: 1.7.16
         DLL epoch: 19
         DLL old termios: 5
         DLL malloc env: 28
         Cygwin conv: 181
         API major: 0
         API minor: 262
         Shared data: 5
         DLL identifier: cygwin1
         Mount registry: 3
         Cygwin registry name: Cygwin
         Program options name: Program Options
         Installations name: Installations
         Cygdrive default prefix:
         Build date:
         Shared id: cygwin1S5


Potential app conflicts:

ByteMobile laptop optimization client.

No Cygwin services found.

Cygwin Package Information
Package                    Version              Status
bash                       4.1.10-4             OK
cygwin                     1.7.16-1             OK


Tom.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Intermittent failures retrieving process exit codes
  2012-12-07 19:55 Intermittent failures retrieving process exit codes Tom Honermann
@ 2012-12-07 21:54 ` Tom Honermann
  2012-12-07 23:07 ` bartels
  1 sibling, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-07 21:54 UTC (permalink / raw)
  To: cygwin

On 12/07/2012 02:54 PM, Tom Honermann wrote:
> Likewise, I've reproduced this issue by replacing false.exe in the test
> above with a custom false.exe (A C program that just returns 1).  The
> issue reproduces whether myfalse.exe is compiled with Cygwin gcc, MinGW
> gcc (32-bit and 64-bit), and with MSVC (32-bit and 64-bit).  So,
> substitute what you like for 'false.exe' in the above.

The above is not correct; I erred in my testing.

I am able to reproduce the issue when replacing false.exe in the test 
case with a custom false.exe compiled with Cygwin gcc.

I am *not* able to reproduce the issue when replacing it with one 
compiled with MinGW gcc (32-bit or 64-bit) or with MSVC (32-bit or 64-bit).

Tom.



* Re: Intermittent failures retrieving process exit codes
  2012-12-07 19:55 Intermittent failures retrieving process exit codes Tom Honermann
  2012-12-07 21:54 ` Tom Honermann
@ 2012-12-07 23:07 ` bartels
  2012-12-21  6:30   ` Tom Honermann
  1 sibling, 1 reply; 65+ messages in thread
From: bartels @ 2012-12-07 23:07 UTC (permalink / raw)
  To: cygwin

On 12/07/2012 08:54 PM, Tom Honermann wrote:
>
> I recently was able to produce a very small test case that reproduces this issue reliably on some machines:

Your suspicion about a race condition may very well be correct: I can easily confirm the problem on both iron and virtual SMP, but not on a 
single-core virtual machine.

I have had two instances of your test case running for half an hour on the same core without any problem: 30k cycles without a hiccup.

Apart from the immediate effect exposed by your script, I have reason to believe that the root cause also affects other running (SMP) processes.

bartels


* Re: Intermittent failures retrieving process exit codes
  2012-12-07 23:07 ` bartels
@ 2012-12-21  6:30   ` Tom Honermann
  2012-12-21 10:33     ` Corinna Vinschen
                       ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-21  6:30 UTC (permalink / raw)
  To: cygwin

I spent most of the week debugging this issue.  This appears to be a 
defect in Windows.  I can reproduce the issue without Cygwin.  I can't 
rule out other third party kernel mode software possibly contributing to 
the issue.  A simple change to Cygwin works around the problem for me.

I don't know which Windows releases are affected by this.  I've only 
reproduced the problem (outside of Cygwin) with Wow64 processes running 
on 64-bit Windows 7.  I haven't yet tried elsewhere.

The problem appears to be a race condition involving concurrent calls to 
TerminateProcess() and ExitThread().  The example code below minimally 
mimics the threads created and exit process/thread calls that are 
performed when running Cygwin's false.exe.  The primary thread exits the 
process via TerminateProcess() ala pinfo::exit() in 
winsup/cygwin/pinfo.cc.  The secondary thread exits itself via 
ExitThread() ala Cygwin's signal processing thread function, wait_sig(), 
in winsup/cygwin/sigproc.cc.

When the race condition results in the undesirable outcome, the exit 
code for the process is set to the exit code for the secondary thread's 
call to ExitThread().  I can only speculate at this point, but my guess 
is that the TerminateProcess() code disassociates the calling thread 
from the process before other threads are stopped such that 
ExitThread(), concurrently running in another thread, may determine that 
the calling thread is the last thread of the process and overwrite the 
process exit code.
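To make the speculation concrete, here is a deliberately simplified, single-threaded toy model of the two orderings. It is not how the Windows kernel is implemented: the toy_process type, the "last thread to leave records its own code" rule, and the two-step TerminateProcess are assumptions encoding the guess above. The codes 1 and 2 match the test case below.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hypothesized race.  This is NOT the Windows kernel's
   implementation; it only encodes the guess described above. */
typedef struct {
    int live_threads;      /* threads still associated with the process */
    unsigned exit_status;  /* what GetExitCodeProcess would report */
} toy_process;

#define TOY_STILL_ACTIVE 259u  /* same value as Win32 STILL_ACTIVE */

/* ExitThread in the model: the last thread to leave overwrites the
   process exit status with its own exit code. */
static void toy_exit_thread(toy_process *p, unsigned thread_code)
{
    assert(p->live_threads > 0);
    p->live_threads--;
    if (p->live_threads == 0)
        p->exit_status = thread_code;
}

/* Runs the primary thread's TerminateProcess(self, 1) against the
   secondary thread's ExitThread(2).  If in_window is true, ExitThread
   lands after the caller has been detached but before the remaining
   threads are stopped.  Returns the resulting process exit status. */
unsigned toy_run(bool in_window)
{
    toy_process p = { 2, TOY_STILL_ACTIVE };

    if (!in_window)
        toy_exit_thread(&p, 2);   /* secondary exits early: 2 -> 1 threads */

    /* TerminateProcess, step 1: record the status, detach the caller. */
    p.exit_status = 1;
    p.live_threads--;

    if (in_window)
        toy_exit_thread(&p, 2);   /* secondary now believes it is last */

    /* Step 2 (stopping the remaining threads) would go here; in this
       model it does not touch exit_status. */
    return p.exit_status;
}
```

In this model toy_run(false) yields 1 and toy_run(true) yields 2, which is the value test.bat would report on failure.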

The issue also reproduces if ExitProcess() is called in place of 
TerminateProcess().  The test case below only uses TerminateProcess() 
because that is what Cygwin does.

Source code to reproduce the issue follows.  Again, Cygwin is not 
required to reproduce the problem.  For my own testing, I compiled the 
code using Microsoft's Visual Studio 2010 x86 compiler with the command 
'cl /Fetest-exit-code.exe test-exit-code.cpp'

test-exit-code.cpp:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

DWORD WINAPI SecondaryThread(
     LPVOID lpParameter)
{
     Sleep(1);
     ExitThread(2);
}

int main() {
     HANDLE hSecondaryThread = CreateThread(
         NULL,                               // lpThreadAttributes
         0,                                  // dwStackSize
         SecondaryThread,                    // lpStartAddress
         (LPVOID)0,                          // lpParameter
         0,                                  // dwCreationFlags
         NULL);                              // lpThreadId
     if (!hSecondaryThread) {
         fprintf(stderr, "CreateThread failed.  GLE=%lu\n",
             (unsigned long)GetLastError());
         exit(127);
     }

     Sleep(1);

     if (!TerminateProcess(GetCurrentProcess(), 1)) {
         fprintf(stderr, "TerminateProcess failed.  GLE=%lu\n",
             (unsigned long)GetLastError());
         exit(127);
     }

     return 0;
}


To run the test, a simple .bat file is used:

test.bat:

@echo off
setlocal

:loop
echo test...
test-exit-code.exe
if %ERRORLEVEL% NEQ 1 (
     echo test-exit-code.exe returned %ERRORLEVEL%
     exit /B 1
)
goto loop


test.bat should run indefinitely.  The amount of time it takes to fail 
on my machine (64-bit Windows 7 running in a VMware Workstation 8 VM 
under Kubuntu 12.04 on a Lenovo T420 Intel i7-2640M 2 processor laptop) 
varies considerably.  I had one run fail in less than 10 iterations, but 
most of the time it has taken upwards of 5 minutes to get a failure.

The workaround I implemented within Cygwin was simple and sloppy.  I 
added a call to Sleep(1000) immediately before the call to ExitThread() 
in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably) 
doesn't exit until the process is exiting anyway, the call to Sleep() 
does not adversely affect shutdown.  The thread just gets terminated 
while in the call to Sleep() instead of exiting before the process is 
terminated or getting terminated while still in the call to 
ExitThread().  A better solution might be to avoid the thread exiting at 
all (so long as it can't get terminated while holding critical 
resources), or to have the process exiting thread wait on it.  Neither 
of these is ideal.  Orderly shutdown of multi-threaded processes is 
really hard to do correctly on Windows.

Since the exit code for the signal processing thread is not used, having 
the wait_sig() thread (and any other threads that could potentially 
concurrently exit with another thread) exit with a special status value 
such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable 
diagnosis of this issue as any process exit code matching this would be 
a likely indicator that this issue was encountered.
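On the diagnosis side, a build script or wrapper could then flag the sentinel value. A minimal portable sketch, assuming the proposal above were adopted (it is a proposal here, not existing Cygwin behavior); the macro and function names are hypothetical, and the constant is defined locally rather than taken from ntstatus.h:

```c
#include <stdint.h>

/* NTSTATUS STATUS_THREAD_IS_TERMINATING, defined locally so this compiles
   without the Windows headers. */
#define SENTINEL_THREAD_IS_TERMINATING UINT32_C(0xC000004B)

/* Returns nonzero if a process exit code equals the sentinel that helper
   threads would pass to ExitThread under the proposal, i.e. the process
   status was probably overwritten by a helper thread rather than set by
   the process's own exit path. */
int exit_code_suggests_exit_race(uint32_t exit_code)
{
    return exit_code == SENTINEL_THREAD_IS_TERMINATING;
}
```

The helper thread would correspondingly end with ExitThread(0xC000004B) instead of ExitThread(0). Whether that value reaches a Cygwin parent's $? unchanged is not something this sketch checks.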

As is, when this race condition results in the undesirable outcome, 
the exit status of the process is 0, because the signal processing 
thread exits with a status of 0.  This explains why false.exe works so 
well to reproduce the issue.  A test using true.exe could never expose 
it, since 0 is also true.exe's correct exit status.

Tom.



* Re: Intermittent failures retrieving process exit codes
  2012-12-21  6:30   ` Tom Honermann
@ 2012-12-21 10:33     ` Corinna Vinschen
  2012-12-21 12:15       ` Nick Lowe
  2012-12-21 16:10       ` Christopher Faylor
  2012-12-21 20:01     ` Intermittent failures retrieving process exit codes Tom Honermann
  2013-11-14  4:02     ` Tom Honermann
  2 siblings, 2 replies; 65+ messages in thread
From: Corinna Vinschen @ 2012-12-21 10:33 UTC (permalink / raw)
  To: cygwin

On Dec 21 01:30, Tom Honermann wrote:
> I spent most of the week debugging this issue.  This appears to be a
> defect in Windows.  I can reproduce the issue without Cygwin.  I
> can't rule out other third party kernel mode software possibly
> contributing to the issue.  A simple change to Cygwin works around
> the problem for me.
> 
> I don't know which Windows releases are affected by this.  I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes
> running on 64-bit Windows 7.  I haven't yet tried elsewhere.
> 
> The problem appears to be a race condition involving concurrent
> calls to TerminateProcess() and ExitThread().  The example code
> below minimally mimics the threads created and exit process/thread
> calls that are performed when running Cygwin's false.exe.  The
> primary thread exits the process via TerminateProcess() ala
> pinfo::exit() in winsup/cygwin/pinfo.cc.  The secondary thread exits
> itself via ExitThread() ala Cygwin's signal processing thread
> function, wait_sig(), in winsup/cygwin/sigproc.cc.
> 
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary
> thread's call to ExitThread().  I can only speculate at this point,
> but my guess is that the TerminateProcess() code disassociates the
> calling thread from the process before other threads are stopped
> such that ExitThread(), concurrently running in another thread, may
> determine that the calling thread is the last thread of the process
> and overwrite the process exit code.
> 
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess().  The test case below only uses
> TerminateProcess() because that is what Cygwin does.
> 
> Source code to reproduce the issue follows.  Again, Cygwin is not
> required to reproduce the problem.  For my own testing, I compiled
> the code using Microsoft's Visual Studio 2010 x86 compiler with the
> command 'cl /Fetest-exit-code.exe test-exit-code.cpp'
> 
> test-exit-code.cpp:

Wow.  Thanks for this testcase.  I tried to reproduce the issue and
I was not able to reproduce it on a single-CPU, single-core setup,
but I could reproduce it almost immediately on a dual-core system,
twice in a row in under 5 secs.

> The workaround I implemented within Cygwin was simple and sloppy.  I
> added a call to Sleep(1000) immediately before the call to
> ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc.  Since this
> thread (probably) doesn't exit until the process is exiting anyway,
> the call to Sleep() does not adversely affect shutdown.  The thread
> just gets terminated while in the call to Sleep() instead of exiting
> before the process is terminated or getting terminated while still
> in the call to ExitThread().  A better solution might be to avoid
> the thread exiting at all (so long as it can't get terminated while
> holding critical resources), or to have the process exiting thread
> wait on it.  Neither of these is ideal.  Orderly shutdown of
> multi-threaded processes is really hard to do correctly on Windows.
> 
> Since the exit code for the signal processing thread is not used,
> having the wait_sig() thread (and any other threads that could
> potentially concurrently exit with another thread) exit with a
> special status value such as STATUS_THREAD_IS_TERMINATING
> (0xC000004BL) would enable diagnosis of this issue as any process
> exit code matching this would be a likely indicator that this issue
> was encountered.

Maybe the signal thread should really not exit by itself, but just
wait until the TerminateThread is called.  Chris?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Intermittent failures retrieving process exit codes
  2012-12-21 10:33     ` Corinna Vinschen
@ 2012-12-21 12:15       ` Nick Lowe
  2012-12-21 19:45         ` Tom Honermann
  2012-12-21 16:10       ` Christopher Faylor
  1 sibling, 1 reply; 65+ messages in thread
From: Nick Lowe @ 2012-12-21 12:15 UTC (permalink / raw)
  To: Andrey Repin

Briefly casting my eye at the test case, as a general point, remember
that these termination APIs all complete asynchronously and I do not
believe it has ever been safe or correct to call another while one is
still pending - you are in undefined, edge case behaviour territory
here.

Win32's TerminateThread/ExitThread, which in turn call the native
NtTerminateThread, only request cancellation of a thread and return
immediately.
One has to wait on a handle to the thread to know that termination has
completed, for which the SYNCHRONIZE standard access right is
required.
The same is true of Win32's TerminateProcess/ExitProcess, in turn
NtTerminateProcess, where one waits instead on a handle to the
process.

Regards,

Nick


* Re: Intermittent failures retrieving process exit codes
  2012-12-21 10:33     ` Corinna Vinschen
  2012-12-21 12:15       ` Nick Lowe
@ 2012-12-21 16:10       ` Christopher Faylor
  2012-12-21 17:02         ` Corinna Vinschen
  1 sibling, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-21 16:10 UTC (permalink / raw)
  To: cygwin

On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>Maybe the signal thread should really not exit by itself, but just
>wait until the TerminateThread is called.  Chris?

If the analysis is correct, that just fixes one symptom doesn't it?
There are potentially many threads running in any Cygwin program
and it sounds like any one of them could trigger this.

cgf


* Re: Intermittent failures retrieving process exit codes
  2012-12-21 16:10       ` Christopher Faylor
@ 2012-12-21 17:02         ` Corinna Vinschen
  2012-12-21 19:36           ` Intermittent failures retrieving process exit codes - snapshot test requested Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Corinna Vinschen @ 2012-12-21 17:02 UTC (permalink / raw)
  To: cygwin

On Dec 21 11:10, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
> >Maybe the signal thread should really not exit by itself, but just
> >wait until the TerminateThread is called.  Chris?
> 
> If the analysis is correct, that just fixes one symptom doesn't it?
> There are potentially many threads running in any Cygwin program
> and it sounds like any one of them could trigger this.

Right.  I guess the question is how to synchronize things so that the
thread calling TerminateProcess is actually the last one, making sure
its return value is used.

Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
help.  Or we have to keep all thread IDs of the self-started threads
available to terminate them explicitly at process exit.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 17:02         ` Corinna Vinschen
@ 2012-12-21 19:36           ` Christopher Faylor
  2012-12-21 20:37             ` Daniel Colascione
  2012-12-21 22:23             ` marco atzeri
  0 siblings, 2 replies; 65+ messages in thread
From: Christopher Faylor @ 2012-12-21 19:36 UTC (permalink / raw)
  To: cygwin

On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>On Dec 21 11:10, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>> >Maybe the signal thread should really not exit by itself, but just
>> >wait until the TerminateThread is called.  Chris?
>> 
>> If the analysis is correct, that just fixes one symptom doesn't it?
>> There are potentially many threads running in any Cygwin program
>> and it sounds like any one of them could trigger this.
>
>Right.  I guess the question is how to synchronize things so that the
>thread calling TerminateProcess is actually the last one, making sure
>its return value is used.
>
>Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>help.  Or we have to keep all thread IDs of the self-started threads
>available to terminate them explicitly at process exit.

I checked in a complicated fix for this problem which only affected
Cygwin-created threads.  But, then, I thought about another riskier but
simpler fix.  That version is now in CVS and I'm generating a new
snapshot with it.

I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
hear if multi-threaded things like X work on other platforms too.

If you test a snapshot, note that I'm still tracking down Ken Brown's
reported emacs regression in recent snapshots, so that will still be
broken.

cgf


* Re: Intermittent failures retrieving process exit codes
  2012-12-21 12:15       ` Nick Lowe
@ 2012-12-21 19:45         ` Tom Honermann
  2012-12-22  3:09           ` Nick Lowe
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2012-12-21 19:45 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 07:15 AM, Nick Lowe wrote:
> Briefly casting my eye at the test case, as a general point, remember
> that these termination APIs all complete asynchronously and I do not
> believe it has ever been safe or correct to call another while one is
> still pending - you are in undefined, edge case behaviour territory
> here.

These comments do not match my understanding of these APIs.  MSDN 
documentation contradicts some of this as well.

> Win32's TerminateThread/ExitThread, that in turn calls the native
> NtTerminateThread, only requests cancellation of a thread and returns
> immediately.
> One has to wait on a handle to the thread to know that termination has
> completed, for which the synchronise standard access right is
> required.
> The same is true of Win32's TerminateProcess/ExitProcess, in turn
> NtTerminateProcess, where one waits instead on a handle to the
> process.

TerminateProcess() is documented to perform error checking and then to 
schedule asynchronous termination of the specified process.  I would not 
be surprised if the asynchronous termination applies even when 
GetCurrentProcess() is used to specify the process to terminate, but I 
would likewise not be surprised if TerminateProcess() has special 
handling for this.  I agree that calls to TerminateProcess() might 
return before the calling thread/process is terminated.  I have not 
tried to verify this behavior though.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714%28v=vs.85%29.aspx

The MSDN documentation for TerminateThread() does not state that the 
termination is carried out asynchronously, but I would not be surprised 
if that is the case.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx

I would be *very* surprised if it is possible for ExitProcess() and 
ExitThread() to return (unless the thread is being suspended and its 
context manipulated by another process/thread).  The MSDN docs for these 
do not mention any possibility of return.  In addition, the ExitThread() 
documentation explicitly states that Windows manages serialization of 
calls to ExitProcess() and ExitThread().

<quote>
The ExitProcess, ExitThread, CreateThread, CreateRemoteThread functions, 
and a process that is starting (as the result of a CreateProcess call) 
are serialized between each other within a process. Only one of these 
events can happen in an address space at a time.
</quote>

http://msdn.microsoft.com/en-us/library/windows/desktop/ms682659%28v=vs.85%29.aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/ms682658%28v=vs.85%29.aspx

I read that quote as supporting my assertion that the observed behavior 
is a defect in Windows.  It appears Windows is failing to serialize the 
calls appropriately.

Tom.



* Re: Intermittent failures retrieving process exit codes
  2012-12-21  6:30   ` Tom Honermann
  2012-12-21 10:33     ` Corinna Vinschen
@ 2012-12-21 20:01     ` Tom Honermann
  2013-11-14  4:02     ` Tom Honermann
  2 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-21 20:01 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I don't know which Windows releases are affected by this.  I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7.  I haven't yet tried elsewhere.

I was able to reproduce the issue with a 64-bit executable built from 
the test case in the parent email using Microsoft's Visual Studio 2010 
x64 compiler.  This issue does not appear to be specific to running 
32-bit processes on 64-bit Windows via Wow64.

I have not yet tried to reproduce this on any release of Windows other 
than 64-bit Windows 7 SP1.  I am curious about what other Windows 
releases are affected.  Please reply if you try the test case and are 
able to reproduce the problem on other Windows releases.  So far, I'm 
only aware of the issue being reproduced on multi-processor systems.  I 
suspect the problem can occur on single-processor systems as well, but 
is much less likely to.

Tom.



* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 19:36           ` Intermittent failures retrieving process exit codes - snapshot test requested Christopher Faylor
@ 2012-12-21 20:37             ` Daniel Colascione
  2012-12-21 22:23             ` marco atzeri
  1 sibling, 0 replies; 65+ messages in thread
From: Daniel Colascione @ 2012-12-21 20:37 UTC (permalink / raw)
  To: cygwin


On 12/21/2012 11:36 AM, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>> On Dec 21 11:10, Christopher Faylor wrote:
>>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>>>> Maybe the signal thread should really not exit by itself, but just
>>>> wait until the TerminateThread is called.  Chris?
>>>
>>> If the analysis is correct, that just fixes one symptom doesn't it?
>>> There are potentially many threads running in any Cygwin program
>>> and it sounds like any one of them could trigger this.
>>
>> Right.  I guess the question is how to synchronize things so that the
>> thread calling TerminateProcess is actually the last one, making sure
>> its return value is used.
>>
>> Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>> help.  Or we have to keep all thread IDs of the self-started threads
>> available to terminate them explicitly at process exit.
> 
> I checked in a complicated fix for this problem which only affected
> Cygwin-created threads.  But, then, I thought about another riskier but
> simpler fix. 

Your second approach scares me. There's no global order imposed on the loader
lock and the Cygwin process lock, and Windows can take the loader lock at
virtually any time, since LoadLibrary can be used internally to implement any API.
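"Imposing a global order" on locks means every thread acquires them in one agreed rank order, which rules out the classic two-lock deadlock. The portable pthreads sketch below shows the idea with two hypothetical stand-in locks (loader_like, process_like) and hypothetical helpers lock_pair and run_ordered_demo. The concern raised here is precisely that Cygwin cannot apply such a rule to the loader lock, because Windows acquires it implicitly inside API calls rather than through code Cygwin controls.

```c
#include <pthread.h>

/* A mutex plus its position in a single, program-wide lock order. */
typedef struct {
    pthread_mutex_t m;
    int rank;
} ranked_mutex;

/* Acquires both locks in rank order, whatever order the caller names
   them in, so two threads can never each hold one and wait for the other. */
static void lock_pair(ranked_mutex *a, ranked_mutex *b)
{
    ranked_mutex *first = (a->rank <= b->rank) ? a : b;
    ranked_mutex *second = (first == a) ? b : a;

    pthread_mutex_lock(&first->m);
    pthread_mutex_lock(&second->m);
}

static void unlock_pair(ranked_mutex *a, ranked_mutex *b)
{
    pthread_mutex_unlock(&a->m);
    pthread_mutex_unlock(&b->m);
}

/* Hypothetical stand-ins for the loader lock and Cygwin's process lock. */
static ranked_mutex loader_like = { PTHREAD_MUTEX_INITIALIZER, 0 };
static ranked_mutex process_like = { PTHREAD_MUTEX_INITIALIZER, 1 };
static long counter;   /* protected by holding both locks */

struct job {
    ranked_mutex *x, *y;
    int iterations;
};

static void *worker(void *arg)
{
    struct job *j = arg;

    for (int i = 0; i < j->iterations; i++) {
        lock_pair(j->x, j->y);
        counter++;
        unlock_pair(j->x, j->y);
    }
    return NULL;
}

/* Two threads name the locks in opposite orders; lock_pair still takes
   them in rank order, so this completes.  Returns the final counter,
   or -1 if a thread could not be created. */
long run_ordered_demo(int iterations)
{
    pthread_t t1, t2;
    struct job j1 = { &loader_like, &process_like, iterations };
    struct job j2 = { &process_like, &loader_like, iterations };

    counter = 0;
    if (pthread_create(&t1, NULL, worker, &j1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &j2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```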




* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 19:36           ` Intermittent failures retrieving process exit codes - snapshot test requested Christopher Faylor
  2012-12-21 20:37             ` Daniel Colascione
@ 2012-12-21 22:23             ` marco atzeri
  2012-12-21 23:09               ` Tom Honermann
  2012-12-22  2:49               ` Christopher Faylor
  1 sibling, 2 replies; 65+ messages in thread
From: marco atzeri @ 2012-12-21 22:23 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 8:36 PM, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>> On Dec 21 11:10, Christopher Faylor wrote:
>>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>>>> Maybe the signal thread should really not exit by itself, but just
>>>> wait until the TerminateThread is called.  Chris?
>>>
>>> If the analysis is correct, that just fixes one symptom doesn't it?
>>> There are potentially many threads running in any Cygwin program
>>> and it sounds like any one of them could trigger this.
>>
>> Right.  I guess the question is how to synchronize things so that the
>> thread calling TerminateProcess is actually the last one, making sure
>> its return value is used.
>>
>> Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>> help.  Or we have to keep all thread IDs of the self-started threads
>> available to terminate them explicitly at process exit.
>
> I checked in a complicated fix for this problem which only affected
> Cygwin-created threads.  But, then, I thought about another riskier but
> simpler fix.  That version is now in CVS and I'm generating a new
> snapshot with it.
>
> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
> hear if multi-threaded things like X work on other platforms too.
>
> If you test a snapshot, note that I'm still tracking down Ken Brown's
> reported emacs regression in recent snapshots, so that will still be
> broken.
>
> cgf
>

I think the Xserver doesn't like it.
On 20121221 it freezes on startup on W7/64;
no issue on 20121218.

Regards
Marco





--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 22:23             ` marco atzeri
@ 2012-12-21 23:09               ` Tom Honermann
  2012-12-22  2:53                 ` Christopher Faylor
  2012-12-22  2:49               ` Christopher Faylor
  1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2012-12-21 23:09 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 05:23 PM, marco atzeri wrote:
> On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>> hear if multi-threaded things like X work on other platforms too.
>>
>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>> reported emacs regression in recent snapshots, so that will still be
>> broken.
>>
>> cgf
>>
>
> I think the Xserver doesn't like it.
> On 20121221 it freezes on startup on W7/64;
> no issue on 20121218.

I was worried about this possibility after looking at the code changes,
but I haven't had a chance to test adequately yet.  I would expect that
indefinite blocking in dll_entry() could prevent DLLs from unloading.  For
example, calls to dll_entry() for DLL_PROCESS_DETACH may get blocked.

It looks to me like the changes made are insufficient to prevent the 
race.  For example, this won't address the case where an exiting thread 
releases the process lock acquired in dll_entry() before a thread 
exiting the process acquires it in pinfo::exit().  Both threads could 
still end up in an ExitThread() vs ExitProcess()/TerminateProcess() 
race.  However, this is only true for threads whose exits are not 
predicated upon an action taken by the process exiting thread after it 
has acquired the process lock in pinfo::exit().  And since the exiting 
thread must be the last thread of the process in order to hit the issue, 
this may not be a concern.

I'm not sure that a general workaround for this issue is feasible for 
all possible threads.  At least, not without hooking the Terminate* and 
Exit* Win32 APIs.  My gut tells me that a general solution requires 
waiting for thread handles to be signaled, but I haven't thought it 
completely through yet.
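One way to read "waiting for thread handles to be signaled" is sketched below, using POSIX threads rather than Win32 handles: every thread the runtime starts is recorded, and the thread that is about to exit the process first joins all of them, so it is the last one running when it exits. All names (start_tracked_thread, wait_for_tracked_threads, MAX_TRACKED, run_tracked_demo) are hypothetical. As noted above, this only covers threads the runtime itself created; threads started by Windows or by third-party DLLs would not be in the registry. A Win32 version would presumably wait on the recorded thread handles with WaitForSingleObject or WaitForMultipleObjects instead of calling pthread_join.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TRACKED 64

/* Hypothetical registry of the threads the runtime itself starts. */
static pthread_t tracked[MAX_TRACKED];
static int tracked_count = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Start a thread and record it so the exiting thread can wait for it.
   Returns 0 on success, -1 if the registry is full, or a pthread error code. */
static int start_tracked_thread(void *(*fn)(void *), void *arg)
{
    pthread_t t;
    int rc;

    pthread_mutex_lock(&registry_lock);
    if (tracked_count == MAX_TRACKED) {
        pthread_mutex_unlock(&registry_lock);
        return -1;
    }
    rc = pthread_create(&t, NULL, fn, arg);
    if (rc == 0)
        tracked[tracked_count++] = t;
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Called by the thread that is about to exit the process.  Joins tracked
   threads one at a time until none are left (the lock is not held while
   blocked, so a tracked thread may still start others) and returns how many
   were joined.  Afterwards no tracked thread can still be exiting
   concurrently with the caller's own process exit. */
static int wait_for_tracked_threads(void)
{
    int joined = 0;
    for (;;) {
        pthread_t t;
        pthread_mutex_lock(&registry_lock);
        if (tracked_count == 0) {
            pthread_mutex_unlock(&registry_lock);
            return joined;
        }
        t = tracked[--tracked_count];
        pthread_mutex_unlock(&registry_lock);
        pthread_join(t, NULL);     /* blocks until that thread has fully exited */
        joined++;
    }
}

/* Demo worker: marks its own slot as done. */
static int done_flags[4];
static void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Demo: start n (<= 4) workers, then wait for all of them as an exiting
   thread would; returns the number joined. */
static int run_tracked_demo(int n)
{
    for (int i = 0; i < n; i++) {
        int rc = start_tracked_thread(mark_done, &done_flags[i]);
        assert(rc == 0);
    }
    return wait_for_tracked_threads();
}
```

The exiting thread would call wait_for_tracked_threads() and only then exit with its chosen status.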

It looks like Chris reverted the change and checked in a new update.  I 
haven't looked at those changes yet.

Tom.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 22:23             ` marco atzeri
  2012-12-21 23:09               ` Tom Honermann
@ 2012-12-22  2:49               ` Christopher Faylor
  2012-12-22  3:14                 ` Christopher Faylor
  1 sibling, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22  2:49 UTC (permalink / raw)
  To: cygwin

On Fri, Dec 21, 2012 at 11:23:00PM +0100, marco atzeri wrote:
>On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 06:02:19PM +0100, Corinna Vinschen wrote:
>>> On Dec 21 11:10, Christopher Faylor wrote:
>>>> On Fri, Dec 21, 2012 at 11:32:41AM +0100, Corinna Vinschen wrote:
>>>>> Maybe the signal thread should really not exit by itself, but just
>>>>> wait until the TerminateThread is called.  Chris?
>>>>
>>>> If the analysis is correct, that just fixes one symptom doesn't it?
>>>> There are potentially many threads running in any Cygwin program
>>>> and it sounds like any one of them could trigger this.
>>>
>>> Right.  I guess the question is how to synchronize things so that the
>>> thread calling TerminateProcess is actually the last one, making sure
>>> its return value is used.
>>>
>>> Maybe the NtQueryInformationThread(ThreadAmILastThread) call is of some
>>> help.  Or we have to keep all thread IDs of the self-started threads
>>> available to terminate them explicitly at process exit.
>>
>> I checked in a complicated fix for this problem which only affected
>> Cygwin-created threads.  But, then, I thought about another riskier but
>> simpler fix.  That version is now in CVS and I'm generating a new
>> snapshot with it.
>>
>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>> hear if multi-threaded things like X work on other platforms too.
>>
>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>> reported emacs regression in recent snapshots, so that will still be
>> broken.
>>
>> cgf
>>
>
>I think the Xserver doesn't like it.
>On 20121221 it freezes on startup on W7/64;
>no issue on 20121218.

I actually tried Xserver before submitting my change, so it certainly isn't
a consistent problem.

cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-21 23:09               ` Tom Honermann
@ 2012-12-22  2:53                 ` Christopher Faylor
  2012-12-22  2:57                   ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22  2:53 UTC (permalink / raw)
  To: cygwin

On Fri, Dec 21, 2012 at 06:08:46PM -0500, Tom Honermann wrote:
>On 12/21/2012 05:23 PM, marco atzeri wrote:
>> On 12/21/2012 8:36 PM, Christopher Faylor wrote:
>>> I tested this lightly on Windows 7 and 32-bit XP but it would be nice to
>>> hear if multi-threaded things like X work on other platforms too.
>>>
>>> If you test a snapshot, note that I'm still tracking down Ken Brown's
>>> reported emacs regression in recent snapshots, so that will still be
>>> broken.
>>>
>>> cgf
>>>
>>
>> I think the Xserver doesn't like it.
>> On 20121221 it freezes on startup on W7/64;
>> no issue on 20121218.
>
>I was worried about this possibility after looking at the code changes,
>but I haven't had a chance to test adequately yet.  I would expect that
>indefinite blocking in dll_entry() could prevent DLLs from unloading.  For
>example, calls to dll_entry() for DLL_PROCESS_DETACH may get blocked.

You're looking at the wrong changes.

cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-22  2:53                 ` Christopher Faylor
@ 2012-12-22  2:57                   ` Tom Honermann
  0 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2012-12-22  2:57 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 09:52 PM, Christopher Faylor wrote:
> You're looking at the wrong changes.

I wasn't at the time that I wrote that :)

I noticed that you had reverted those changes.  I haven't looked at the 
new changes yet.

Tom.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes
  2012-12-21 19:45         ` Tom Honermann
@ 2012-12-22  3:09           ` Nick Lowe
  0 siblings, 0 replies; 65+ messages in thread
From: Nick Lowe @ 2012-12-22  3:09 UTC (permalink / raw)
  To: Andrey Repin

The MSDN documentation is incorrect/incomplete with regard to
TerminateThread/TerminateProcess; both are definitely asynchronous.

I am not clear/confident on the behaviour of ExitProcess and
ExitThread, but will investigate with IDA and a test case later. I
suspect any locking/serialisation will pertain to these functions
only.
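A POSIX illustration of the pattern described here, offered as an analogy rather than a model of the Win32 internals: kill() only requests termination and may return while the target is still running; the caller learns that termination has completed, and obtains the final status, only by waiting (here with waitpid()). The Win32 counterpart would be TerminateProcess followed by WaitForSingleObject on a process handle opened with SYNCHRONIZE access, then GetExitCodeProcess. The helper name terminate_and_wait is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just idles, request its termination, and return the
   signal that ended it (or -1 on error). */
static int terminate_and_wait(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();              /* child: idle until killed */
    }

    /* kill() only *requests* termination; when it returns, the child may
       well still be running. */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    /* waitpid() blocks until termination has actually completed and hands
       back the final status; retry if a signal interrupts the wait. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r != pid)
        return -1;

    /* The child never exits on its own, so it must have been killed. */
    assert(WIFSIGNALED(status));
    return WTERMSIG(status);
}
```

SIGKILL is used because it cannot be caught or ignored, so the only open question is when the child is gone, which is exactly what the wait answers.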

On Fri, Dec 21, 2012 at 7:44 PM, Tom Honermann <thonermann@coverity.com> wrote:
> On 12/21/2012 07:15 AM, Nick Lowe wrote:
>>
>> Briefly casting my eye at the test case, as a general point, remember
>> that these termination APIs all complete asynchronously and I do not
>> believe it has ever been safe or correct to call another while one is
>> still pending - you are in undefined, edge case behaviour territory
>> here.
>
>
> These comments do not match my understanding of these APIs.  MSDN
> documentation contradicts some of this as well.
>
>
>> Win32's TerminateThread/ExitThread, which in turn calls the native
>> NtTerminateThread, only requests cancellation of a thread and returns
>> immediately.
>> One has to wait on a handle to the thread to know that termination has
>> completed, for which the SYNCHRONIZE standard access right is
>> required.
>> The same is true of Win32's TerminateProcess/ExitProcess, in turn
>> NtTerminateProcess, where one waits instead on a handle to the
>> process.
>
>
> TerminateProcess() is documented to perform error checking and then to
> schedule asynchronous termination of the specified process.  I would not be
> surprised if the asynchronous termination applies even when
> GetCurrentProcess() is used to specify the process to terminate, but I would
> likewise not be surprised if TerminateProcess() has special handling for
> this.  I agree that calls to TerminateProcess() might return before the
> calling thread/process is terminated.  I have not tried to verify this
> behavior though.
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714%28v=vs.85%29.aspx
>
> The MSDN documentation for TerminateThread() does not state that the
> termination is carried out asynchronously, but I would not be surprised if
> that is the case.
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx
>
> I would be *very* surprised if it is possible for ExitProcess() and
> ExitThread() to return (unless the thread is being suspended and its context
> manipulated by another process/thread).  The MSDN docs for these do not
> mention any possibility of return.  In addition, the ExitThread()
> documentation explicitly states that Windows manages serialization of calls
> to ExitProcess() and ExitThread().
>
> <quote>
> The ExitProcess, ExitThread, CreateThread, CreateRemoteThread functions, and
> a process that is starting (as the result of a CreateProcess call) are
> serialized between each other within a process. Only one of these events can
> happen in an address space at a time.
> </quote>
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms682659%28v=vs.85%29.aspx
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms682658%28v=vs.85%29.aspx
>
> I read that quote as supporting my assertion that the observed behavior is a
> defect in Windows.  It appears Windows is failing to serialize the calls
> appropriately.
>
> Tom.
>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-22  2:49               ` Christopher Faylor
@ 2012-12-22  3:14                 ` Christopher Faylor
  2012-12-22  9:06                   ` marco atzeri
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22  3:14 UTC (permalink / raw)
  To: cygwin

On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>I actually tried Xserver before submitting my change so it certainly isn't
>a consistent problem.

Sorry, I take that back.  I tried Xserver before backing out parts of the
other change and never retried it.  Marco is right.  It's definitely broken.
I've checked in a new change and am regenerating a snapshot.

cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-22  3:14                 ` Christopher Faylor
@ 2012-12-22  9:06                   ` marco atzeri
  2012-12-22 17:50                     ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: marco atzeri @ 2012-12-22  9:06 UTC (permalink / raw)
  To: cygwin

On 12/22/2012 4:14 AM, Christopher Faylor wrote:
> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>> I actually tried Xserver before submitting my change so it certainly isn't
>> a consistent problem.
>
> Sorry, I take that back.  I tried Xserver before backing out parts of the
> other change and never retried it.  Marco is right.  It's definitely broken.
> I've checked in a new change and am regenerating a snapshot.
>
> cgf
>

glad to be useful

20121222 : Xserver works fine and the false loop does not stop.

However lftp is still broken

$ lftp
lftp :~> open -u xxxxxxx  matzeri.altervista.org
       1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects failed, Win32 error 6


(I have the impression it worked after your last select changes, but I 
am unable to replicate)

Regards
Marco



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-22  9:06                   ` marco atzeri
@ 2012-12-22 17:50                     ` Christopher Faylor
  2012-12-23 16:56                       ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-22 17:50 UTC (permalink / raw)
  To: cygwin

On Sat, Dec 22, 2012 at 10:06:32AM +0100, marco atzeri wrote:
>On 12/22/2012 4:14 AM, Christopher Faylor wrote:
>> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>>> I actually tried Xserver before submitting my change so it certainly isn't
>>> a consistent problem.
>>
>> Sorry, I take that back.  I tried Xserver before backing out parts of the
>> other change and never retried it.  Marco is right.  It's definitely broken.
>> I've checked in a new change and am regenerating a snapshot.
>>
>> cgf
>>
>
>glad to be useful
>
>20121222 : Xserver works fine and the false loop does not stop.
>
>However lftp is still broken
>
>$ lftp
>lftp :~> open -u xxxxxxx  matzeri.altervista.org
>       1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects failed, Win32 error 6
>
>
>(I have the impression it worked after your last select changes, but I 
>am unable to replicate)

The snapshot is intended to work around the race between ExitThread and
ExitProcess.  Nothing else.

cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-22 17:50                     ` Christopher Faylor
@ 2012-12-23 16:56                       ` Christopher Faylor
  2012-12-23 18:54                         ` marco atzeri
  2012-12-27 20:50                         ` Tom Honermann
  0 siblings, 2 replies; 65+ messages in thread
From: Christopher Faylor @ 2012-12-23 16:56 UTC (permalink / raw)
  To: cygwin

On Sat, Dec 22, 2012 at 12:50:41PM -0500, Christopher Faylor wrote:
>On Sat, Dec 22, 2012 at 10:06:32AM +0100, marco atzeri wrote:
>>On 12/22/2012 4:14 AM, Christopher Faylor wrote:
>>> On Fri, Dec 21, 2012 at 09:49:43PM -0500, Christopher Faylor wrote:
>>>> I actually tried Xserver before submitting my change so it certainly isn't
>>>> a consistent problem.
>>>
>>> Sorry, I take that back.  I tried Xserver before backing out parts of the
>>> other change and never retried it.  Marco is right.  It's definitely broken.
>>> I've checked in a new change and am regenerating a snapshot.
>>
>>glad to be useful
>>
>>20121222 : Xserver works fine and the false loop does not stop.
>>
>>However lftp is still broken
>>
>>$ lftp
>>lftp :~> open -u xxxxxxx  matzeri.altervista.org
>>       1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects failed, Win32 error 6
>>
>>
>>(I have the impression it worked after your last select changes, but I 
>>am unable to replicate)
>
>The snapshot is intended to work around the race between ExitThread and
>ExitProcess.  Nothing else.

The latest snapshot seems to fix this problem.

FYI
cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-23 16:56                       ` Christopher Faylor
@ 2012-12-23 18:54                         ` marco atzeri
  2012-12-27 20:50                         ` Tom Honermann
  1 sibling, 0 replies; 65+ messages in thread
From: marco atzeri @ 2012-12-23 18:54 UTC (permalink / raw)
  To: cygwin

On 12/23/2012 5:56 PM, Christopher Faylor wrote:

>>> However lftp is still broken
>>>
>>> $ lftp
>>> lftp :~> open -u xxxxxxx  matzeri.altervista.org
>>>        1 [main] lftp 1092 select_stuff::wait: WaitForMultipleObjects failed, Win32 error 6
>>>
>>>
>>> (I have the impression it worked after your last select changes, but I
>>> am unable to replicate)
>>
>> The snapshot is intended to work around the race between ExitThread and
>> ExitProcess.  Nothing else.
>
> The latest snapshot seems to fix this problem.
>
> FYI
> cgf
>

confirmed.
20121222 19:36:23 is fine

Thanks
Marco




^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-23 16:56                       ` Christopher Faylor
  2012-12-23 18:54                         ` marco atzeri
@ 2012-12-27 20:50                         ` Tom Honermann
  2012-12-29 21:57                           ` Christopher Faylor
  1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2012-12-27 20:50 UTC (permalink / raw)
  To: cygwin

I've been doing some testing with the latest source (pulled updates 
about 30 minutes ago).  I'm no longer able to reproduce any problems 
with incorrect exit codes (Yay!  Thanks for the quick turn around on 
that!), but I am seeing some new errors when terminating the infinite 
loop via ctrl-c using the test case below.  This is a test case I was 
using previously to help isolate the original problem - I had added 
special_printf() calls in a few places and was using strace -m special 
to trigger them.  All of my changes have been reverted and I'm back to 
using vanilla source code.  This test is run with a newly built
strace.exe and cygwin1.dll (the false.exe is an old one).

c:\>type test-strace.bat
@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
strace -m special false
if not errorlevel 1 (
     echo exiting...
     exit /B 1
)
goto loop


When interrupting the test run, I'll often (but not always) get the 
following error:

c:\>test-strace.bat
test...
test...
test...
test...
--- Process 8092, exception 40010005 at 75E26D67
Terminate batch job (Y/N)? y


Additionally, some of the Cygwin gcc built utilities that I've built for 
testing now occasionally hang upon interruption by ctrl-c.  Basic 
diagnostics courtesy of gdb follow.

This utility was one used in place of strace in the test case above.  It 
does a fork() and execlp() of its first parameter and then calls 
waitpid() on the child and asserts that the exit code received is 1.
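The utility's source is not included in the thread; the following is a minimal sketch of what the description implies: fork(), execlp() of the given program, waitpid(), then an assertion that the exit code is 1. The function names are hypothetical, and returning -1 when the child did not exit normally, or using exit code 127 for a failed exec, are choices made for this sketch rather than details from the original.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `prog` (looked up on PATH) via fork() + execlp() and return its exit
   code.  Returns -1 if fork()/waitpid() failed or the child did not exit
   normally; a failed exec shows up as exit code 127. */
static int run_and_get_exit_code(const char *prog)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp(prog, prog, (char *)NULL);
        _exit(127);                   /* only reached if exec failed */
    }

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* The check described above: the child must report exit code 1. */
static void expect_false(const char *prog)
{
    int code = run_and_get_exit_code(prog);
    assert(code == 1);
}
```

A main() that calls expect_false(argv[1]) in a loop would mirror the test-strace.bat loop shown earlier.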

If anyone knows of a way to get accurate stack traces when both gcc and 
Microsoft compiled modules are present, I'll be happy to regenerate the 
stack traces below.

$ gdb --pid=6908
GNU gdb (GDB) 7.5.50.20120815-cvs (cygwin-special)
...
Reading symbols from /home/thonermann/cygwin/test-install/bin/expect-false-execve-cygwin32.exe...done.
...
(gdb) info shared
 From        To          Syms Read   Shared Object Library
0x77461000  0x775c5d1c  Yes (*)     /cygdrive/c/Windows/SysWOW64/ntdll.dll
0x75d71000  0x75e6bd58  Yes (*)     /cygdrive/c/Windows/syswow64/kernel32.dll
0x74ba1000  0x74be5a08  Yes (*)     /cygdrive/c/Windows/syswow64/KERNELBASE.dll
0x61001000  0x61490000  Yes         /home/thonermann/cygwin/test-install/bin/cygwin1.dll
0x76271000  0x76354198  Yes (*)     /cygdrive/c/Windows/system32/user32.dll
0x74f11000  0x74f8292c  Yes (*)     /cygdrive/c/Windows/syswow64/GDI32.dll
0x76181000  0x761892f8  Yes (*)     /cygdrive/c/Windows/syswow64/LPK.dll
0x74d71000  0x74e0c9fc  Yes (*)     /cygdrive/c/Windows/syswow64/USP10.dll
0x75bf1000  0x75c9b2c4  Yes (*)     /cygdrive/c/Windows/syswow64/msvcrt.dll
0x75eb1000  0x75f4f04c  Yes (*)     /cygdrive/c/Windows/syswow64/ADVAPI32.dll
0x74ed1000  0x74ee8ed8  Yes (*)     /cygdrive/c/Windows/SysWOW64/sechost.dll
0x76371000  0x76445e04  Yes (*)     /cygdrive/c/Windows/syswow64/RPCRT4.dll
0x74b41000  0x74b821f0  Yes (*)     /cygdrive/c/Windows/syswow64/SspiCli.dll
0x74b31000  0x74b3b474  Yes (*)     /cygdrive/c/Windows/syswow64/CRYPTBASE.dll
0x76581000  0x765c1ce0  Yes (*)     /cygdrive/c/Windows/system32/IMM32.DLL
0x75ca1000  0x75d6bebc  Yes (*)     /cygdrive/c/Windows/syswow64/MSCTF.dll
0x70e41000  0x70e8b464  Yes (*)     /cygdrive/c/Windows/system32/apphelp.dll
(*): Shared library is missing debugging information.

(gdb) info thread
   Id   Target Id         Frame
* 4    Thread 6908.0x1950 0x7747000d in ntdll!LdrFindResource_U ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
   3    Thread 6908.0x1d8c 0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
   2    Thread 6908.0x1d34 0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
   1    Thread 6908.0x1344 0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll

(gdb) thread 1
[Switching to thread 1 (Thread 6908.0x1344)]
#0  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7748013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bb0bdd in WaitForMultipleObjectsEx () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000002 in ?? ()
#4  0x00000001 in ?? ()
#5  0x00000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 6908.0x1d34)]
#0  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7747f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bb0a91 in WaitForSingleObjectEx () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000034 in ?? ()
#4  0x00000000 in ?? ()

(gdb) thread 3
[Switching to thread 3 (Thread 6908.0x1d8c)]
#0  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7747f8e5 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x74bad348 in ReadFile () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000118 in ?? ()
#4  0x00000000 in ?? ()

(gdb) thread 4
[Switching to thread 4 (Thread 6908.0x1950)]
#0  0x7747000d in ntdll!LdrFindResource_U () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7747000d in ntdll!LdrFindResource_U () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x774ff896 in ntdll!RtlQueryTimeZoneInformation ()
    from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x5dfded78 in ?? ()
#3  0x00000000 in ?? ()

Tom.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-27 20:50                         ` Tom Honermann
@ 2012-12-29 21:57                           ` Christopher Faylor
  2013-01-01  1:45                             ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2012-12-29 21:57 UTC (permalink / raw)
  To: cygwin

On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>When interrupting the test run, I'll often (but not always) get the 
>following error:
>
>c:\>test-strace.bat
>test...
>test...
>test...
>test...
>--- Process 8092, exception 40010005 at 75E26D67

That is coming from strace and it's:

/usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)

i.e., it's expected.
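Why 0x40010005 is benign can be read off the value itself. An NTSTATUS packs a 2-bit severity into bits 30-31 (0 success, 1 informational, 2 warning, 3 error), a customer flag into bit 29, a facility into bits 16-27 and a code into bits 0-15. DBG_CONTROL_C therefore has severity 1 (informational), facility 0x1 (the debugger facility) and code 5: a notification that a debugged process received Ctrl-C, not a crash. The macro names below are made up for this sketch; only the value comes from the ntstatus.h line quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Value of DBG_CONTROL_C as quoted from w32api's ntstatus.h. */
#define DBG_CONTROL_C_VALUE 0x40010005u

/* Hypothetical helpers that extract the NTSTATUS bit fields. */
#define NTSTATUS_SEVERITY(s) (((uint32_t)(s) >> 30) & 0x3u)   /* bits 30-31 */
#define NTSTATUS_CUSTOMER(s) (((uint32_t)(s) >> 29) & 0x1u)   /* bit 29 */
#define NTSTATUS_FACILITY(s) (((uint32_t)(s) >> 16) & 0xFFFu) /* bits 16-27 */
#define NTSTATUS_CODE(s)     ((uint32_t)(s) & 0xFFFFu)        /* bits 0-15 */

/* Compile-time checks: severity 1 is "informational", so strace is
   reporting a debug event (Ctrl-C in a debugged process), not an error. */
static_assert(NTSTATUS_SEVERITY(DBG_CONTROL_C_VALUE) == 1, "informational");
static_assert(NTSTATUS_FACILITY(DBG_CONTROL_C_VALUE) == 0x1, "debugger facility");
static_assert(NTSTATUS_CODE(DBG_CONTROL_C_VALUE) == 5, "code 5");
```

By contrast, an access violation (0xC0000005) decodes to severity 3, which is the kind of value that would indicate a real failure.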

>Additionally, some of the Cygwin gcc built utilities that I've built for 
>testing now occasionally hang upon interruption by ctrl-c.  Basic 
>diagnostics courtesy of gdb follow.

The hang should be fixed in the latest snapshot.

cgf


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2012-12-29 21:57                           ` Christopher Faylor
@ 2013-01-01  1:45                             ` Tom Honermann
  2013-01-01  5:36                               ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-01  1:45 UTC (permalink / raw)
  To: cygwin

On 12/29/2012 04:57 PM, Christopher Faylor wrote:
> On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>> When interrupting the test run, I'll often (but not always) get the
>> following error:
>>
>> c:\>test-strace.bat
>> test...
>> test...
>> test...
>> test...
>> --- Process 8092, exception 40010005 at 75E26D67
>
> That is coming from strace and it's:
>
> /usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)
>
> i.e., it's expected.

Ah, sorry, I should have researched that further before reporting it. 
Thanks for the explanation.

>> Additionally, some of the Cygwin gcc built utilities that I've built for
>> testing now occasionally hang upon interruption by ctrl-c.  Basic
>> diagnostics courtesy of gdb follow.
>
> The hang should be fixed in the latest snapshot.

I'm still seeing hangs in the latest code from CVS.  The stack traces 
below are from WinDbg.  I manually resolved the symbol references within 
the cygwin1 module using the linker generated .map file.  Since the .map 
file does not include static functions, some of these may be incorrect;
I didn't try to verify or correct for this.

  # ChildEBP RetAddr
00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
04 00288cb8 6118e54d cygwin1!strtosigno+0x357
                              __ZN4muto7acquireEm
                              muto::acquire(unsigned long)
05 00288cc8 610f17b2 cygwin1!alloca+0xbbc9
                              __ZN6dtable4lockEv
                              dtable::lock()
06 00288d28 610eb717 cygwin1!strtosigno+0x5b6
                              __Z15close_all_filesb@4
                              close_all_files(bool)
07 00289a48 610eb92b cygwin1!sigfillset+0x7f3e
                              __ZN16child_info_spawn6workerEPKcPKS1_S3_iii
                              child_info_spawn::worker(char const*, char const* const*, char const* const*, int, int, int)
08 00289a88 6103af97 cygwin1!sigfillset+0x8152
                              _spawnve
09 0028ac28 61007b38 cygwin1!getenv+0x5293
                              _execlp
0a 0028ac48 61007ad5 cygwin1!setprogname+0x597d
0b 00000000 00000000 cygwin1!setprogname+0x591a


  # ChildEBP RetAddr
00 0071aafc 758cd348 ntdll!ZwReadFile+0x15
01 0071ab60 76c13ef7 KERNELBASE!ReadFile+0x118
02 0071aba8 610e7910 kernel32!ReadFileImplementation+0xf0
03 0071aca8 61003ec2 cygwin1!sigfillset+0x4137
                              __ZN15pending_signals4nextEv
                              pending_signals::next()
04 0071ace8 61004057 cygwin1!setprogname+0x1d07
                              __ZN9cygthread8callfuncEb
                              cygthread::callfunc(bool)
05 0071ad28 61004f61 cygwin1!setprogname+0x1e9c
                              __ZN9cygthread4stubEPv@4
                              cygthread::stub(void*)
06 0071cd98 61004dbc cygwin1!setprogname+0x2da6
                              __ZN7_cygtls5call2EPFmPvS0_ES0_S0_A
                              _cygtls::call2(unsigned long (*)(void*, void*), void*, void*)
07 0071ff68 61087074 cygwin1!setprogname+0x2c01
                              __ZN7_cygtls4callEPFmPvS0_ES0_A
                              _cygtls::call(unsigned long (*)(void*, void*), void*)
08 0071ff88 76c1339a cygwin1!setgrent+0x283c
09 0071ff94 779a9ef2 kernel32!BaseThreadInitThunk+0xe
0a 0071ffd4 779a9ec5 ntdll!__RtlUserThreadStart+0x70
0b 0071ffec 00000000 ntdll!_RtlUserThreadStart+0x1b

Tom.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-01  1:45                             ` Tom Honermann
@ 2013-01-01  5:36                               ` Christopher Faylor
  2013-01-02 19:15                                 ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-01  5:36 UTC (permalink / raw)
  To: cygwin

On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>On 12/29/2012 04:57 PM, Christopher Faylor wrote:
>> On Thu, Dec 27, 2012 at 03:49:24PM -0500, Tom Honermann wrote:
>>> When interrupting the test run, I'll often (but not always) get the
>>> following error:
>>>
>>> c:\>test-strace.bat
>>> test...
>>> test...
>>> test...
>>> test...
>>> --- Process 8092, exception 40010005 at 75E26D67
>>
>> That is coming from strace and it's:
>>
>> /usr/include/w32api/ntstatus.h:#define DBG_CONTROL_C ((NTSTATUS)0x40010005)
>>
>> i.e., it's expected.
>
>Ah, sorry, I should have researched that further before reporting it. 
>Thanks for the explanation.
>
>>> Additionally, some of the Cygwin gcc built utilities that I've built for
>>> testing now occasionally hang upon interruption by ctrl-c.  Basic
>>> diagnostics courtesy of gdb follow.
>>
>> The hang should be fixed in the latest snapshot.
>
>I'm still seeing hangs in the latest code from CVS.  The stack traces 
>below are from WinDbg.

I'm not asking you to build this yourself.  I have no way to know how
you are building this.  Please just use the snapshots at

http://cygwin.com/snapshots/

>I manually resolved the symbol references within 
>the cygwin1 module using the linker generated .map file.  Since the .map 
>file does not include static functions, some of these may be incorrect - 
>I didn't try and verify or correct for this.

Thanks for trying, but the output below is garbled and not really
useful.  If you are not going to dive in and attempt to fix code
yourself then all we normally need is a simple test case.  WinDbg
is not really appropriate for debugging Cygwin applications.

cgf

>  # ChildEBP RetAddr
>00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>                              __ZN4muto7acquireEm
>                              muto::acquire(unsigned long)
>[snip]
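For reference, the strace exception code discussed above, 0x40010005 (DBG_CONTROL_C), can be taken apart using the standard NTSTATUS bit layout: bits 31-30 hold the severity, bit 29 the customer flag, bit 28 is reserved, bits 27-16 hold the facility and bits 15-0 the code. A minimal sketch of that decoding follows; the struct and function names are illustrative only and are not part of any Windows header.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit NTSTATUS value (bit 28 is reserved and ignored). */
typedef struct {
    unsigned severity;  /* 0 success, 1 informational, 2 warning, 3 error */
    unsigned customer;  /* 1 for customer-defined codes */
    unsigned facility;  /* bits 27-16 */
    unsigned code;      /* bits 15-0 */
} ntstatus_fields;

ntstatus_fields decode_ntstatus(uint32_t status) {
    ntstatus_fields f;
    f.severity = (unsigned)((status >> 30) & 0x3u);
    f.customer = (unsigned)((status >> 29) & 0x1u);
    f.facility = (unsigned)((status >> 16) & 0xFFFu);
    f.code     = (unsigned)(status & 0xFFFFu);
    /* each field fits its bit width by construction */
    assert(f.severity <= 3u && f.facility <= 0xFFFu);
    return f;
}
```

For 0x40010005 this yields severity 1 (informational), facility 0x001 and code 0x0005: an informational status rather than an error, consistent with it being expected output from strace.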

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-01  5:36                               ` Christopher Faylor
@ 2013-01-02 19:15                                 ` Tom Honermann
  2013-01-02 20:48                                   ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-02 19:15 UTC (permalink / raw)
  To: cygwin

On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS.  The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself.  I have no way to know how
> you are building this.  Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to 
specify debug source paths and such.  I believe my builds are not 
unconventional.  I used options that disabled frame pointer omission so 
that the resulting binaries could be debugged with non-gcc debuggers.

$ mkdir build
$ cd build
$ ../src/configure \
     CFLAGS="-g" \
     CXXFLAGS="-g" \
     CFLAGS_FOR_TARGET="-g" \
     CXXFLAGS_FOR_TARGET="-g" \
     --enable-debugging \
     --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file.  Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really
> useful.  If you are not going to dive in and attempt to fix code
> yourself then all we normally need is a simple test case.  WinDbg
> is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough.
Lines with frame numbers come directly from WinDbg.  Since WinDbg is 
unable to resolve symbols to gcc generated debug info, the symbol 
references within the cygwin1 module are incorrect.  In those cases, I 
manually resolved the instruction pointer address using the RetAddr 
value from the prior frame and searching the linker generated 
cygwin1.map file.  I then pasted the mangled name on a line following 
the WinDbg line (with the incorrect symbol name) and, if the symbol is a 
C++ one, the unmangled name on an additional line.

For the stack fragment below, address 610f1553 == strtosigno+0x357 == 
__ZN4muto7acquireEm == muto::acquire(unsigned long).  I did not 
translate offsets for the functions as I resolved them, nor did I try 
to verify that they are correct (i.e., that the return address is not for a 
static function that is not represented in the .map file).

>>   # ChildEBP RetAddr
>> 00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>> 01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>> 02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>> 03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>> 04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>>                               __ZN4muto7acquireEm
>>                               muto::acquire(unsigned long)
>> [snip]
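The manual lookup described above amounts to finding, in a table of map-file symbols sorted by address, the last symbol whose start address is at or below the target address, and reporting the distance as an offset. A minimal sketch in C, assuming the map has already been parsed into a sorted array; the map_symbol type, the resolve_address function and the addresses in the usage note are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry from a linker map: start address and (mangled) name. */
typedef struct {
    uint32_t addr;
    const char *name;
} map_symbol;

/* Return the symbol with the greatest start address <= addr from a table
   sorted by ascending address, or NULL if addr precedes every entry.
   Static functions absent from the map can make this answer wrong. */
const map_symbol *resolve_address(const map_symbol *syms, size_t n,
                                  uint32_t addr, uint32_t *offset) {
    size_t lo = 0, hi = n;  /* binary search over [lo, hi) */
    assert(syms != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries whose address is <= addr */
    if (lo == 0)
        return NULL;
    if (offset != NULL)
        *offset = addr - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

With an invented three-entry table at 0x610f1000, 0x610f1400 and 0x610f1800, resolving 0x610f1553 returns the second entry with offset 0x153. Because a return address points just past the call instruction, debuggers commonly look up RetAddr - 1 so that a call at the very end of a function is not attributed to the following symbol.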

The reason for using WinDbg is that, from what I understand, gdb is 
unable to produce accurate stack traces when the call stack includes 
frames for functions that omit the frame pointer and do not have debug 
info that gdb can process.  I believe many Microsoft provided functions 
in ntdll, kernel32, kernelbase, etc... do omit the frame pointer and 
only provide debug info in the PDB format, which gdb is unable to use.
Compiling Cygwin without frame pointer omission and using WinDbg 
therefore provides the most accurate stack trace.  If I am incorrect 
about any of this, I would very much appreciate a correction and/or 
explanation.
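To illustrate why frame pointer omission matters: without usable unwind info, a debugger walks the stack by following the chain of saved EBP values. A frame built with the usual "push ebp; mov ebp, esp" prologue stores the caller's EBP at [ebp] and the return address at [ebp+4]; a function compiled with FPO does not maintain this chain, so the walk either skips its frame or follows a garbage value. The sketch below models that walk over a toy array standing in for stack memory; the toy_stack type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of 32-bit stack memory: words[i] lives at base + 4*i. */
typedef struct {
    uint32_t base;
    const uint32_t *words;
    size_t count;
} toy_stack;

static int word_at(const toy_stack *s, uint32_t addr, uint32_t *out) {
    size_t idx;
    if (addr < s->base || (addr - s->base) % 4u != 0)
        return 0;
    idx = (addr - s->base) / 4u;
    if (idx >= s->count)
        return 0;
    *out = s->words[idx];
    return 1;
}

/* Follow the saved-EBP chain starting at ebp, recording up to max return
   addresses. Stops at a zero EBP, an unreadable address, or a saved EBP
   that does not point higher (the stack grows down, so callers' frames
   sit at higher addresses). Returns the number of frames recorded. */
size_t walk_ebp_chain(const toy_stack *s, uint32_t ebp,
                      uint32_t *ret_addrs, size_t max) {
    size_t n = 0;
    assert(s != NULL && (ret_addrs != NULL || max == 0));
    while (ebp != 0 && n < max) {
        uint32_t saved_ebp, ret;
        if (!word_at(s, ebp, &saved_ebp) || !word_at(s, ebp + 4u, &ret))
            break;                  /* unreadable frame: give up */
        ret_addrs[n++] = ret;       /* the RetAddr column in WinDbg */
        if (saved_ebp != 0 && saved_ebp <= ebp)
            break;                  /* chain went backwards: corrupt */
        ebp = saved_ebp;            /* EBP of the caller's frame */
    }
    return n;
}
```

The sanity check on the direction of the chain is one simple heuristic; real debuggers combine it with unwind data (such as the PDB frame data mentioned above) when it is available.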

I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able 
to reproduce several issues which are described below.

All of these issues occur when using ctrl-c to interrupt the infinite 
loop in the test case(s) I've been using to debug inconsistent exit 
codes.  When ctrl-c is pressed, I've observed the following:

1) Programs are (generally) terminated as expected.  cmd.exe prompts to 
"Terminate batch job" as expected.

2) An access violation occurs and a processor context is dumped to the 
console.  I do not yet have stack traces for these cases.

3) One of the processes hangs.

Access violations occur in ~20% of test runs.  Hangs occur in ~5% of 
test runs.

I did not provide a test case previously because I don't have an 
automated reproducer at present.  All sources needed to reproduce the 
issues are below.  The test case uses a .bat file to avoid dependencies 
on bash so as to minimally isolate the problem.

To reproduce the issues, copy test.bat, false-cygwin32.exe, and 
expect-false-execve-cygwin32.exe to a Cygwin bin directory and run 
test.bat from a cmd.exe console.  Press ctrl-c to interrupt the test. 
Repeat until problems are observed.  I have not been able to reproduce 
these symptoms when running the test via a MinTTY console.

I have been unable to get useful stack traces from hung processes using 
gdb.  gdb reports that the debug information in cygwin1-20130102.dbg.bz2 
does not match (CRC mismatch) the cygwin1.dll module in 
cygwin-inst-20130102.tar.bz2.


$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
     pid_t child_pid, wait_pid;
     int result, child_status;

     if (argc != 2) {
         fprintf(stderr, "expect-false: Missing or too many arguments\n");
         return 127;
     }

     child_pid = fork();
     if (child_pid == -1) {
         fprintf(stderr, "expect-false: fork failed.  errno=%d\n", errno);
         return 127;
     } else if (child_pid == 0) {
         /* execlp's argument list must end with a null char pointer */
         result = execlp(argv[1], argv[1], (char *)NULL);
         if (result == -1) {
             fprintf(stderr, "expect-false: execlp failed.  errno=%d\n", errno);
         }
         _exit(127);
     }

     do {
         wait_pid = waitpid(child_pid, &child_status, 0);
     } while(
         (wait_pid == -1 && errno == EINTR) ||
         (wait_pid == child_pid && !(WIFEXITED(child_status) ||
                                     WIFSIGNALED(child_status)))
     );
     if (wait_pid == -1) {
         fprintf(stderr, "expect-false: waitpid failed.  errno=%d\n", errno);
         return 127;
     }
     if (!WIFEXITED(child_status)) {
         fprintf(stderr,
                 "expect-false: child process did not exit normally\n");
         return 127;
     }
     if (WEXITSTATUS(child_status) != 1) {
         /* report the decoded exit code, not the raw wait status */
         fprintf(stderr, "expect-false: unexpected exit code: %d\n",
                 WEXITSTATUS(child_status));
     }

     return WEXITSTATUS(child_status);
}
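When chasing unexpected exit codes it can help to report exactly how the child ended rather than a raw status word, which packs the exit code together with termination information. A minimal POSIX sketch, with hypothetical function names, that starts a program the same way the test does (fork() plus execlp()), retries waitpid() on EINTR, and describes the result:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a waitpid() status word as text. */
void describe_wait_status(int status, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else
        snprintf(buf, len, "unrecognized status 0x%x", (unsigned)status);
}

/* Run prog (searched for in PATH) with no arguments, wait for it and
   describe how it ended. Returns the raw status, or -1 if fork() or
   waitpid() failed. */
int run_and_describe(const char *prog, char *buf, size_t len) {
    int status;
    pid_t pid, w;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execlp(prog, prog, (char *)NULL);
        _exit(127);  /* exec failed; same convention as the test program */
    }
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);
    if (w == -1)
        return -1;
    describe_wait_status(status, buf, len);
    return status;
}
```

For example, run_and_describe("false", buf, sizeof buf) should leave "exited with code 1" in buf on a system where false behaves normally.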


$ cat false.c
#include <stdio.h>

int main() {
     printf("myfalse\n");
     return 1;
}


$ cat test.bat
@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
     echo exiting...
     exit /B 1
)
goto loop
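The loop in test.bat can also be driven from C, which may be a step toward the automated reproducer mentioned above. As a sketch under the assumption of a POSIX environment, the hypothetical function below runs a program repeatedly and stops at the first run whose exit code is not at least 1, mirroring the batch file's "if not errorlevel 1" check (IF ERRORLEVEL N is true when the exit code is N or greater). It does not deliver ctrl-c, so it automates only the loop, not the interrupt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run prog up to max_iter times. Returns the number of runs that ended
   with an exit code >= 1 before the first run that did not (or before
   fork()/waitpid() failed); max_iter means the loop never tripped. */
int loop_until_unexpected(const char *prog, int max_iter) {
    int i;
    assert(prog != NULL && max_iter >= 0);
    for (i = 0; i < max_iter; i++) {
        int status;
        pid_t w, pid = fork();
        if (pid == -1)
            return i;
        if (pid == 0) {
            execlp(prog, prog, (char *)NULL);
            _exit(127);
        }
        do {
            w = waitpid(pid, &status, 0);
        } while (w == -1 && errno == EINTR);
        /* "if not errorlevel 1": anything below 1 ends the loop */
        if (w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) < 1)
            return i;
    }
    return max_iter;
}
```

Calling loop_until_unexpected("false", 1000) should return 1000 unless some run reports a wrong exit code; unlike the batch file, a child killed by a signal is also treated as unexpected here.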


$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

 From a cmd.exe console: (press ctrl-c once the test is running)
C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...


Tom.


* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-02 19:15                                 ` Tom Honermann
@ 2013-01-02 20:48                                   ` Christopher Faylor
  2013-01-02 20:53                                     ` Daniel Colascione
  2013-01-02 21:25                                     ` Tom Honermann
  0 siblings, 2 replies; 65+ messages in thread
From: Christopher Faylor @ 2013-01-02 20:48 UTC (permalink / raw)
  To: cygwin

I managed to duplicate a hang by really stressing ctrl-c in a loop.  It
uncovers some rather amazing Windows behavior which I have to think
about.  Apparently ExitThread can be called recursively within the
thread that Windows creates to handle CTRL-C.

cgf

* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-02 20:48                                   ` Christopher Faylor
@ 2013-01-02 20:53                                     ` Daniel Colascione
  2013-01-02 21:41                                       ` Christopher Faylor
  2013-01-02 21:25                                     ` Tom Honermann
  1 sibling, 1 reply; 65+ messages in thread
From: Daniel Colascione @ 2013-01-02 20:53 UTC (permalink / raw)
  To: cygwin

On 1/2/13 12:48 PM, Christopher Faylor wrote:
> I managed to duplicate a hang by really stressing ctrl-c in a loop.  It
> uncovers some rather amazing Windows behavior which I have to think
> about.  Apparently ExitThread can be called recursively within the
> thread that Windows creates to handle CTRL-C.

What do you mean? ExitThread should never return, and I can't
imagine anything on the thread termination path calling ExitThread
again, especially not once the thread jumps to kernel mode.



* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-02 20:48                                   ` Christopher Faylor
  2013-01-02 20:53                                     ` Daniel Colascione
@ 2013-01-02 21:25                                     ` Tom Honermann
  2013-01-15 22:17                                       ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann
  1 sibling, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-02 21:25 UTC (permalink / raw)
  To: cygwin

On 01/02/2013 03:48 PM, Christopher Faylor wrote:
> I managed to duplicate a hang by really stressing ctrl-c in a loop.  It
> uncovers some rather amazing Windows behavior which I have to think
> about.  Apparently ExitThread can be called recursively within the
> thread that Windows creates to handle CTRL-C.

I'm glad you could reproduce it.  Based on your description, this sounds 
like a separate issue and not a regression introduced by the workarounds 
you put in place for the ExitProcess / ExitThread race.  Correct?

I wonder if this is the same issue I'm experiencing though.  I'm only 
pressing ctrl-c once and it sounds like you might be delivering a ctrl-c 
to the same process multiple times.  That may not be relevant to the 
root cause however.

Tom.


* Re: Intermittent failures retrieving process exit codes - snapshot test requested
  2013-01-02 20:53                                     ` Daniel Colascione
@ 2013-01-02 21:41                                       ` Christopher Faylor
  0 siblings, 0 replies; 65+ messages in thread
From: Christopher Faylor @ 2013-01-02 21:41 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 02, 2013 at 12:53:11PM -0800, Daniel Colascione wrote:
>On 1/2/13 12:48 PM, Christopher Faylor wrote:
>> I managed to duplicate a hang by really stressing ctrl-c in a loop.  It
>> uncovers some rather amazing Windows behavior which I have to think
>> about.  Apparently ExitThread can be called recursively within the
>> thread that Windows creates to handle CTRL-C.
>
>What do you mean? ExitThread should never return, and I can't
>imagine anything on the thread termination path calling ExitThread
>again, especially not once the thread jumps to kernel mode.

Sorry, I was just speculating about what it looked like.  I'm
still debugging the problem.

cgf

* Re: Intermittent failures with ctrl-c (was: retrieving process exit codes)
  2013-01-02 21:25                                     ` Tom Honermann
@ 2013-01-15 22:17                                       ` Tom Honermann
  2013-01-16  2:04                                         ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-15 22:17 UTC (permalink / raw)
  To: cygwin

On 01/02/2013 04:24 PM, Tom Honermann wrote:
> On 01/02/2013 03:48 PM, Christopher Faylor wrote:
>> I managed to duplicate a hang by really stressing ctrl-c in a loop.  It
>> uncovers some rather amazing Windows behavior which I have to think
>> about.  Apparently ExitThread can be called recursively within the
>> thread that Windows creates to handle CTRL-C.
>
> I'm glad you could reproduce it.  Based on your description, this sounds
> like a separate issue and not a regression introduced by the workarounds
> you put in place for the ExitProcess / ExitThread race.  Correct?
>
> I wonder if this is the same issue I'm experiencing though.  I'm only
> pressing ctrl-c once and it sounds like you might be delivering a ctrl-c
> to the same process multiple times.  That may not be relevant to the
> root cause however.

I noticed that some changes were checked in related to signal handling 
and process termination recently, so I downloaded the most recent 
snapshot (20130114) and tested again.  I was still able to produce 
hanging processes (including hangs of strace.exe) by hitting ctrl-c in a 
mintty window while Cygwin processes ran in an infinite loop inside of a 
.bat file.  I was able to produce a hang ~1 out of 20 times.

If you are already working on a fix for this, I apologize for the noise. 
Otherwise, if I can provide anything further that would be helpful, 
please let me know.

Tom.


* Re: Intermittent failures with ctrl-c (was: retrieving process exit codes)
  2013-01-15 22:17                                       ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann
@ 2013-01-16  2:04                                         ` Christopher Faylor
  2013-01-16 16:38                                           ` Intermittent failures with ctrl-c Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16  2:04 UTC (permalink / raw)
  To: cygwin

On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>I noticed that some changes were checked in related to signal handling 
>and process termination recently, so I downloaded the most recent 
>snapshot (20130114) and tested again.  I was still able to produce 
>hanging processes (including hangs of strace.exe) by hitting ctrl-c in a 
>mintty window while Cygwin processes ran in an infinite loop inside of a 
>.bat file.  I was able to produce a hang ~1 out of 20 times.

How does one run a .bat file inside mintty which handles CTRL-C?  AFAIK,
a CTRL-C will just cause the .bat file to exit when run under bash.

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-16  2:04                                         ` Christopher Faylor
@ 2013-01-16 16:38                                           ` Tom Honermann
  2013-01-16 16:53                                             ` marco atzeri
  2013-01-16 19:14                                             ` Christopher Faylor
  0 siblings, 2 replies; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 16:38 UTC (permalink / raw)
  To: cygwin

On 01/15/2013 09:04 PM, Christopher Faylor wrote:
> On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>> I noticed that some changes were checked in related to signal handling
>> and process termination recently, so I downloaded the most recent
>> snapshot (20130114) and tested again.  I was still able to produce
>> hanging processes (including hangs of strace.exe) by hitting ctrl-c in a
>> mintty window while Cygwin processes ran in an infinite loop inside of a
>> .bat file.  I was able to produce a hang ~1 out of 20 times.
>
> How does one run a .bat file inside mintty which handles CTRL-C?  AFAIK,
> a CTRL-C will just cause the .bat file to exit when run under bash.

Here is the test case:

1) Install the latest snapshot

2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin 
install into the usr/bin directory of the snapshot.  For me this 
consisted of:
   bash.exe
   cygintl-8.dll
   cygiconv-2.dll
   cygreadline7.dll
   cygncurses-10.dll
   cygncursesw-10.dll
   cyggcc_s-1.dll
   false.exe

3) Create 'test.bat' in the usr/bin directory of the snapshot with the 
following contents:

@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
bash -c false
if not errorlevel 1 (
     echo exiting...
     exit /B 1
)
goto loop

4) Launch mintty using an existing Cygwin installation.  Naturally, this 
will run a shell from the existing Cygwin install.

5) Change directories to the usr/bin directory of the snapshot.

6) Start task manager or some other process monitoring tool and keep it 
running.  Run ./test.bat from the Cygwin shell running within mintty and 
interrupt it with ctrl-c.  Repeat until you see a new bash.exe or 
false.exe process persisting following the interrupt.  You'll likely 
have multiple bash processes running.  If you are able to reproduce, you 
should see one with a command line of 'bash -c false'.  Alternatively, 
if your process monitoring tool shows the path to the executable, you'll 
be able to identify it as the one from the usr/bin directory of the 
snapshot.

I rather doubt that the use of a .bat file is necessary to reproduce 
this hang, but I haven't tried producing a test case that doesn't use a 
.bat file.  This is a test case I was using when debugging the 
intermittent incorrect exit code issue.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-16 16:38                                           ` Intermittent failures with ctrl-c Tom Honermann
@ 2013-01-16 16:53                                             ` marco atzeri
  2013-01-16 17:42                                               ` Tom Honermann
  2013-01-16 19:14                                             ` Christopher Faylor
  1 sibling, 1 reply; 65+ messages in thread
From: marco atzeri @ 2013-01-16 16:53 UTC (permalink / raw)
  To: cygwin

On 1/16/2013 5:37 PM, Tom Honermann wrote:

>
> 4) Launch mintty using an existing Cygwin installation.  Naturally, this
> will run a shell from the existing Cygwin install.
>
> 5) Change directories to the usr/bin directory of the snapshot.
>

This will cause a cygwin1.dll collision between the two versions.
Nothing is guaranteed to work.

>
> Tom.
>

Marco


* Re: Intermittent failures with ctrl-c
  2013-01-16 16:53                                             ` marco atzeri
@ 2013-01-16 17:42                                               ` Tom Honermann
  2013-01-16 18:05                                                 ` Earnie Boyd
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 17:42 UTC (permalink / raw)
  To: cygwin

On 01/16/2013 11:53 AM, marco atzeri wrote:
> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>
>>
>> 4) Launch mintty using an existing Cygwin installation.  Naturally, this
>> will run a shell from the existing Cygwin install.
>>
>> 5) Change directories to the usr/bin directory of the snapshot.
>>
>
> This will cause a cygwin1.dll collision between the two versions.
> Nothing is guaranteed to work.

Can you elaborate?  Cygwin supports multiple installations just fine 
these days.  Use of a .bat file (an intervening cmd.exe process) should 
isolate the environments for this test.

Regardless, I was also able to produce a hang in bash running the same 
.bat file from a cmd.exe prompt using only the snapshot install and the 
copied bash.exe, false.exe, and dependent binaries - no mintty.  The 
hung bash.exe process eventually timed out with an error message:

5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed, signal 6, rc 258, Win32 error 0
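Assuming the "rc" field in that message is the raw return value of a Win32 wait call, 258 is 0x102, i.e. WAIT_TIMEOUT, which fits the description of the process eventually timing out. A small sketch for classifying such values follows; the constants are copied from the Windows SDK values under local SKETCH_ names so that it builds without windows.h, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return values of single-object waits (same values as the SDK's
   WAIT_OBJECT_0, WAIT_ABANDONED, WAIT_TIMEOUT and WAIT_FAILED). */
#define SKETCH_WAIT_OBJECT_0  0x00000000u
#define SKETCH_WAIT_ABANDONED 0x00000080u
#define SKETCH_WAIT_TIMEOUT   0x00000102u  /* 258 */
#define SKETCH_WAIT_FAILED    0xFFFFFFFFu

typedef enum {
    WR_OBJECT_0, WR_ABANDONED, WR_TIMEOUT, WR_FAILED, WR_OTHER
} wait_result;

wait_result classify_wait_result(uint32_t rc) {
    switch (rc) {
    case SKETCH_WAIT_OBJECT_0:  return WR_OBJECT_0;
    case SKETCH_WAIT_ABANDONED: return WR_ABANDONED;
    case SKETCH_WAIT_TIMEOUT:   return WR_TIMEOUT;
    case SKETCH_WAIT_FAILED:    return WR_FAILED;
    default:                    return WR_OTHER;
    }
}

const char *wait_result_name(wait_result r) {
    static const char *const names[] = {
        "WAIT_OBJECT_0", "WAIT_ABANDONED", "WAIT_TIMEOUT", "WAIT_FAILED",
        "other"
    };
    assert((unsigned)r < sizeof names / sizeof names[0]);
    return names[r];
}
```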

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-16 17:42                                               ` Tom Honermann
@ 2013-01-16 18:05                                                 ` Earnie Boyd
  2013-01-16 18:51                                                   ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Earnie Boyd @ 2013-01-16 18:05 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 16, 2013 at 12:42 PM, Tom Honermann <thonermann@coverity.com> wrote:
> On 01/16/2013 11:53 AM, marco atzeri wrote:
>>
>> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>>
>>>
>>> 4) Launch mintty using an existing Cygwin installation.  Naturally, this
>>> will run a shell from the existing Cygwin install.
>>>
>>> 5) Change directories to the usr/bin directory of the snapshot.
>>>
>>
>> This will cause a cygwin1.dll collision between the two versions.
>> Nothing is guaranteed to work.
>
>
> Can you elaborate?  Cygwin supports multiple installations just fine these
> days.  Use of a .bat file (an intervening cmd.exe process) should isolate
> the environments for this test.
>

While you can have multiple installations, you cannot mix the environments.
You did not copy mintty, so you started it in one instance and then
went to another instance, which will cause a clash of resources.

> Regardless, I was also able to produce a hang in bash running the same .bat
> file from a cmd.exe prompt using only the snapshot install and the copied
> bash.exe, false.exe, and dependent binaries - no mintty.  The hung bash.exe
> process eventually timed out with an error message:
>
> 5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed, signal 6, rc 258, Win32 error 0

Looking at the list of DLLs you copied, you may still be seeing a
conflict with which DLL is in use.  Do you see a hang if you remain in
usr/bin and don't change directories to your copied files?

-- 
Earnie
-- https://sites.google.com/site/earnieboyd

* Re: Intermittent failures with ctrl-c
  2013-01-16 18:05                                                 ` Earnie Boyd
@ 2013-01-16 18:51                                                   ` Tom Honermann
  2013-01-16 18:59                                                     ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 18:51 UTC (permalink / raw)
  To: cygwin

On 01/16/2013 01:05 PM, Earnie Boyd wrote:
> On Wed, Jan 16, 2013 at 12:42 PM, Tom Honermann <thonermann@coverity.com> wrote:
>> On 01/16/2013 11:53 AM, marco atzeri wrote:
>>>
>>> On 1/16/2013 5:37 PM, Tom Honermann wrote:
>>>
>>>>
>>>> 4) Launch mintty using an existing Cygwin installation.  Naturally, this
>>>> will run a shell from the existing Cygwin install.
>>>>
>>>> 5) Change directories to the usr/bin directory of the snapshot.
>>>>
>>>
>>> This will cause a cygwin1.dll collision between the two versions.
>>> Nothing is guaranteed to work.
>>
>>
>> Can you elaborate?  Cygwin supports multiple installations just fine these
>> days.  Use of a .bat file (an intervening cmd.exe process) should isolate
>> the environments for this test.
>>
>
> While you can have multiple installations, you cannot mix the environments.
> You did not copy mintty, so you started it in one instance and then
> went to another instance, which will cause a clash of resources.

Can you elaborate on what resources you are referring to?  I fail to see 
how the Cygwin binaries run via the .bat file could conflict with mintty 
(or the top level bash process) since the intervening cmd.exe execution 
would have blocked inheritance of Cygwin related resources, primarily 
since fork() isn't used to create these child processes.

My understanding is that shared Cygwin resources are keyed off of the 
location of the cygwin1.dll loaded into the Cygwin process.  If two 
Cygwin processes run with different cygwin1.dll instances, they should 
not share resources.  I can see a case for there being a problem if a 
Cygwin process creates another Cygwin process via fork() and that child 
process is run with a different cygwin1.dll instance, but that isn't the 
case here.  The only other case I can think of would require Cygwin 
looking at the process tree (stepping up through non-Cygwin processes) 
to get at resources.  That would be quite expensive on Windows.
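The keying idea described above can be illustrated with a short sketch: hash the full path of the loaded DLL and prefix every shared-object name with the result, so that processes using different DLLs never open each other's objects while every process loading the same DLL computes the same key. The 64-bit FNV-1a hash, the function names and the name format are assumptions chosen for illustration, not Cygwin's actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-installation key: 64-bit FNV-1a over the DLL path. */
uint64_t install_key(const char *dll_path) {
    const unsigned char *p;
    uint64_t h = 0xcbf29ce484222325ull;  /* FNV-1a 64-bit offset basis */
    assert(dll_path != NULL);
    for (p = (const unsigned char *)dll_path; *p != '\0'; p++) {
        h ^= *p;
        h *= 0x100000001b3ull;           /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Build "<16 hex digits>.<object>" (format hypothetical). Returns the
   snprintf() result, i.e. the length the full name needs. */
int shared_object_name(const char *dll_path, const char *object,
                       char *buf, size_t len) {
    return snprintf(buf, len, "%016llx.%s",
                    (unsigned long long)install_key(dll_path), object);
}
```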

>> Regardless, I was also able to produce a hang in bash running the same .bat
>> file from a cmd.exe prompt using only the snapshot install and the copied
>> bash.exe, false.exe, and dependent binaries - no mintty.  The hung bash.exe
>> process eventually timed out with an error message:
>>
>> 5 [unknown (0x176C)] bash 2000 sig_send: wait for sig_complete event failed, signal 6, rc 258, Win32 error 0
>
> Looking at the list of DLLs you copied, you may still be seeing a
> conflict with which DLL is in use.

I don't see how that would be the case.  If it were, then it would not 
be possible (in general) to have multiple Cygwin installations with 
unrelated processes running concurrently from each installation.

> Do you see a hang if you remain in
> usr/bin and don't change directories to your copied files?

I believe that would be equivalent to testing in my (non-snapshot) 
Cygwin installation.  The goal is to test the snapshot.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-16 18:51                                                   ` Tom Honermann
@ 2013-01-16 18:59                                                     ` Christopher Faylor
  2013-01-16 20:19                                                       ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 18:59 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 16, 2013 at 01:51:11PM -0500, Tom Honermann wrote:
>Can you elaborate on what resources you are referring to?  I fail to
>see how the Cygwin binaries run via the .bat file could conflict with
>mintty (or the top level bash process) since the intervening cmd.exe
>execution would have blocked inheritance of Cygwin related resources,
>primarily since fork() isn't used to create these child processes.

Here is a very basic issue: If you are going to be submitting a bug
report you should be making things as simple and as clear as possible.
The fact that there are two cygwin DLLs in play here adds additional
confusion and complication.  If we now have to enter into a theoretical
discussion about what should be allowed, we have needlessly strayed from
the initial problem.

Given the number of historical problems we have had with mixing two
versions of Cygwin and given that our consistent guidance is to
only have one on your computer, there is no reason to get into a
discussion about what is allowed.  Just use one version.  You
can easily switch back and forth using Windows tools.

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-16 16:38                                           ` Intermittent failures with ctrl-c Tom Honermann
  2013-01-16 16:53                                             ` marco atzeri
@ 2013-01-16 19:14                                             ` Christopher Faylor
  2013-01-16 20:24                                               ` Tom Honermann
  1 sibling, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 19:14 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 16, 2013 at 11:37:43AM -0500, Tom Honermann wrote:
>On 01/15/2013 09:04 PM, Christopher Faylor wrote:
>> On Tue, Jan 15, 2013 at 05:16:57PM -0500, Tom Honermann wrote:
>>> I noticed that some changes were checked in related to signal handling
>>> and process termination recently, so I downloaded the most recent
>>> snapshot (20130114) and tested again.  I was still able to produce
>>> hanging processes (including hangs of strace.exe) by hitting ctrl-c in a
>>> mintty window while Cygwin processes ran in an infinite loop inside of a
>>> .bat file.  I was able to produce a hang ~1 out of 20 times.
>>
>> How does one run a .bat file inside mintty which handles CTRL-C?  AFAIK,
>> a CTRL-C will just cause the .bat file to exit when run under bash.
>
>Here is the test case:
>
>1) Install the latest snapshot
>
>2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin 
>install into the usr/bin directory of the snapshot.  For me this 
>consisted of:
>   bash.exe
>   cygintl-8.dll
>   cygiconv-2.dll
>   cygreadline7.dll
>   cygncurses-10.dll
>   cygncursesw-10.dll
>   cyggcc_s-1.dll
>   false.exe
>
>3) Create 'test.bat' in the usr/bin directory of the snapshot with the 
>following contents:
>
>@echo off
>setlocal
>
>set PATH=%CD%;%PATH%
>
>:loop
>echo test...
>bash -c false
>if not errorlevel 1 (
>     echo exiting...
>     exit /B 1
>)
>goto loop
>
>4) Launch mintty using an existing Cygwin installation.  Naturally, this 
>will run a shell from the existing Cygwin install.
>
>5) Change directories to the usr/bin directory of the snapshot.
>
>6) Start task manager or some other process monitoring tool and keep it 
>running.  Run ./test.bat from the Cygwin shell running within mintty and 
>interrupt it with ctrl-c.  Repeat until you see a new bash.exe or 
>false.exe process persisting following the interrupt.  You'll likely 
>have multiple bash processes running.  If you are able to reproduce, you 
>should see one with a command line of 'bash -c false'.  Alternatively, 
>if your process monitoring tool shows the path to the executable, you'll 
>be able to identify it as the one from the usr/bin directory of the 
>snapshot.

Again, if I hit CTRL-C while running ./test.bat in mintty then test.bat
exits immediately, as expected.  Hitting ctrl-c repeatedly after that
point gives me a new bash prompt.

Non-exiting behavior was a symptom of a previous snapshot which was
mentioned here:

http://cygwin.com/ml/cygwin/2013-01/msg00164.html

>I rather doubt that the use of a .bat file is necessary to reproduce 
>this hang, but I haven't tried producing a test case that doesn't use a 
>.bat file.  This is a test case I was using when debugging the 
>intermittent incorrect exit code issue.

Btw, an incorrect exit code is still a possibility if you're running
from a cmd shell since it is possible to interrupt a Cygwin process
before Cygwin is entirely set up.  That will cause a normal Windows
CTRL-C exit.
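To tell an intermittent wrong exit code apart from the interrupt case described above, one option is to run the command a fixed number of times without pressing ctrl-c and log every status that differs from the expected one. A minimal POSIX sh sketch (the function name `count_unexpected_status` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: run a command RUNS times and count the runs whose
# exit status differs from EXPECTED.  Each mismatch is logged to stderr;
# the command's own stdout is sent to stderr as well, so stdout carries
# only the final count.
#
# usage: count_unexpected_status EXPECTED RUNS command [args...]
count_unexpected_status() {
  expected=$1
  runs=$2
  shift 2
  bad=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    i=$((i + 1))
    # Capture the status in a way that also works under "set -e".
    if "$@" >&2; then
      status=0
    else
      status=$?
    fi
    if [ "$status" -ne "$expected" ]; then
      bad=$((bad + 1))
      echo "run $i: exit status $status (expected $expected)" >&2
    fi
  done
  echo "$bad"
}
```

For example, `count_unexpected_status 1 200 bash -c false` should print `0` when every run exits with status 1; a nonzero count without any ctrl-c points at the exit-code problem rather than at an interrupt.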

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-16 18:59                                                     ` Christopher Faylor
@ 2013-01-16 20:19                                                       ` Tom Honermann
  2013-01-16 22:23                                                         ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 20:19 UTC (permalink / raw)
  To: cygwin

On 01/16/2013 01:59 PM, Christopher Faylor wrote:
> On Wed, Jan 16, 2013 at 01:51:11PM -0500, Tom Honermann wrote:
>> Can you elaborate on what resources you are referring to?  I fail to
>> see how the Cygwin binaries run via the .bat file could conflict with
>> mintty (or the top level bash process) since the intervening cmd.exe
>> execution would have blocked inheritance of Cygwin related resources,
>> primarily since fork() isn't used to create these child processes.
>
> Here is a very basic issue: If you are going to be submitting a bug
> report you should be making things as simple and as clear as possible.

I'm trying.  What you are suggesting implies that all testing of 
snapshots be done either from a cmd.exe prompt (after copying enough of 
another Cygwin installation into the snapshot) or by updating the host 
Cygwin installation.  My host installation is used for production 
purposes and I don't have spare machines available for other testing. 
I'm not messing with it.

I am aware of the snapshot guidance: 
http://cygwin.com/faq-nochunks.html#faq.setup.snapshots

> The fact that there are two cygwin DLLs in play here adds additional
> confusion and complication.  If we now have to enter into a theoretical
> discussion about what should be allowed, we have needlessly strayed from
> the initial problem.
>
> Given the number of historical problems we have had with mixing two
> versions of Cygwin and given that our consistent guidance is to
> only have one on your computer, there is no reason to get into a
> discussion about what is allowed.  Just use one version.  You
> can easily switch back and forth using Windows tools.

I previously mentioned that problems can be duplicated without mintty. 
Here are detailed steps for how to reproduce without mintty.

1) Install the latest snapshot

2) Copy bash.exe, false.exe, and their dependent DLLs from a Cygwin 
install into the usr/bin directory of the snapshot.  For me this 
consisted of:
   bash.exe
   cygintl-8.dll
   cygiconv-2.dll
   cygreadline7.dll
   cygncurses-10.dll
   cygncursesw-10.dll
   cyggcc_s-1.dll
   false.exe

3) Shutdown all other Cygwin processes.

4) Create 'test.bat' in the usr/bin directory of the snapshot with the 
following contents:

@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
bash -c false
if not errorlevel 1 (
     echo exiting...
     exit /B 1
)
goto loop

5) Start a cmd.exe prompt.

6) Change directories to the usr/bin directory of the snapshot.

7) Start task manager or some other process monitoring tool and keep it 
running.  Run ./test.bat from the cmd.exe prompt and interrupt it with 
ctrl-c.  Repeat until you see a new bash.exe or false.exe process 
persisting following the interrupt.
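Step 2 can be scripted. The sketch below assumes a POSIX sh and uses the hypothetical function name `copy_test_binaries`; it copies exactly the files listed in step 2 and deliberately leaves `cygwin1.dll` alone so that the snapshot's DLL is the one that gets loaded.

```shell
#!/bin/sh
# Hypothetical helper for step 2: copy bash.exe, false.exe and the DLLs
# they depend on from an existing Cygwin bin directory into the
# snapshot's usr/bin.  The DLL names are the ones listed in step 2 and
# will change with package versions; on Cygwin, "cygcheck bash.exe"
# lists a binary's actual DLL dependencies.  cygwin1.dll is intentionally
# NOT copied, so the snapshot's own DLL stays in place.
copy_test_binaries() {
  src=$1
  dest=$2
  for f in bash.exe false.exe cygintl-8.dll cygiconv-2.dll \
           cygreadline7.dll cygncurses-10.dll cygncursesw-10.dll \
           cyggcc_s-1.dll; do
    if [ ! -f "$src/$f" ]; then
      echo "missing: $src/$f" >&2
      return 1
    fi
    cp "$src/$f" "$dest/" || return 1
  done
}
```

For example (paths are placeholders): `copy_test_binaries /usr/bin /path/to/snapshot/usr/bin`.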

It took me 20 or so tries re-running test.bat and interrupting it before 
I was able to produce a hanging/abandoned process.

I don't know how to make things any simpler or clearer than this.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-16 19:14                                             ` Christopher Faylor
@ 2013-01-16 20:24                                               ` Tom Honermann
  0 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2013-01-16 20:24 UTC (permalink / raw)
  To: cygwin

On 01/16/2013 02:14 PM, Christopher Faylor wrote:
> Again, if I hit CTRL-C while running ./test.bat in mintty then test.bat
> exits immediately, as expected.  Hitting ctrl-c repeatedly after that
> point gives me a new bash prompt.

Yes, that is what is expected to happen.  What I am reporting is that 
interrupting test.bat sometimes leaves hung processes still running 
after control is returned to the shell.

> Non-exiting behavior was a symptom of a previous snapshot which was
> mentioned here:
>
> http://cygwin.com/ml/cygwin/2013-01/msg00164.html

I'm testing a newer snapshot than that one.  I've been testing with 
20130114, which Thomas reported as no longer having that problem here:

http://cygwin.com/ml/cygwin/2013-01/msg00196.html

>> I rather doubt that the use of a .bat file is necessary to reproduce
>> this hang, but I haven't tried producing a test case that doesn't use a
>> .bat file.  This is a test case I was using when debugging the
>> intermittent incorrect exit code issue.
>
> Btw, an incorrect exit code is still a possibility if you're running
> from a cmd shell since it is possible to interrupt a Cygwin process
> before Cygwin is entirely set up.  That will cause a normal Windows
> CTRL-C exit.

Yup, that is understood and expected.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-16 20:19                                                       ` Tom Honermann
@ 2013-01-16 22:23                                                         ` Christopher Faylor
  2013-01-18 20:12                                                           ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-16 22:23 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>I previously mentioned that problems can be duplicated without mintty. 
>Here are detailed steps for how to reproduce without mintty.

I was responding to your latest bug report which mentioned mintty.

I managed to duplicate a hang by changing your .bat file to use "sleep
2" rather than false.  I'm investigating now.

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-16 22:23                                                         ` Christopher Faylor
@ 2013-01-18 20:12                                                           ` Tom Honermann
  2013-01-19  5:58                                                             ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-18 20:12 UTC (permalink / raw)
  To: cygwin

On 01/16/2013 05:23 PM, Christopher Faylor wrote:
> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
> I managed to duplicate a hang by changing your .bat file to use "sleep
> 2" rather than false.  I'm investigating now.

I noticed that you checked in some additional changes on the 16th that 
look related to this, so I tested again with today's snapshot (20130118).

I was still able to produce hangs using the same test case.  The 
symptoms are slightly different than I had seen previously.  bash hung 2 
out of the ~60 times I interrupted the test.  No error messages were 
displayed this time.  Upon pressing ctrl-c, bash hung for 60 seconds.  I 
was then greeted with the "Terminate batch job" prompt and responding 
'Y' terminated the process tree as expected.  Pressing ctrl-c while bash 
was hung for those 60 seconds appeared to have no effect.

My apologies for this distraction if you don't yet expect this to be fixed.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-18 20:12                                                           ` Tom Honermann
@ 2013-01-19  5:58                                                             ` Christopher Faylor
  2013-01-20 22:09                                                               ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-19  5:58 UTC (permalink / raw)
  To: cygwin

On Fri, Jan 18, 2013 at 03:11:03PM -0500, Tom Honermann wrote:
>On 01/16/2013 05:23 PM, Christopher Faylor wrote:
>> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>> I managed to duplicate a hang by changing your .bat file to use "sleep
>> 2" rather than false.  I'm investigating now.
>
>I noticed that you checked in some additional changes on the 16th that 
>look related to this, so I tested again with today's snapshot (20130118).

I thought I sent a "try a snapshot" but I must have been hallucinating
again.

>I was still able to produce hangs using the same test case.  The 
>symptoms are slightly different than I had seen previously.  bash hung 2 
>out of the ~60 times I interrupted the test.  No error messages were 
>displayed this time.  Upon pressing ctrl-c, bash hung for 60 seconds.  I 
>was then greeted with the "Terminate batch job" prompt and responding 
>'Y' terminated the process tree as expected.  Pressing ctrl-c while bash 
>was hung for those 60 seconds appeared to have no effect.

The hang should be fixed in the upcoming snapshot.

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-19  5:58                                                             ` Christopher Faylor
@ 2013-01-20 22:09                                                               ` Tom Honermann
  2013-01-23  3:20                                                                 ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-20 22:09 UTC (permalink / raw)
  To: cygwin

On 01/19/2013 12:58 AM, Christopher Faylor wrote:
> On Fri, Jan 18, 2013 at 03:11:03PM -0500, Tom Honermann wrote:
>> On 01/16/2013 05:23 PM, Christopher Faylor wrote:
>>> On Wed, Jan 16, 2013 at 03:18:47PM -0500, Tom Honermann wrote:
>>> I managed to duplicate a hang by changing your .bat file to use "sleep
>>> 2" rather than false.  I'm investigating now.
>>
>> I noticed that you checked in some additional changes on the 16th that
>> look related to this, so I tested again with today's snapshot (20130118).
>
> I thought I sent a "try a snapshot" but I must have been hallucinating
> again.
>
>> I was still able to produce hangs using the same test case.  The
>> symptoms are slightly different than I had seen previously.  bash hung 2
>> out of the ~60 times I interrupted the test.  No error messages were
>> displayed this time.  Upon pressing ctrl-c, bash hung for 60 seconds.  I
>> was then greeted with the "Terminate batch job" prompt and responding
>> 'Y' terminated the process tree as expected.  Pressing ctrl-c while bash
>> was hung for those 60 seconds appeared to have no effect.
>
> The hang should be fixed in the upcoming snapshot.

Snapshot 20130119 appears to have addressed most of the cases I've 
witnessed.

However, I was still able to reproduce another case.  As before, one of 
the processes is being left running when the rest are terminated.  The 
"abandoned" process appears to be in a live-lock state with two threads 
(threads 1 and 2) running at 100%.  Of particular interest is that each 
time I press ctrl-c in the cmd.exe console this process was spawned 
from, a new thread appears in the process even though this program is no 
longer a foreground process and all other Cygwin processes have 
terminated.  The new threads never exit.

Same test case as before.  However, since reproducing this may be 
challenging, I dug in to try and get some details that might help with 
reproducing it.

It looks like thread 1 was interrupted while in a call to free().  Both 
thread 1 and 2 appear to be stuck looping on calls to yield().  Thread 3 
appears to be stuck in a call to WriteFile.  I suspect thread 3 was 
created by the initial ctrl-c event, but I'm not able to get an accurate 
stack trace for this thread to prove that.  Threads 4 and up correspond 
to new threads created for new ctrl-c events.

The following stack traces correspond to the above mentioned snapshot 
with cygwin1.dbg (from cygwin1-20130119.dbg.bz2) in place.

(gdb) thread 1
[Switching to thread 1 (Thread 5344.0x1878)]
#0  0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7767fbfa in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x76792ed6 in KERNELBASE!GetThreadUILanguage ()
    from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x61087581 in yield ()
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/miscfuncs.cc:243
#4  0x610d6d9c in _sigfe () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#5  0x61083180 in free ()
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/malloc_wrapper.cc:43
#6  0x00000010 in ?? ()
#7  0x00000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 5344.0x1ac8)]
#0  0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7767f99e in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x76793a5e in SetThreadPriority () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x6108759b in yield ()
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/miscfuncs.cc:244
#4  0x610d6eb4 in _cygtls::lock() () from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#5  0x610302ee in sigpacket::setup_handler (this=0x95ac04,
    handler=0x6102fdc0 <signal_exit(int, siginfo_t*)>, siga=..., tls=0x28ce64)
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/exceptions.cc:796
#6  0x610319d8 in sigpacket::process (this=0x95ac04)
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/exceptions.cc:1266
#7  0x610dd2ac in wait_sig ()
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/sigproc.cc:1389
#8  0x61003ea5 in cygthread::callfunc (this=0x6118b400, issimplestub=<optimized out>)
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygthread.cc:51
#9  0x6100442f in cygthread::stub (arg=0x6118b400)
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygthread.cc:93
#10 0x6100538d in _cygtls::call2 (this=<optimized out>,
    func=0x610043e0 <cygthread::stub(void*)>, arg=0x6118b400,
    buf=0x6100551b <_cygtls::call(unsigned long (*)(void*, void*), void*)+91>)
    at /netrel/src/cygwin-snapshot-20130119-1/winsup/cygwin/cygtls.cc:99
#11 0x0095ff88 in ?? ()
#12 0x76a8339a in KERNEL32!BaseCleanupAppcompatCacheSupport ()
    from /cygdrive/c/Windows/syswow64/kernel32.dll
#13 0x6118b400 in cygthread::exiting ()
    from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#14 0x0095ffd4 in ?? ()
#15 0x77699ef2 in ntdll!RtlpNtSetValueKey () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#16 0x6118b400 in cygthread::exiting ()
    from /home/thonermann/cygwin/snapshot/usr/bin/cygwin1.dll
#17 0x4449ca2d in ?? ()
#18 0x00000000 in ?? ()

(gdb) thread 3
[Switching to thread 3 (Thread 5344.0x1c2c)]
#0  0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7767f91d in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x7678d4b5 in WriteFile () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x0000009c in ?? ()
#4  0x00000000 in ?? ()

(gdb) thread 4
[Switching to thread 4 (Thread 5344.0x718)]
#0  0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) bt
#0  0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#1  0x7767f8b1 in ntdll!RtlUpdateClonedSRWLock () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
#2  0x76790a91 in WaitForSingleObjectEx () from /cygdrive/c/Windows/syswow64/KERNELBASE.dll
#3  0x00000034 in ?? ()
#4  0x00000000 in ?? ()
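Per-thread backtraces like the ones above can also be captured in a single non-interactive gdb run, using gdb's standard `-batch`, `-p` and `-ex` options together with the `thread apply all bt` command. A sketch with the hypothetical helper name `dump_all_threads`; which PID gdb expects (Cygwin or Windows) may depend on the gdb build, so treat that as an assumption to check.

```shell
#!/bin/sh
# Hypothetical helper: attach gdb to a running process, print a backtrace
# for every thread, then detach so the process keeps running.
# Assumption: the gdb on PATH accepts the PID you pass; the traces in this
# thread show the Windows PID (e.g. "Thread 5344.0x1878"), which Cygwin's
# "ps -W" lists in its WINPID column.
dump_all_threads() {
  if [ "$#" -ne 1 ]; then
    echo "usage: dump_all_threads PID" >&2
    return 2
  fi
  gdb -batch -p "$1" -ex 'thread apply all bt' -ex detach
}
```

Redirecting the output to a file (e.g. `dump_all_threads 5344 > threads.txt 2>&1`) keeps a record even if the process is killed afterwards.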

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-20 22:09                                                               ` Tom Honermann
@ 2013-01-23  3:20                                                                 ` Tom Honermann
  2013-01-23  5:27                                                                   ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-23  3:20 UTC (permalink / raw)
  To: cygwin

On 01/20/2013 05:08 PM, Tom Honermann wrote:
> However, I was still able to reproduce another case.  As before, one of
> the processes is being left running when the rest are terminated.  The
> "abandoned" process appears to be in a live-lock state with two threads
> (threads 1 and 2) running at 100%.  Of particular interest is that each
> time I press ctrl-c in the cmd.exe console this process was spawned
> from, a new thread appears in the process even though this program is no
> longer a foreground process and all other Cygwin processes have
> terminated.  The new threads never exit.

I noticed that more changes were checked in that looked like they might 
address this, so I tested again with the latest snapshot (20130123).

I wasn't able to reproduce any of the symptoms I previously reported.  Yay!

However, just as I was about to give up testing, I hit one more new 
issue.  One of the ctrl-c events sent bash into what appeared to be an 
infinite loop emitting error messages like these:

11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)

While this was going on, hitting ctrl-c had no discernible effect.  I 
resorted to killing the process via task manager.

This only occurred once; I wasn't able to get it to happen again.

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-23  3:20                                                                 ` Tom Honermann
@ 2013-01-23  5:27                                                                   ` Christopher Faylor
  2013-01-23 18:18                                                                     ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-23  5:27 UTC (permalink / raw)
  To: cygwin

On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>On 01/20/2013 05:08 PM, Tom Honermann wrote:
>> However, I was still able to reproduce another case.  As before, one of
>> the processes is being left running when the rest are terminated.  The
>> "abandoned" process appears to be in a live-lock state with two threads
>> (threads 1 and 2) running at 100%.  Of particular interest is that each
>> time I press ctrl-c in the cmd.exe console this process was spawned
>> from, a new thread appears in the process even though this program is no
>> longer a foreground process and all other Cygwin processes have
>> terminated.  The new threads never exit.
>
>I noticed that more changes were checked in that looked like they might 
>address this, so I tested again with the latest snapshot (20130123).
>
>I wasn't able to reproduce any of the symptoms I previously reported.  Yay!
>
>However, just as I was about to give up testing, I hit one more new 
>issue.  One of the ctrl-c events sent bash into what appeared to be an 
>infinite loop emitting error messages like these:
>
>11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>
>While this was going on, hitting ctrl-c had no discernible effect.  I 
>resorted to killing the process via task manager.
>
>This only occurred once; I wasn't able to get it to happen again.

Was there a stackdump?

cgf

* Re: Intermittent failures with ctrl-c
  2013-01-23  5:27                                                                   ` Christopher Faylor
@ 2013-01-23 18:18                                                                     ` Tom Honermann
  2013-01-23 18:35                                                                       ` Christopher Faylor
  0 siblings, 1 reply; 65+ messages in thread
From: Tom Honermann @ 2013-01-23 18:18 UTC (permalink / raw)
  To: cygwin

On 01/23/2013 12:26 AM, Christopher Faylor wrote:
> On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>> However, just as I was about to give up testing, I hit one more new
>> issue.  One of the ctrl-c events sent bash into what appeared to be an
>> infinite loop emitting error messages like these:
>>
>> 11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>> 11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>>
>> While this was going on, hitting ctrl-c had no discernible effect.  I
>> resorted to killing the process via task manager.
>>
>> This only occurred once; I wasn't able to get it to happen again.
>
> Was there a stackdump?

Unfortunately no.  And I should have grabbed a stack trace, but I didn't.
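Cygwin normally writes a `<program>.exe.stackdump` file (for example `bash.exe.stackdump`) into the failing process's current directory, so checking for one right after a failed run is cheap. A minimal sketch using the hypothetical helper name `find_stackdumps`:

```shell
#!/bin/sh
# Hypothetical helper: list Cygwin stackdump files (for example
# bash.exe.stackdump) under a directory tree, sorted by path.  Cygwin
# writes these into the failing process's current directory, so point it
# at the directory the test was run from (default: ".").
find_stackdumps() {
  find "${1:-.}" -type f -name '*.stackdump' -print | sort
}
```

Running it in the snapshot's usr/bin after each interrupted run would show whether a crash left a dump behind.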

I tried to reproduce again today using the same snapshot (20130123), but 
didn't have any luck.

I see you checked in a change to detect the infinite recursion.  I'd 
call that good enough.

I didn't encounter any further anomalies that I can positively attribute 
to Cygwin.  I did encounter a few that I suspect are cmd.exe issues that 
I'll report below.  I'm only reporting these for the curious; I am not 
requesting any action be taken with regard to these.

1) Sometimes a ctrl-C was ignored.  I would see ^C echoed to the 
console, but the test case would keep running without prompting to 
"Terminate batch job".

2) Sometimes cmd.exe would issue an error message about a syntax error 
in the .bat file after I pressed ctrl-C, and all processes would exit 
without prompting to "Terminate batch job".

Thank you for your prompt attention to all of these issues Chris!  I 
find it very impressive how responsive the Cygwin maintainers are to 
reports like these!

Tom.


* Re: Intermittent failures with ctrl-c
  2013-01-23 18:18                                                                     ` Tom Honermann
@ 2013-01-23 18:35                                                                       ` Christopher Faylor
  2013-01-24  4:12                                                                         ` Tom Honermann
  0 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-01-23 18:35 UTC (permalink / raw)
  To: cygwin

On Wed, Jan 23, 2013 at 01:17:45PM -0500, Tom Honermann wrote:
>On 01/23/2013 12:26 AM, Christopher Faylor wrote:
>> On Tue, Jan 22, 2013 at 10:20:20PM -0500, Tom Honermann wrote:
>>> However, just as I was about to give up testing, I hit one more new
>>> issue.  One of the ctrl-c events sent bash into what appeared to be an
>>> infinite loop emitting error messages like these:
>>>
>>> 11408974 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>>> 11411584 [unknown (0x144C)] bash 1752 exception::handle: Error while dumping state (probably corrupted stack)
>>>
>>> While this was going on, hitting ctrl-c had no discernible effect.  I
>>> resorted to killing the process via task manager.
>>>
>>> This only occurred once; I wasn't able to get it to happen again.
>>
>> Was there a stackdump?
>
>Unfortunately no.  And I should have grabbed a stack trace, but I didn't.
>
>I tried to reproduce again today using the same snapshot (20130123), but 
>didn't have any luck.
>
>I see you checked in a change to detect the infinite recursion.  I'd 
>call that good enough.

That probably is relatively ok given that you're trying to terminate the
process anyway but it would be nice to know why the stackdump was
happening.

>Thank you for your prompt attention to all of these issues Chris!  I 
>find it very impressive how responsive the Cygwin maintainers are to 
>reports like these!

You're very welcome.  Thanks for hanging in there throughout this
process.

FYI, as it turns out, working around the thread exit problem uncovered a
whole host of issues with locking/signals/exit that have been lurking in
the code for a while.  So, this exercise should have made a better
Cygwin in the long-run.  It may even have made Cygwin a little faster.

cgf

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Intermittent failures with ctrl-c
  2013-01-23 18:35                                                                       ` Christopher Faylor
@ 2013-01-24  4:12                                                                         ` Tom Honermann
  0 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2013-01-24  4:12 UTC (permalink / raw)
  To: cygwin

On 01/23/2013 01:35 PM, Christopher Faylor wrote:
> On Wed, Jan 23, 2013 at 01:17:45PM -0500, Tom Honermann wrote:
>> I see you checked in a change to detect the infinite recursion.  I'd
>> call that good enough.
>
> That probably is relatively ok given that you're trying to terminate the
> process anyway but it would be nice to know why the stackdump was
> happening.

Agreed.  I'll investigate and report any future cases I encounter.

Tom.



* Re: Intermittent failures retrieving process exit codes
  2012-12-21  6:30   ` Tom Honermann
  2012-12-21 10:33     ` Corinna Vinschen
  2012-12-21 20:01     ` Intermittent failures retrieving process exit codes Tom Honermann
@ 2013-11-14  4:02     ` Tom Honermann
  2013-11-14  9:20       ` Corinna Vinschen
  2013-11-15 18:53       ` Denis Excoffier
  2 siblings, 2 replies; 65+ messages in thread
From: Tom Honermann @ 2013-11-14  4:02 UTC (permalink / raw)
  To: cygwin

On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I spent most of the week debugging this issue.  This appears to be a
> defect in Windows.  I can reproduce the issue without Cygwin.  I can't
> rule out other third party kernel mode software possibly contributing to
> the issue.  A simple change to Cygwin works around the problem for me.
>
> I don't know which Windows releases are affected by this.  I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7.  I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent calls to
> TerminateProcess() and ExitThread().  The example code below minimally
> mimics the threads created and exit process/thread calls that are
> performed when running Cygwin's false.exe.  The primary thread exits the
> process via TerminateProcess() à la pinfo::exit() in
> winsup/cygwin/pinfo.cc.  The secondary thread exits itself via
> ExitThread() à la Cygwin's signal processing thread function, wait_sig(),
> in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary thread's
> call to ExitThread().  I can only speculate at this point, but my guess
> is that the TerminateProcess() code disassociates the calling thread
> from the process before other threads are stopped such that
> ExitThread(), concurrently running in another thread, may determine that
> the calling thread is the last thread of the process and overwrite the
> process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess().  The test case below only uses TerminateProcess()
> because that is what Cygwin does.
>
> Source code to reproduce the issue follows.  Again, Cygwin is not
> required to reproduce the problem.  For my own testing, I compiled the
> code using Microsoft's Visual Studio 2010 x86 compiler with the command
> 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:
>
> #include <windows.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> DWORD WINAPI SecondaryThread(
>      LPVOID lpParameter)
> {
>      Sleep(1);
>      ExitThread(2);
> }
>
> int main() {
>      HANDLE hSecondaryThread = CreateThread(
>          NULL,                               // lpThreadAttributes
>          0,                                  // dwStackSize
>          SecondaryThread,                    // lpStartAddress
>          (LPVOID)0,                          // lpParameter
>          0,                                  // dwCreationFlags
>          NULL);                              // lpThreadId
>      if (!hSecondaryThread) {
>          fprintf(stderr, "CreateThread failed.  GLE=%lu\n",
>              (unsigned long)GetLastError());
>          exit(127);
>      }
>
>      Sleep(1);
>
>      if (!TerminateProcess(GetCurrentProcess(), 1)) {
>          fprintf(stderr, "TerminateProcess failed.  GLE=%lu\n",
>              (unsigned long)GetLastError());
>          exit(127);
>      }
>
>      return 0;
> }
>
>
> To run the test, a simple .bat file is used:
>
> test.bat:
>
> @echo off
> setlocal
>
> :loop
> echo test...
> test-exit-code.exe
> if %ERRORLEVEL% NEQ 1 (
>      echo test-exit-code.exe returned %ERRORLEVEL%
>      exit /B 1
> )
> goto loop
>
>
> test.bat should run indefinitely.  The amount of time it takes to fail
> on my machine (64-bit Windows 7 running in a VMware Workstation 8 VM
> under Kubuntu 12.04 on a Lenovo T420 Intel i7-2640M 2 processor laptop)
> varies considerably.  I had one run fail in less than 10 iterations, but
> most of the time it has taken upwards of 5 minutes to get a failure.
>
> The workaround I implemented within Cygwin was simple and sloppy.  I
> added a call to Sleep(1000) immediately before the call to ExitThread()
> in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably)
> doesn't exit until the process is exiting anyway, the call to Sleep()
> does not adversely affect shutdown.  The thread just gets terminated
> while in the call to Sleep() instead of exiting before the process is
> terminated or getting terminated while still in the call to
> ExitThread().  A better solution might be to avoid the thread exiting at
> all (so long as it can't get terminated while holding critical
> resources), or to have the process exiting thread wait on it.  Neither
> of these is ideal.  Orderly shutdown of multi-threaded processes is
> really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used, having
> the wait_sig() thread (and any other threads that could potentially
> concurrently exit with another thread) exit with a special status value
> such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable
> diagnosis of this issue as any process exit code matching this would be
> a likely indicator that this issue was encountered.
>
> As is, when this race condition results in the undesirable outcome,
> since the signal processing thread exits with a status of 0, the exit
> status of the process is 0.  This explains why false.exe works so well
> to reproduce the issue.  It would be impossible to produce a negative
> test using true.exe.
>
> Tom.
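The quoted proposal to have wait_sig() and similar helper threads exit with STATUS_THREAD_IS_TERMINATING would let the parent tell this race apart from a genuine result. Below is a minimal C++ sketch of the parent-side check. It assumes, hypothetically, that the helper threads were changed to use that sentinel; at the time of the report they exited with 0. The names diagnose_exit_code, ExitDiagnosis and kThreadIsTerminatingSentinel are illustrative only. The sketch uses plain integers instead of <windows.h> so it compiles anywhere. On Windows, the real value would come from GetExitCodeProcess().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for the NTSTATUS value STATUS_THREAD_IS_TERMINATING
// (0xC000004B), the sentinel proposed for helper-thread ExitThread() calls.
constexpr std::uint32_t kThreadIsTerminatingSentinel = 0xC000004Bu;

// Hypothetical classification of a child's exit code as seen by a build tool.
enum class ExitDiagnosis {
    Expected,        // the code the child actually meant to return
    LikelyExitRace,  // a helper thread's sentinel leaked into the process exit code
    Unexpected       // some other mismatch that cannot be attributed to the race
};

ExitDiagnosis diagnose_exit_code(std::uint32_t actual, std::uint32_t expected) {
    if (actual == expected) {
        return ExitDiagnosis::Expected;
    }
    if (actual == kThreadIsTerminatingSentinel) {
        return ExitDiagnosis::LikelyExitRace;
    }
    return ExitDiagnosis::Unexpected;
}
```

With the sentinel in place, a false.exe run that lost the race would report LikelyExitRace instead of passing as a success. Without it, the same run reports 0. That case, diagnose_exit_code(0, 1), can only be classified as Unexpected.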

Time passes...

I worked with some former colleagues to report this issue to Microsoft.
Windows 8.1 and Windows Server 2012 R2 contain a fix that addresses
the test case above.  A hotfix has been made available for Windows 7 SP1
and Windows Server 2008 R2.  Should anyone desire a hotfix for other
versions of Windows, it will be necessary to open a case with Microsoft
to request it.

http://support.microsoft.com/kb/2875501

Tom.


* Re: Intermittent failures retrieving process exit codes
  2013-11-14  4:02     ` Tom Honermann
@ 2013-11-14  9:20       ` Corinna Vinschen
  2013-11-14 15:21         ` Tom Honermann
  2013-11-15 18:53       ` Denis Excoffier
  1 sibling, 1 reply; 65+ messages in thread
From: Corinna Vinschen @ 2013-11-14  9:20 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1754 bytes --]

Hi Tom,

On Nov 13 23:01, Tom Honermann wrote:
> On 12/21/2012 01:30 AM, Tom Honermann wrote:
> >[...]
> >When the race condition results in the undesirable outcome, the exit
> >code for the process is set to the exit code for the secondary thread's
> >call to ExitThread().  I can only speculate at this point, but my guess
> >is that the TerminateProcess() code disassociates the calling thread
> >from the process before other threads are stopped such that
> >ExitThread(), concurrently running in another thread, may determine that
> >the calling thread is the last thread of the process and overwrite the
> >process exit code.
> >[...]
> 
> Time passes...
> 
> I worked with some former colleagues to report this issue to
> Microsoft.  Windows 8.1 and Windows Server 2012 R2 contain a fix
> that addresses the test case above.  A hotfix has been made
> available for Windows 7 SP1 and Windows Server 2008 R2.  Should
> anyone desire a hotfix for other versions of Windows, it will be
> necessary to open a case with Microsoft to request it.
> 
> http://support.microsoft.com/kb/2875501
> 
> Tom.

thanks for letting us know!

I'm very glad to read that this is an OS bug and a fix is available.

At least partially.  I'm a bit confused.  As far as I understand it this
is the situation now:

  Vista/2008 and earlier:  no fix available.
  W7/2008R2:               only hotfix for manual installation
  W8/2012:                 no fix available.
  W8.1/2012R2:             fixed.

Did I get that right?  That sounds a bit weird...


Thanks again,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]


* Re: Intermittent failures retrieving process exit codes
  2013-11-14  9:20       ` Corinna Vinschen
@ 2013-11-14 15:21         ` Tom Honermann
  0 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2013-11-14 15:21 UTC (permalink / raw)
  To: cygwin

On 11/14/2013 04:19 AM, Corinna Vinschen wrote:
> thanks for letting us know!

You're welcome :)

> I'm very glad to read that this is an OS bug and a fix is available.
>
> At least partially.  I'm a bit confused.  As far as I understand it this
> is the situation now:
>
>    Vista/2008 and earlier:  no fix available.
>    W7/2008R2:               only hotfix for manual installation
>    W8/2012:                 no fix available.
>    W8.1/2012R2:             fixed.
>
> Did I get that right?  That sounds a bit weird...

That is how I understand it.  Microsoft requires a Premier Support 
agreement in order to request hotfixes, and I am not a party to any such 
agreement.  So, I worked with former colleagues at another company that 
does have a Premier Support agreement and who I knew were also 
experiencing the issue.  They only requested a hotfix for Windows 7 SP1 
and Windows Server 2008 R2, as those are the only Windows releases they 
needed a fix for.  The result: it is fixed in currently 
shipping versions and a hotfix is available for those specific releases, 
but other releases remain vulnerable.  Addressing those releases will 
presumably require someone with access to a Premier Support agreement to 
request additional hotfix releases.

Tom.


* Re: Intermittent failures retrieving process exit codes
  2013-11-14  4:02     ` Tom Honermann
  2013-11-14  9:20       ` Corinna Vinschen
@ 2013-11-15 18:53       ` Denis Excoffier
  2013-11-15 19:21         ` Christopher Faylor
                           ` (2 more replies)
  1 sibling, 3 replies; 65+ messages in thread
From: Denis Excoffier @ 2013-11-15 18:53 UTC (permalink / raw)
  To: Tom Honermann, Cygwin Mailing List; +Cc: lasse.collin

On 2013-11-14 05:01, Tom Honermann wrote:
> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>> 
>> The workaround I implemented within Cygwin was simple and sloppy.  I
>> added a call to Sleep(1000) immediately before the call to ExitThread()
>> in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably)
>> doesn't exit until the process is exiting anyway, the call to Sleep()
>> does not adversely affect shutdown.  The thread just gets terminated
>> while in the call to Sleep() instead of exiting before the process is
>> terminated or getting terminated while still in the call to
>> ExitThread().  A better solution might be to avoid the thread exiting at
>> all (so long as it can't get terminated while holding critical
>> resources), or to have the process exiting thread wait on it.  Neither
>> of these is ideal.  Orderly shutdown of multi-threaded processes is
>> really hard to do correctly on Windows.

I experience some problems on Windows 7 (not on XP) that may be related.
I would like to test your workaround, but sigproc.cc has changed a lot since
then; there is now an exit_thread function with the comment "Exit the current
thread very carefully.". I tried to insert Sleep(1000) at the end of
exit_thread, immediately before "ExitThread (0)", but this yielded no
change at all.

Could someone be kind enough to update the workaround for modern sigproc.cc?

Very briefly, my problem is that when I run "tar xf --use-compress-program=xz", I
get:
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
and the last file of the archive is truncated at a 512-byte block boundary. This
occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with xz-5.1.2alpha or
xz-5.0.5); never on most tar.xz files; almost always on some (rare) tar.xz files
(one notable example is bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); it depends
on the .tar file itself, not on the option (like -9e, -0) used to create the
.tar.xz; never with "tar tf"; and with all tar versions I have tested. The return code
of all the involved xz -d commands is always zero, though. Perhaps this is
unrelated after all?

Thank you.

Regards,

Denis Excoffier.

* Re: Intermittent failures retrieving process exit codes
  2013-11-15 18:53       ` Denis Excoffier
@ 2013-11-15 19:21         ` Christopher Faylor
  2013-11-17 13:30           ` Denis Excoffier
  2013-11-15 22:15         ` Tom Honermann
  2013-11-25 19:59         ` Lasse Collin
  2 siblings, 1 reply; 65+ messages in thread
From: Christopher Faylor @ 2013-11-15 19:21 UTC (permalink / raw)
  To: cygwin

On Fri, Nov 15, 2013 at 07:53:26PM +0100, Denis Excoffier wrote:
>On 2013-11-14 05:01, Tom Honermann wrote:
>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>> 
>>> The workaround I implemented within Cygwin was simple and sloppy.  I
>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>> in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably)
>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>> does not adversely affect shutdown.  The thread just gets terminated
>>> while in the call to Sleep() instead of exiting before the process is
>>> terminated or getting terminated while still in the call to
>>> ExitThread().  A better solution might be to avoid the thread exiting at
>>> all (so long as it can't get terminated while holding critical
>>> resources), or to have the process exiting thread wait on it.  Neither
>>> of these is ideal.  Orderly shutdown of multi-threaded processes is
>>> really hard to do correctly on Windows.
>
>I experience some problems on Windows 7 (not on XP) that may be related.
>I would like to test your workaround, but sigproc.cc has changed a lot since
>then; there is now an exit_thread function with the comment "Exit the current
>thread very carefully.". I tried to insert Sleep(1000) at the end of
>exit_thread, immediately before "ExitThread (0)", but this yielded no
>change at all.
>
>Could someone be kind enough to update the workaround for modern sigproc.cc?

You apparently are misunderstanding the whole point of the changes to
sigproc.cc.  They were to work around this very problem.

cgf


* Re: Intermittent failures retrieving process exit codes
  2013-11-15 18:53       ` Denis Excoffier
  2013-11-15 19:21         ` Christopher Faylor
@ 2013-11-15 22:15         ` Tom Honermann
  2013-11-25 19:59         ` Lasse Collin
  2 siblings, 0 replies; 65+ messages in thread
From: Tom Honermann @ 2013-11-15 22:15 UTC (permalink / raw)
  To: Denis Excoffier, Cygwin Mailing List; +Cc: lasse.collin

On 11/15/2013 01:53 PM, Denis Excoffier wrote:
> On 2013-11-14 05:01, Tom Honermann wrote:
>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>>
>>> The workaround I implemented within Cygwin was simple and sloppy.  I
>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>> in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably)
>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>> does not adversely affect shutdown.  The thread just gets terminated
>>> while in the call to Sleep() instead of exiting before the process is
>>> terminated or getting terminated while still in the call to
>>> ExitThread().  A better solution might be to avoid the thread exiting at
>>> all (so long as it can't get terminated while holding critical
>>> resources), or to have the process exiting thread wait on it.  Neither
>>> of these is ideal.  Orderly shutdown of multi-threaded processes is
>>> really hard to do correctly on Windows.
>
> I experience some problems on Windows 7 (not on XP) that may be related.
> I would like to test your workaround, but sigproc.cc has changed a lot since
> then; there is now an exit_thread function with the comment "Exit the current
> thread very carefully.". I tried to insert Sleep(1000) at the end of
> exit_thread, immediately before "ExitThread (0)", but this yielded no
> change at all.
>
> Could someone be kind enough to update the workaround for modern sigproc.cc?

Hi Denis.  Cygwin versions 1.7.18 and later contain a workaround for 
this issue.  If you are running something older than that, I highly 
encourage you to upgrade.  Many stability related fixes have been made 
in more recent versions.

> Very briefly, my problem is that when I run "tar xf --use-compress-program=xz", I
> get:
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> and the last file of the archive is truncated at a 512-byte block boundary. This
> occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with xz-5.1.2alpha or
> xz-5.0.5); never on most tar.xz files; almost always on some (rare) tar.xz files
> (one notable example is bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); it depends
> on the .tar file itself, not on the option (like -9e, -0) used to create the
> .tar.xz; never with "tar tf"; and with all tar versions I have tested. The return code
> of all the involved xz -d commands is always zero, though. Perhaps this is
> unrelated after all?

This doesn't sound related to the intermittent incorrect exit code 
defect to me.  I'm afraid I don't have other explanations for what you 
are experiencing though.

Tom.


* Re: Intermittent failures retrieving process exit codes
  2013-11-15 19:21         ` Christopher Faylor
@ 2013-11-17 13:30           ` Denis Excoffier
  0 siblings, 0 replies; 65+ messages in thread
From: Denis Excoffier @ 2013-11-17 13:30 UTC (permalink / raw)
  To: Cygwin Mailing List

On 2013-11-15 20:21, Christopher Faylor wrote:
> On Fri, Nov 15, 2013 at 07:53:26PM +0100, Denis Excoffier wrote:
>> On 2013-11-14 05:01, Tom Honermann wrote:
>>> On 12/21/2012 01:30 AM, Tom Honermann wrote:
>>>> 
>>>> The workaround I implemented within Cygwin was simple and sloppy.  I
>>>> added a call to Sleep(1000) immediately before the call to ExitThread()
>>>> in wait_sig() in winsup/cygwin/sigproc.cc.  Since this thread (probably)
>>>> doesn't exit until the process is exiting anyway, the call to Sleep()
>>>> does not adversely affect shutdown.  The thread just gets terminated
>>>> while in the call to Sleep() instead of exiting before the process is
>>>> terminated or getting terminated while still in the call to
>>>> ExitThread().  A better solution might be to avoid the thread exiting at
>>>> all (so long as it can't get terminated while holding critical
>>>> resources), or to have the process exiting thread wait on it.  Neither
>>>> of these is ideal.  Orderly shutdown of multi-threaded processes is
>>>> really hard to do correctly on Windows.
>> 
>> I experience some problems on Windows 7 (not on XP) that may be related.
>> I would like to test your workaround, but sigproc.cc has changed a lot since
>> then; there is now an exit_thread function with the comment "Exit the current
>> thread very carefully.". I tried to insert Sleep(1000) at the end of
>> exit_thread, immediately before "ExitThread (0)", but this yielded no
>> change at all.
>> 
>> Could someone be kind enough to update the workaround for modern sigproc.cc?
> 
> You apparently are misunderstanding the whole point of the changes to
> sigproc.cc.  They were to work around this very problem.

Oh, I didn’t remember that. Then this must be the antivirus or something else
I have to cope with.

Regards,

Denis Excoffier.



* Re: Intermittent failures retrieving process exit codes
  2013-11-15 18:53       ` Denis Excoffier
  2013-11-15 19:21         ` Christopher Faylor
  2013-11-15 22:15         ` Tom Honermann
@ 2013-11-25 19:59         ` Lasse Collin
  2013-11-25 23:12           ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
  2 siblings, 1 reply; 65+ messages in thread
From: Lasse Collin @ 2013-11-25 19:59 UTC (permalink / raw)
  To: Denis Excoffier; +Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-15 Denis Excoffier wrote:
> Very briefly, my problem is that when I run "tar xf
> --use-compress-program=xz", I get:
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> and the last file of the archive is truncated at a 512-byte block boundary.
> This occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with
> xz-5.1.2alpha or xz-5.0.5); never on most tar.xz files; almost always
> on some (rare) tar.xz files (one notable example is
> bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); it depends on the .tar
> file itself, not on the option (like -9e, -0) used to create
> the .tar.xz; never with "tar tf"; and with all tar versions I have tested.
> The return code of all the involved xz -d commands is always zero,
> though. Perhaps this is unrelated after all?

xz 5.1.3alpha has some new file I/O code that uses non-blocking file
descriptors, the self-pipe trick, and poll(). It's there to fix a race
condition in signal handling. Since you say it works with 5.1.2alpha, I
wonder whether there is a bug in the new I/O code in xz or whether the code
in xz triggers a bug in Cygwin or Windows.

If you haven't already tried, please compile both 5.1.2alpha and
5.1.3alpha from source while keeping everything else unchanged, and see
if the bug really only occurs with 5.1.3alpha.
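For readers unfamiliar with the technique Lasse mentions, here is how the self-pipe trick works. A signal handler writes a byte into a non-blocking pipe, and the main loop watches the read end of that pipe with poll(). A signal that arrives just before the loop blocks therefore still wakes it up instead of being missed. Below is a minimal POSIX sketch in C++. It is not xz's actual code, and install_self_pipe, wait_for_input_or_signal and g_self_pipe are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical global: [0] is watched by poll(), [1] is written by the handler.
static int g_self_pipe[2] = {-1, -1};

extern "C" void on_signal(int) {
    const int saved_errno = errno;  // a handler must not clobber errno
    const char byte = 1;
    // write() is async-signal-safe. If the pipe is full (EAGAIN), a wakeup is
    // already pending, so the result can be ignored.
    const ssize_t ignored = write(g_self_pipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

static bool set_nonblocking(int fd) {
    const int flags = fcntl(fd, F_GETFL);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Creates the self-pipe and routes signal `signo` to on_signal().
bool install_self_pipe(int signo) {
    if (pipe(g_self_pipe) == -1) return false;
    if (!set_nonblocking(g_self_pipe[0]) || !set_nonblocking(g_self_pipe[1])) return false;
    struct sigaction sa {};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}

// Blocks until data_fd is readable (returns 1), a signal arrived (returns 0),
// or poll() fails (returns -1).
int wait_for_input_or_signal(int data_fd) {
    for (;;) {
        pollfd fds[2] = {{data_fd, POLLIN, 0}, {g_self_pipe[0], POLLIN, 0}};
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR) continue;  // the handler has already written its byte
            return -1;
        }
        if (fds[1].revents & POLLIN) {
            char buf[64];
            while (read(g_self_pipe[0], buf, sizeof buf) > 0) {
                // drain every pending wakeup byte; stops on EAGAIN
            }
            return 0;
        }
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) return 1;
    }
}
```

Typical use would be to call install_self_pipe(SIGINT) once at startup, then call wait_for_input_or_signal() before each read() of the input descriptor. Both pipe ends are non-blocking, so the handler can never block and the drain loop ends cleanly.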

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode


* Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
  2013-11-25 19:59         ` Lasse Collin
@ 2013-11-25 23:12           ` Denis Excoffier
  2013-11-26 21:09             ` Denis Excoffier
                               ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Denis Excoffier @ 2013-11-25 23:12 UTC (permalink / raw)
  To: Lasse Collin; +Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-25 at 21:58 +02:00, Lasse Collin wrote:
> On 2013-11-15 Denis Excoffier wrote:
>> Very briefly, my problem is that when I run "tar xf
>> --use-compress-program=xz", I get:
>> tar: Unexpected EOF in archive
>> tar: Unexpected EOF in archive
>> tar: Error is not recoverable: exiting now
>> and the last file of the archive is truncated at a 512-byte block boundary.
>> This occurs on Windows 7 (not on XP); with xz-5.1.3alpha (not with
>> xz-5.1.2alpha or xz-5.0.5); never on most tar.xz files; almost always
>> on some (rare) tar.xz files (one notable example is
>> bc-1.06.95.tar.bz2 bunzip2’ed and then xz’ed); it depends on the .tar
>> file itself, not on the option (like -9e, -0) used to create
>> the .tar.xz; never with "tar tf"; and with all tar versions I have tested.
>> The return code of all the involved xz -d commands is always zero,
>> though. Perhaps this is unrelated after all?
> 
> xz 5.1.3alpha has some new file I/O code that uses non-blocking file
> descriptors, the self-pipe trick, and poll(). It's there to fix a race
> condition in signal handling. Since you say it works with 5.1.2alpha, I
> wonder whether there is a bug in the new I/O code in xz or whether the code
> in xz triggers a bug in Cygwin or Windows.
> 
> If you haven't already tried, please compile both 5.1.2alpha and
> 5.1.3alpha from source while keeping everything else unchanged, and see
> if the bug really only occurs with 5.1.3alpha.

Already done. I did some strace-ing, and since I’m not very fluent with the
result, I’ll send it here in a while (when I’m back on Cygwin) if someone is
interested. But the bug (contrary to what I said before) also _sometimes_
occurs with 5.1.2alpha or 5.0.5, and this makes me think now that:

a) my antivirus/anti-intrusion/whatever software (which I can’t remove, of
course) creates some kind of "background noise" whereby a certain percentage
of such ‘tar xf --use-compress-program’ commands will always fail;

b) nevertheless, xz-5.1.3alpha (with its new file I/O code etc.) triggers some
atypical condition inside the antivirus that drastically increases the
percentage, making the failure almost certain for some files.

It is not surprising that I cannot observe the failure on XP, since
I do not have this particular antivirus on XP.

You might, however, want some more detail. The test plan is: run
'tar xf file.xz --use-compress-program=xz -b N', where N varies from 1 to 200.
There are two kinds of results:

1) The usual situation is where you observe at most 1 or 2 failures (out of 200 runs).
If you run the same plan again, you still get at most 1 or 2 failures, usually not
for the same values of N. Very often there is no failure at all. Very often
-b 20 (the default) does not fail.
-> this situation occurs with 5.1.2alpha or 5.0.5 with all input files, or with
5.1.3alpha with most input files.

2) The pathological situation is where you observe, say, 30 failures (out of 200 runs).
If you run the same plan again, you get nearly the same failures, i.e. mostly for the same
values of N, with some minor variability similar to the variability observed in the usual
situation above.
-> this situation occurs with 5.1.3alpha only, with some selected input files,
e.g. bc-1.06.95.tar.xz (see above for how to create bc-1.06.95.tar.xz)
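The test plan above can be scripted so that two runs are easy to compare. Below is a minimal C++ sketch. It assumes a POSIX std::system() whose return value is a wait status, as on Cygwin and Linux. It also assumes that tar exits with a non-zero status when it reports "Unexpected EOF in archive". The helpers failing_values, tar_command and common_failures are hypothetical, and file.xz is the placeholder name from the plan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>
#include <iterator>
#include <string>
#include <vector>
#include <sys/wait.h>

// Runs make_command(n) through the shell for n = first..last and returns, in
// ascending order, every n whose command did not exit normally with status 0.
std::vector<int> failing_values(const std::function<std::string(int)>& make_command,
                                int first, int last) {
    std::vector<int> failures;
    for (int n = first; n <= last; ++n) {
        const int status = std::system(make_command(n).c_str());
        const bool ok = status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (!ok) failures.push_back(n);
    }
    return failures;
}

// Hypothetical command for one step of the plan; "file.xz" is a placeholder.
std::string tar_command(int blocking_factor) {
    return "tar xf file.xz --use-compress-program=xz -b " + std::to_string(blocking_factor);
}

// Blocking factors that failed in both runs (both inputs must be sorted ascending).
std::vector<int> common_failures(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(both));
    return both;
}
```

For example, compute run1 = failing_values(tar_command, 1, 200) and run2 = failing_values(tar_command, 1, 200), then call common_failures(run1, run2). A large intersection matches the pathological case, while scattered one-off failures match the usual case.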

When it fails (usually or pathologically), the last file of the archive gets
truncated (see above), and _this_ is strange behaviour for an antivirus.
Perhaps, after all, some flush() or similar call is missing in 5.1.3alpha.
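The missing-flush guess has a concrete form once descriptors are non-blocking, as in the new xz I/O code Lasse describes. In that mode, write() may accept only part of a buffer or fail with EAGAIN. A writer that treats either outcome as finished silently drops the tail of the stream, which could look like a truncated last file. Nothing in this thread establishes that xz 5.1.3alpha has such a bug. The code below is only a minimal POSIX C++ sketch of a writer that handles both cases by retrying and waiting for POLLOUT; write_all is a hypothetical name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Writes all `size` bytes of `data` to `fd`, which may be non-blocking.
// Returns false on a real error (for example EPIPE when the reader has exited).
bool write_all(int fd, const char* data, std::size_t size) {
    while (size > 0) {
        const ssize_t n = write(fd, data, size);
        if (n > 0) {
            // Short writes are normal on pipes: advance and write the rest.
            data += n;
            size -= static_cast<std::size_t>(n);
        } else if (n == -1 && errno == EINTR) {
            // Interrupted before anything was written: just retry.
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // The pipe is full: wait until the reader makes room, then retry.
            pollfd pfd = {fd, POLLOUT, 0};
            if (poll(&pfd, 1, -1) == -1 && errno != EINTR) return false;
        } else {
            return false;  // genuine error, or write() returned 0 unexpectedly
        }
    }
    return true;
}
```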

Denis Excoffier.

* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
  2013-11-25 23:12           ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
@ 2013-11-26 21:09             ` Denis Excoffier
  2013-11-26 21:09             ` Denis Excoffier
  2013-12-01 13:24             ` Lasse Collin
  2 siblings, 0 replies; 65+ messages in thread
From: Denis Excoffier @ 2013-11-26 21:09 UTC (permalink / raw)
  To: Lasse Collin; +Cc: Tom Honermann, Cygwin Mailing List

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
> Already done. I did some strace-ing, and since I’m not very fluent with the
> result, I’ll send it here in a while (when I’m back on Cygwin) if someone is
> interested. But the bug (contrary to what I said before) also _sometimes_
> occurs with 5.1.2alpha or 5.0.5, and this makes me think now that:

This is part2. Just cat typescript-part1 typescript-part2.

[-- Attachment #2: typescript-part2.xz --]
[-- Type: application/octet-stream, Size: 53528 bytes --]

[-- Attachment #3: Type: text/plain, Size: 218 bytes --]


* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
  2013-11-25 23:12           ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
  2013-11-26 21:09             ` Denis Excoffier
@ 2013-11-26 21:09             ` Denis Excoffier
  2013-11-26 23:36               ` Christopher Faylor
  2013-12-01 13:24             ` Lasse Collin
  2 siblings, 1 reply; 65+ messages in thread
From: Denis Excoffier @ 2013-11-26 21:09 UTC (permalink / raw)
  To: Lasse Collin; +Cc: Tom Honermann, Cygwin Mailing List

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
> Already done. I did some strace-ing, and since I’m not very fluent with the
> result, I’ll send it here in a while (when I’m back on Cygwin) if someone is
> interested. But the bug (contrary to what I said before) also _sometimes_
> occurs with 5.1.2alpha or 5.0.5, and this makes me think now that:

Here is the result of strace (with minor editing). I kept the whole strace (12000 lines),
because xz ends rather early (around line 10000).

2bc-1.06.95.tar.xz is a file built using bunzip2 | xz -c

Note the presence of Win32 error 109 (broken pipe).

Regards,

Denis Excoffier.

This is part 1; part 2 follows in a few minutes.

[-- Attachment #2: typescript-part1.xz --]
[-- Type: application/octet-stream, Size: 55188 bytes --]

[-- Attachment #3: Type: text/plain, Size: 218 bytes --]

* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
  2013-11-26 21:09             ` Denis Excoffier
@ 2013-11-26 23:36               ` Christopher Faylor
  0 siblings, 0 replies; 65+ messages in thread
From: Christopher Faylor @ 2013-11-26 23:36 UTC (permalink / raw)
  To: cygwin

On Tue, Nov 26, 2013 at 10:09:19PM +0100, Denis Excoffier wrote:
>On 2013-11-26 00:11 +01:00, Denis Excoffier wrote:
>> Already done. I did some strace-ing, and since I’m not so fluent with the
>> result, I’ll send it there in a while (when I’m back on cygwin) if someone is
>> interested. But the bug (contrary to what I said before) also _sometimes_
>> occurs with 5.1.2alpha or 5.0.5 and this makes me think now that:
>Here is the result of strace (with minor editing). I kept the whole strace (12000 lines),
>because xz ends rather early (around line 10000).
>
>2bc-1.06.95.tar.xz is a file built using bunzip2 | xz -c
>
>Note the presence of Win32 error 109 (broken pipe).

Please don't post unsolicited straces to this list.  No one is going to
be looking at them and they just clog up the mailing list.

cgf

* Re: Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes)
  2013-11-25 23:12           ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
  2013-11-26 21:09             ` Denis Excoffier
  2013-11-26 21:09             ` Denis Excoffier
@ 2013-12-01 13:24             ` Lasse Collin
  2 siblings, 0 replies; 65+ messages in thread
From: Lasse Collin @ 2013-12-01 13:24 UTC (permalink / raw)
  To: Denis Excoffier; +Cc: Tom Honermann, Cygwin Mailing List

On 2013-11-26 Denis Excoffier wrote:
> On 2013-11-25 at 21:58 +02:00, Lasse Collin wrote:
> > If you haven't already tried, please compile both 5.1.2alpha and
> > 5.1.3alpha from source while keeping everything else unchanged, and
> > see if the bug really only occurs with 5.1.3alpha.
> Already done. I did some strace-ing, and since I’m not so fluent with
> the result, I’ll send it there in a while (when I’m back on cygwin)
> if someone is interested. But the bug (contrary to what I said
> before) also _sometimes_ occurs with 5.1.2alpha or 5.0.5 and this
> makes me think now that:
> 
> a) my antivirus-anti-intrusion-whatever-software (that I can’t remove
> of course) creates some kind of "background noise" where a certain
> percentage of such ‘tar xf --use-compress-program’ commands will
> always fail
> 
> b) nevertheless, xz-5.1.3alpha (with its new file I/O code etc.)
> triggers some atypical configuration inside the antivirus that
> drastically increases the percentage, making the failure almost
> certain for some files.
> 
> It is not surprising that I cannot observe the failure on XP since
> I do not have this particular antivirus on XP.

OK, so the new I/O code in xz probably isn't the problem, even if it may
affect how easily the actual problem gets triggered.

[...]
> When it fails (usually or pathologically), the last file of the
> archive gets truncated (see above), and _this_ is strange behaviour
> for an antivirus. After all, perhaps some flush() or similar is
> missing inside 5.1.3alpha.

xz uses write(), which operates directly on a file descriptor with no
user-space buffering (unlike stdio's fwrite()), so there is nothing to
flush separately. xz just has to write() everything.
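
The point that write() needs no separate flush, but does need to be
called until every byte has been accepted, can be sketched in C. The
helper `write_all()` is hypothetical, for illustration only, and is not
a function from the xz sources:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * write() may accept fewer bytes than requested, or fail with EINTR if
 * a signal arrives before anything is written, so loop until everything
 * has been handed to the kernel. No user-space buffer is involved, so
 * there is nothing to flush afterwards.
 * Returns 0 on success, -1 on error (errno set by write()).
 * On a non-blocking descriptor, EAGAIN is reported as an error here;
 * a real program would wait with poll() and retry instead. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;                 /* skip the part already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A return value of 0 only means the kernel has accepted the data; it says
nothing about whether the process at the other end of the pipe actually
reads it.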

When used with tar, xz writes to standard output (STDOUT_FILENO), which
in that case is a pipe. When xz finishes, it closes its end (the write
end) of the pipe.

With xz 5.1.3alpha, the O_NONBLOCK flag is set on STDIN_FILENO and
STDOUT_FILENO if it wasn't already set. If xz set the flag, it unsets
it before closing the file descriptor. The setting and unsetting can be
seen in the trace you sent, and it seems to work correctly. I can't
tell whether these fcntl() calls cause the difference between
5.1.3alpha and other versions, but that doesn't seem too important,
since the bug occurs in some form with all versions.
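
A minimal sketch of the set-then-restore pattern described above,
assuming plain POSIX fcntl(). The names `set_nonblock_if_unset()` and
`restore_and_close()` are hypothetical; this is not the actual xz code.
One reason such code restores the flag is that O_NONBLOCK is a file
status flag of the open file description, which is shared by every
descriptor duplicated from it, including copies held by other processes
such as the parent shell, so leaving it set would affect them too:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd unless it is already set.
 * Returns 1 if this call set the flag, 0 if it was already set,
 * and -1 if fcntl() failed. */
static int set_nonblock_if_unset(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (flags & O_NONBLOCK)
        return 0;               /* someone else chose non-blocking mode */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 1;
}

/* Hypothetical helper: clear O_NONBLOCK again if (and only if) we set
 * it, then close fd. Returns 0 on success, -1 if fcntl() or close()
 * failed. */
static int restore_and_close(int fd, int we_set_it)
{
    int ret = 0;

    if (we_set_it == 1) {
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
            ret = -1;
    }
    if (close(fd) == -1)
        ret = -1;
    return ret;
}
```

Used as `int changed = set_nonblock_if_unset(STDOUT_FILENO);` at startup
and `restore_and_close(STDOUT_FILENO, changed);` when the output is
finished.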

From the trace file it seems that the last write() from xz gets lost.
xz first makes 173 writes of 8192 bytes and then one 6144-byte write,
totalling 1,423,360 bytes. tar receives only 1,417,216 bytes from xz,
that is, 6144 bytes too few: exactly the size of that final write.
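
The byte counts above can be checked against correct pipe behaviour with
a small C sketch. `count_pipe_bytes()` is a hypothetical helper, not
code from xz or tar: a child process mimics the writer's pattern from
the trace and then closes its end, while the parent reads until EOF and
counts. On a system where pipes behave correctly it should return
1,423,360; a reader that saw only 1,417,216 bytes would have lost
exactly the final write. It models only the expected behaviour and does
not reproduce the suspected antivirus interference:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes `nfull` chunks of `chunk` bytes
 * plus one final chunk of `tail` bytes into a pipe, then closes its end.
 * The parent counts bytes until read() returns 0 (EOF).
 * Returns the byte count, or -1 on any error. */
static long count_pipe_bytes(int nfull, size_t chunk, size_t tail)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: writer (the "xz" side) */
        close(fds[0]);
        char *buf = calloc(1, chunk > tail ? chunk : tail);
        if (buf == NULL)
            _exit(1);
        /* A blocking pipe write() returns short only on a signal or error. */
        for (int i = 0; i < nfull; i++)
            if (write(fds[1], buf, chunk) != (ssize_t)chunk)
                _exit(1);
        if (write(fds[1], buf, tail) != (ssize_t)tail)
            _exit(1);
        close(fds[1]);                  /* last writer gone: reader gets EOF */
        _exit(0);
    }

    close(fds[1]);                      /* parent: reader (the "tar" side) */
    long total = 0;
    char buf[8192];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0 || n == -1)
        return -1;
    return total;
}
```

Calling `count_pipe_bytes(173, 8192, 6144)` reproduces the write sizes
seen in the trace.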

Since things go wrong with old xz versions that don't use non-blocking
I/O, I would expect you to see similar issues with other compressors
too. Maybe it would be worth testing with gzip and bzip2 in the same
way you did with xz 5.0.5.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Thread overview: 65+ messages
2012-12-07 19:55 Intermittent failures retrieving process exit codes Tom Honermann
2012-12-07 21:54 ` Tom Honermann
2012-12-07 23:07 ` bartels
2012-12-21  6:30   ` Tom Honermann
2012-12-21 10:33     ` Corinna Vinschen
2012-12-21 12:15       ` Nick Lowe
2012-12-21 19:45         ` Tom Honermann
2012-12-22  3:09           ` Nick Lowe
2012-12-21 16:10       ` Christopher Faylor
2012-12-21 17:02         ` Corinna Vinschen
2012-12-21 19:36           ` Intermittent failures retrieving process exit codes - snapshot test requested Christopher Faylor
2012-12-21 20:37             ` Daniel Colascione
2012-12-21 22:23             ` marco atzeri
2012-12-21 23:09               ` Tom Honermann
2012-12-22  2:53                 ` Christopher Faylor
2012-12-22  2:57                   ` Tom Honermann
2012-12-22  2:49               ` Christopher Faylor
2012-12-22  3:14                 ` Christopher Faylor
2012-12-22  9:06                   ` marco atzeri
2012-12-22 17:50                     ` Christopher Faylor
2012-12-23 16:56                       ` Christopher Faylor
2012-12-23 18:54                         ` marco atzeri
2012-12-27 20:50                         ` Tom Honermann
2012-12-29 21:57                           ` Christopher Faylor
2013-01-01  1:45                             ` Tom Honermann
2013-01-01  5:36                               ` Christopher Faylor
2013-01-02 19:15                                 ` Tom Honermann
2013-01-02 20:48                                   ` Christopher Faylor
2013-01-02 20:53                                     ` Daniel Colascione
2013-01-02 21:41                                       ` Christopher Faylor
2013-01-02 21:25                                     ` Tom Honermann
2013-01-15 22:17                                       ` Intermittent failures with ctrl-c (was: retrieving process exit codes) Tom Honermann
2013-01-16  2:04                                         ` Christopher Faylor
2013-01-16 16:38                                           ` Intermittent failures with ctrl-c Tom Honermann
2013-01-16 16:53                                             ` marco atzeri
2013-01-16 17:42                                               ` Tom Honermann
2013-01-16 18:05                                                 ` Earnie Boyd
2013-01-16 18:51                                                   ` Tom Honermann
2013-01-16 18:59                                                     ` Christopher Faylor
2013-01-16 20:19                                                       ` Tom Honermann
2013-01-16 22:23                                                         ` Christopher Faylor
2013-01-18 20:12                                                           ` Tom Honermann
2013-01-19  5:58                                                             ` Christopher Faylor
2013-01-20 22:09                                                               ` Tom Honermann
2013-01-23  3:20                                                                 ` Tom Honermann
2013-01-23  5:27                                                                   ` Christopher Faylor
2013-01-23 18:18                                                                     ` Tom Honermann
2013-01-23 18:35                                                                       ` Christopher Faylor
2013-01-24  4:12                                                                         ` Tom Honermann
2013-01-16 19:14                                             ` Christopher Faylor
2013-01-16 20:24                                               ` Tom Honermann
2012-12-21 20:01     ` Intermittent failures retrieving process exit codes Tom Honermann
2013-11-14  4:02     ` Tom Honermann
2013-11-14  9:20       ` Corinna Vinschen
2013-11-14 15:21         ` Tom Honermann
2013-11-15 18:53       ` Denis Excoffier
2013-11-15 19:21         ` Christopher Faylor
2013-11-17 13:30           ` Denis Excoffier
2013-11-15 22:15         ` Tom Honermann
2013-11-25 19:59         ` Lasse Collin
2013-11-25 23:12           ` Antivirus strikes back (probably) (Was: Intermittent failures retrieving process exit codes) Denis Excoffier
2013-11-26 21:09             ` Denis Excoffier
2013-11-26 21:09             ` Denis Excoffier
2013-11-26 23:36               ` Christopher Faylor
2013-12-01 13:24             ` Lasse Collin