public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
* Re: Cygwin crashes in kill_pgrp, _pinfo truncation issue.
@ 2012-08-23 11:35 Andrey Khalyavin
  2012-09-17 11:03 ` Andrey Khalyavin
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Khalyavin @ 2012-08-23 11:35 UTC (permalink / raw)
  To: cygwin

On Wed, 15 Aug 2012 10:11:16 -0400, Christopher Faylor wrote:
>On Wed, Aug 15, 2012 at 04:54:42PM +0400, Andrey Khalyavin wrote:
>>I finally got a cygwin crash dump from our build bots. It shows, that
>>cygwin1.dll crashes in kill_pgrp function on line:
>>	  (pid > 1 && p->pgid != pid) ||
>>where p is a pointer to _pinfo. This function enumerates all _pinfo's
>>and executes this line for all of them which pass p->exists() check.
>>In crash dump p points to _pinfo that has process_state equal to
>>PID_IN_USE | PID_EXECED.
>
>Thanks for tracking this down.  I've added a check for "execed" to
>_pinfo::exists.
>
>cgf

I got two more crash dumps from our bots running 20120816 snapshot.
Both of them crashed in this place. process_state equals to
PID_IN_USE | PIE_EXECED and your new check in _pinfo::exist is really there.
So the race condition on process_state field does happens in practice.

Andrey Khalyavin

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Cygwin crashes in kill_pgrp, _pinfo truncation issue.
  2012-08-23 11:35 Cygwin crashes in kill_pgrp, _pinfo truncation issue Andrey Khalyavin
@ 2012-09-17 11:03 ` Andrey Khalyavin
  0 siblings, 0 replies; 6+ messages in thread
From: Andrey Khalyavin @ 2012-09-17 11:03 UTC (permalink / raw)
  To: cygwin

2012/8/23 Andrey Khalyavin wrote:
>
> I got two more crash dumps from our bots running 20120816 snapshot.
> Both of them crashed in this place. process_state equals to
> PID_IN_USE | PIE_EXECED and your new check in _pinfo::exist is really there.
> So the race condition on process_state field does happens in practice.
>
> Andrey Khalyavin

I removed _pinfo truncation optimization and now all crashes are gone.

Andrey Khalyavin

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Cygwin crashes in kill_pgrp, _pinfo truncation issue.
  2012-08-16 16:45 Andrey Khalyavin
@ 2012-08-16 18:44 ` Christopher Faylor
  0 siblings, 0 replies; 6+ messages in thread
From: Christopher Faylor @ 2012-08-16 18:44 UTC (permalink / raw)
  To: cygwin

On Thu, Aug 16, 2012 at 08:20:37PM +0400, Andrey Khalyavin wrote:
>On Wed, 15 Aug 2012 10:11:16 -0400, Christopher Faylor wrote:
>>On Wed, Aug 15, 2012 at 04:54:42PM +0400, Andrey Khalyavin wrote:
>>>I finally got a cygwin crash dump from our build bots. It shows, that
>>>cygwin1.dll crashes in kill_pgrp function on line:
>>>	  (pid > 1 && p->pgid != pid) ||
>>>where p is a pointer to _pinfo. This function enumerates all _pinfo's
>>>and executes this line for all of them which pass p->exists() check.
>>>In crash dump p points to _pinfo that has process_state equal to
>>>PID_IN_USE | PID_EXECED.
>>
>>Thanks for tracking this down.  I've added a check for "execed" to
>>_pinfo::exists.
>>
>>cgf
>I updated core libraries from 20120803 snapshot to 20120815 snapshot
>and now bash crashes when I execute rm -rf dir. Reproducibility is
>strange. It crashed for hours when I entered
>cd /tmp
>mkdir a
>rm -rf a
>commands but now suddenly stopped crashing in this case.
>It is still crashes on rm -rf in the real script we use though.
>
>Crash happens in setup_handler function on line
>	  HANDLE hth = (HANDLE) *tls;
>because tls->tid equals to zero. Definition of this operation is in
>sygtls.h: operator HANDLE () const {return tid->win32_obj_id;}.
>setup_handler is called from sigpacket::process which in turn
>called from wait_sig. Signal number is 20, signal code is 28.
>All fields of tls structure are zero with exception stacklock equal
>to 1 and stackptr equal to address of tls->stack.

Sounds like a race between thread creation and signal handling.  I have
added some defensive code in the latest snapshot.

cgf

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Cygwin crashes in kill_pgrp, _pinfo truncation issue.
@ 2012-08-16 16:45 Andrey Khalyavin
  2012-08-16 18:44 ` Christopher Faylor
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Khalyavin @ 2012-08-16 16:45 UTC (permalink / raw)
  To: cygwin

On Wed, 15 Aug 2012 10:11:16 -0400, Christopher Faylor wrote:
>On Wed, Aug 15, 2012 at 04:54:42PM +0400, Andrey Khalyavin wrote:
>>I finally got a cygwin crash dump from our build bots. It shows, that
>>cygwin1.dll crashes in kill_pgrp function on line:
>>	  (pid > 1 && p->pgid != pid) ||
>>where p is a pointer to _pinfo. This function enumerates all _pinfo's
>>and executes this line for all of them which pass p->exists() check.
>>In crash dump p points to _pinfo that has process_state equal to
>>PID_IN_USE | PID_EXECED.
>
>Thanks for tracking this down.  I've added a check for "execed" to
>_pinfo::exists.
>
>cgf
I updated core libraries from 20120803 snapshot to 20120815 snapshot
and now bash crashes when I execute rm -rf dir. Reproducibility is
strange. It crashed for hours when I entered
cd /tmp
mkdir a
rm -rf a
commands but now suddenly stopped crashing in this case.
It is still crashes on rm -rf in the real script we use though.

Crash happens in setup_handler function on line
	  HANDLE hth = (HANDLE) *tls;
because tls->tid equals to zero. Definition of this operation is in
sygtls.h: operator HANDLE () const {return tid->win32_obj_id;}.
setup_handler is called from sigpacket::process which in turn
called from wait_sig. Signal number is 20, signal code is 28.
All fields of tls structure are zero with exception stacklock equal
to 1 and stackptr equal to address of tls->stack.

I don't understand what goes wrong here. Can you suggest what
I can look for in the debugger?

Andrey Khalyavin

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Cygwin crashes in kill_pgrp, _pinfo truncation issue.
  2012-08-15 13:10 Andrey Khalyavin
@ 2012-08-15 20:51 ` Christopher Faylor
  0 siblings, 0 replies; 6+ messages in thread
From: Christopher Faylor @ 2012-08-15 20:51 UTC (permalink / raw)
  To: cygwin

On Wed, Aug 15, 2012 at 04:54:42PM +0400, Andrey Khalyavin wrote:
>I finally got a cygwin crash dump from our build bots. It shows, that
>cygwin1.dll crashes in kill_pgrp function on line:
>	  (pid > 1 && p->pgid != pid) ||
>where p is a pointer to _pinfo. This function enumerates all _pinfo's
>and executes this line for all of them which pass p->exists() check.
>In crash dump p points to _pinfo that has process_state equal to
>PID_IN_USE | PID_EXECED.

Thanks for tracking this down.  I've added a check for "execed" to
_pinfo::exists.

cgf

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Cygwin crashes in kill_pgrp, _pinfo truncation issue.
@ 2012-08-15 13:10 Andrey Khalyavin
  2012-08-15 20:51 ` Christopher Faylor
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Khalyavin @ 2012-08-15 13:10 UTC (permalink / raw)
  To: cygwin

I finally got a cygwin crash dump from our build bots. It shows, that
cygwin1.dll crashes in kill_pgrp function on line:
	  (pid > 1 && p->pgid != pid) ||
where p is a pointer to _pinfo. This function enumerates all _pinfo's
and executes this line for all of them which pass p->exists() check.
In crash dump p points to _pinfo that has process_state equal to
PID_IN_USE | PID_EXECED. As far as I understand, such _pinfo's
have smaller size and so accessing pgid field leads to crash.
I found a past thread that describes very similar problem:
http://cygwin.com/ml/cygwin/2010-09/msg00390.html

We obviously need to add a check for PID_EXECED in this function. But
I'm worried that we can still have a race condition where
external process can truncate the structure just after our check.

Some ideas how to fix this race condition (if it exists):
1. Use ReadProcessMemory for memory access.
2. Move command line to the end of the structure and extend
PINFO_REDIR_SIZE to include all important fields.
3. Make proper synchronization. Not sure how it should be done in this
case. Since this is a memory mapped file, one may need
to use functions like LockFile.

Andrey Khalyavin

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-09-17  9:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-23 11:35 Cygwin crashes in kill_pgrp, _pinfo truncation issue Andrey Khalyavin
2012-09-17 11:03 ` Andrey Khalyavin
  -- strict thread matches above, loose matches on Subject: below --
2012-08-16 16:45 Andrey Khalyavin
2012-08-16 18:44 ` Christopher Faylor
2012-08-15 13:10 Andrey Khalyavin
2012-08-15 20:51 ` Christopher Faylor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).