public inbox for cygwin@cygwin.com
* Race condition that leads to random crashes in cygwin-based builds.
From: Andrey Khalyavin @ 2012-07-24 13:25 UTC
  To: cygwin

Hi, we have build bots that crash randomly on Windows XP and, more rarely, on
Windows 7. These bots use our compiler, which runs under Cygwin. Although the
crashes are rare, we have ~20 bots, which makes green builds almost
impossible. I tried to reproduce these crashes on my local Windows XP computer,
and after several days I got a crash dump (on the bots, crashes are much more
frequent, maybe because they run in virtual machines).

Investigation of this crash dump showed that wincapc::init in
winsup\cygwin\wincap.cc called api_fatal ("Cygwin requires at least Windows
2000."). This function is called during cygwin1.dll initialization, before any
code in our compiler (cc1.exe) has run. Further investigation showed that the
wincap variable (of type wincapc) lives in a shared section:

  wincapc wincap __attribute__((section (".cygwin_dll_common"), shared));

but wincapc::init() does no locking of its own, and dll_crt0_0 calls it
without any synchronization either. Using shared variables without
synchronization is a sure way to get random failures. Here is one scenario
that can lead to api_fatal being called:

1. No Cygwin processes exist in the system.
2. Two Cygwin processes are started simultaneously.
3. The first process enters wincapc::init, clears the version field with
   memset and executes version.dwOSVersionInfoSize = sizeof (OSVERSIONINFOEX).
4. A task switch happens and the second process enters wincapc::init. It sees
   that the caps field is still not initialized and clears the version field
   with memset.
5. A task switch happens and the first process proceeds to call GetVersionEx
   with the version field cleared by memset, so its size is no longer set.
6. GetVersionEx returns an error and the first process fails to start.
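
The interleaving in steps 3-5 can be replayed deterministically in a few lines
of C++. This is a hypothetical model, not the wincap.cc source: VersionInfo,
SharedCaps and fake_get_version() are invented stand-ins for OSVERSIONINFOEX,
the shared wincap object and GetVersionEx(), assuming only that the real call
rejects a structure whose size field is not set.

```cpp
// Hypothetical model of steps 3-5 above, not the real wincap.cc code.
// VersionInfo stands in for OSVERSIONINFOEX, SharedCaps for the shared
// wincap object, and fake_get_version() for GetVersionEx(). The two
// processes are replayed one step at a time so the bad interleaving is
// deterministic.
#include <cstring>

struct VersionInfo {
  unsigned size;    // plays the role of dwOSVersionInfoSize
  unsigned major;   // plays the role of dwMajorVersion
};

struct SharedCaps {
  VersionInfo version;
  const void *caps;  // stays null until some process finishes init
};

// Like GetVersionEx, refuse to fill in the struct unless its size is set.
bool fake_get_version(VersionInfo *vi) {
  if (vi->size != sizeof (VersionInfo))
    return false;
  vi->major = 5;  // pretend to report Windows XP (5.x)
  return true;
}

// Returns true if process 1's version query succeeds.
bool replay_race() {
  SharedCaps shared{};  // a fresh shared section is all zero
  // Step 3: process 1 clears the struct, then sets the size field.
  std::memset(&shared.version, 0, sizeof shared.version);
  shared.version.size = sizeof (VersionInfo);
  // Step 4: process 2 sees caps still unset and clears the struct again.
  if (shared.caps == nullptr)
    std::memset(&shared.version, 0, sizeof shared.version);
  // Step 5: process 1 resumes and queries with the size field wiped.
  return fake_get_version(&shared.version);
}
```

In this model replay_race() returns false, which corresponds to step 6.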

If there is no easy way to add synchronization to wincapc::init, I suggest
making wincap a regular (not shared) variable.
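
The first option, adding synchronization, might look roughly like the
once-only guard below. It is a sketch under stated assumptions, not Cygwin
code: threads stand in for processes, and SharedCaps, init_once() and
run_concurrent() are hypothetical names. Across real processes the state flag
would have to live in the shared section itself and be updated with an
interprocess-safe atomic such as Win32's InterlockedCompareExchange; that port
is an assumption, not something described in this thread.

```cpp
// Hypothetical sketch, not Cygwin code: a once-only initialization guard.
// Threads stand in for processes and SharedCaps stands in for the shared
// wincap object. state: 0 = uninitialized, 1 = being initialized, 2 = ready.
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

struct SharedCaps {
  std::atomic<int> state{0};
  unsigned major = 0;              // stand-in for the OS major version
  unsigned minor = 0;              // stand-in for the OS minor version
  std::atomic<int> init_runs{0};   // how many callers actually ran init
};

void init_once(SharedCaps &caps) {
  int expected = 0;
  if (caps.state.compare_exchange_strong(expected, 1)) {
    // This caller won the compare-exchange, so only it fills in the fields.
    caps.major = 5;
    caps.minor = 1;
    caps.init_runs.fetch_add(1);
    caps.state.store(2, std::memory_order_release);  // publish the fields
  } else {
    // Another caller is initializing (or already has); wait for "ready".
    while (caps.state.load(std::memory_order_acquire) != 2)
      std::this_thread::yield();
  }
}

// Starts n concurrent callers and returns how many times init really ran.
int run_concurrent(int n) {
  SharedCaps caps;
  std::vector<std::thread> callers;
  for (int i = 0; i < n; ++i)
    callers.emplace_back(init_once, std::ref(caps));
  for (std::thread &t : callers)
    t.join();
  return caps.init_runs.load();
}
```

However the threads interleave, run_concurrent(8) should report that the
initializer ran exactly once, because the state flag never returns to 0 after
the first successful compare-exchange. Making wincap a per-process variable
avoids the need for any such guard, at the cost of every process computing the
values itself.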

Andrey Khalyavin

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Race condition that leads to random crashes in cygwin-based builds.
From: Corinna Vinschen @ 2012-07-24 13:58 UTC
  To: cygwin

On Jul 24 17:25, Andrey Khalyavin wrote:
> [...]
> 
> If there is no easy way to add synchronization to wincapc::init, I suggest
> making wincap a regular (not shared) variable.

There's another way, afaics.  The idea here was that wincap is only
ever set once, and even *if* the information is written twice, the
content will be identical.

So, afaics, the above problem is a result of using memset at all.  At
startup, wincap is all 0 anyway, so the memset is not required and
apparently it even hurts.  Weird that nobody saw this problem before.
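
This argument can be checked against the same interleaving with the memset
removed. The C++ below is a hypothetical replay, not the actual patch:
VersionInfo, SharedCaps and fake_get_version() are invented stand-ins for
OSVERSIONINFOEX, the shared wincap object and GetVersionEx(), assuming only
that the real call needs the size field set.

```cpp
// Hypothetical replay of the interleaving with the memset removed; not the
// actual patch. VersionInfo stands in for OSVERSIONINFOEX and
// fake_get_version() for GetVersionEx().
struct VersionInfo {
  unsigned size;    // plays the role of dwOSVersionInfoSize
  unsigned major;   // plays the role of dwMajorVersion
};

struct SharedCaps {
  VersionInfo version;
  const void *caps;  // stays null until some process finishes init
};

// Like GetVersionEx, refuse to fill in the struct unless its size is set.
bool fake_get_version(VersionInfo *vi) {
  if (vi->size != sizeof (VersionInfo))
    return false;
  vi->major = 5;
  return true;
}

// Returns true if both processes' queries succeed.
bool replay_without_memset() {
  SharedCaps shared{};  // relies on the shared section starting out zeroed
  // Process 1 sets the size field; no memset any more.
  shared.version.size = sizeof (VersionInfo);
  // Process 2 also runs init because caps is still null, but it writes the
  // very same value, so process 1's setting survives.
  if (shared.caps == nullptr)
    shared.version.size = sizeof (VersionInfo);
  bool p1_ok = fake_get_version(&shared.version);  // process 1 resumes
  bool p2_ok = fake_get_version(&shared.version);  // process 2 continues
  return p1_ok && p2_ok;
}
```

Here replay_without_memset() returns true. The approach relies on every
process storing identical values; in strict C++ memory-model terms, concurrent
non-atomic writes to the same object are still a data race, so this is a
pragmatic fix rather than a formally race-free one.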

I applied a patch which should fix this problem.  Please give the
next developer snapshot from http://cygwin.com/snapshots/ a try,
or build it yourself from CVS.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Race condition that leads to random crashes in cygwin-based builds.
From: Corinna Vinschen @ 2012-08-06  9:05 UTC
  To: cygwin; +Cc: Andrey Khalyavin

Andrey?

On Jul 24 15:57, Corinna Vinschen wrote:
> [...]
> I applied a patch which should fix this problem.  Please give the
> next developer snapshot from http://cygwin.com/snapshots/ a try,
> or build it yourself from CVS.

Ping?  Any feedback?  Did you ever try a snapshot?


Thanks,
Corinna


* Re: Race condition that leads to random crashes in cygwin-based builds.
From: Andrey Khalyavin @ 2012-08-06 15:43 UTC
  To: cygwin

I updated our Cygwin installation with the core libraries from the 20120725
snapshot. There are still crashes in our build, and I'm investigating them. I
haven't got a crash dump yet; this time I have to catch them on the bots
instead of on my local computer.

Andrey Khalyavin

2012/8/6 Corinna Vinschen wrote:
> Andrey?
> [...]
> Ping?  Any feedback?  Did you ever try a snapshot?


* Re: Race condition that leads to random crashes in cygwin-based builds.
From: Corinna Vinschen @ 2012-08-06 18:49 UTC
  To: cygwin

Please, don't TOFU (see http://cygwin.com/acronyms/#TOFU).


On Aug  6 19:31, Andrey Khalyavin wrote:
> 2012/8/6 Corinna Vinschen wrote:
> > [...]
> > Ping?  Any feedback?  Did you ever try a snapshot?
> 
> I updated our Cygwin installation with the core libraries from the 20120725
> snapshot. There are still crashes in our build, and I'm investigating them. I
> haven't got a crash dump yet; this time I have to catch them on the bots
> instead of on my local computer.

Please use the *latest* snapshot from CVS, 20120803.  There were other
potential causes of crashes which were only fixed, or at least tentatively
fixed, in the latest snapshot.


Thanks,
Corinna


* Re: Race condition that leads to random crashes in cygwin-based builds.
From: Andrey Khalyavin @ 2012-08-07 11:06 UTC
  To: cygwin

On Aug 7 Corinna Vinschen wrote:
>On Aug  6 19:31, Andrey Khalyavin wrote:
>> [...]
>
>Please use the *latest* snapshot from CVS, 20120803.  There were other
>potential causes of crashes which were only fixed, or at least tentatively
>fixed, in the latest snapshot.

Thank you. I will update Cygwin before taking the next step in investigating
our build issues (I need to implement some way to get crash dumps on the bots).

Andrey Khalyavin.
