From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16537 invoked by alias); 24 Jul 2012 13:58:04 -0000
Received: (qmail 16438 invoked by uid 22791); 24 Jul 2012 13:57:37 -0000
X-Spam-Check-By: sourceware.org
Received: from aquarius.hirmke.de (HELO calimero.vinschen.de) (217.91.18.234) by sourceware.org (qpsmtpd/0.83/v0.83-20-g38e4449) with ESMTP; Tue, 24 Jul 2012 13:57:21 +0000
Received: by calimero.vinschen.de (Postfix, from userid 500) id F13032C0044; Tue, 24 Jul 2012 15:57:18 +0200 (CEST)
Date: Tue, 24 Jul 2012 13:58:00 -0000
From: Corinna Vinschen
To: cygwin@cygwin.com
Subject: Re: Race condition that leads to random crashes in cygwin-based builds.
Message-ID: <20120724135718.GA29107@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
Sender: cygwin-owner@cygwin.com
X-SW-Source: 2012-07/txt/msg00532.txt.bz2

On Jul 24 17:25, Andrey Khalyavin wrote:
> Hi, we have build bots that crash randomly on Windows XP and rarely on
> Windows 7. These bots use our compiler that runs under cygwin. Although
> crashes are rare, we have ~20 bots, which makes green builds almost
> impossible. I tried to reproduce these crashes on my local Windows XP
> computer and after several days (on bots crashes are much more
> frequent, maybe due to them using virtual machines) I got a crash dump.
>
> Investigation of this crash dump showed that wincapc::init in
> winsup\cygwin\wincap.cc called api_fatal ("Cygwin requires at least
> Windows 2000."). This function is called at cygwin1.dll initialization
> even before any code in our compiler (cc1.exe) has been executed.
> Further investigation showed that the wincapc variable is in a shared
> section:
>
>   wincapc wincap __attribute__((section (".cygwin_dll_common"), shared));
>
> but the wincapc::init() function doesn't have any synchronization and
> is called from dll_crt0_0 without any synchronization. Using shared
> variables without synchronization is a sure way to get random failures.
> Here is one scenario that can lead to api_fatal being called:
>
> 1. No cygwin processes exist in the system.
> 2. Two cygwin processes are started simultaneously.
> 3. The first process enters wincapc::init, clears the version field
>    with memset and executes
>    version.dwOSVersionInfoSize = sizeof (OSVERSIONINFOEX)
> 4. Task switching happens and the second process enters wincapc::init.
>    It sees that the caps field is not initialized yet and clears the
>    version field with memset.
> 5. Task switching happens and the first process proceeds to execute
>    GetVersionEx with version cleared by memset and so not having its
>    size set.
> 6. GetVersionEx returns an error and the first process fails to start.
>
> If there is no easy way to add synchronization to wincapc::init, I
> suggest making wincap a regular (not shared) variable.

There's another way, afaics. The idea here was that wincap is only ever
set once, and even *if* the information is written twice, the content
will be identical. So, afaics, the above problem is a result of using
memset at all. At startup, wincap is all 0 anyway, so the memset is not
required and apparently it even hurts. Weird that nobody saw this
problem before.

I applied a patch which should fix this problem. Please give the next
developer snapshot from http://cygwin.com/snapshots/ a try, or build
yourself from CVS.

Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple