Message-ID:
 <50C2498C.2000003@coverity.com>
Date: Fri, 07 Dec 2012 19:55:00 -0000
From: Tom Honermann
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
Subject: Intermittent failures retrieving process exit codes
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginatorOrg: coverity.com
X-IsSubscribed: yes
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
Sender: cygwin-owner@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
X-SW-Source: 2012-12/txt/msg00140.txt.bz2

I've witnessed intermittent failures in multiple build systems while working at multiple companies using Cygwin bash and make as part of the build system but using non-Cygwin compilers and other tools. The intermittent failures occur when a process appears to complete successfully, but the process retrieving its exit code receives an unexpected value. This has been seen on many different Cygwin versions across several years.

Several reports of similar sounding issues can be found online:
- http://cygwin.1069669.n5.nabble.com/Cygwin-1-7-x-on-Windows-7-Exit-statuses-of-Win32-executables-are-sometimes-wrong-td20186.html
- http://stackoverflow.com/questions/9769256/intermittent-failures-under-cygwin-possibly-related-to-candle-and-or-make

I recently was able to produce a very small test case that reproduces this issue reliably on some machines:

$ cat test.sh
#!/bin/sh
while [ 1 ]; do
    echo "test..."
    if cmd /c "false"; then
        echo "exiting..."
        exit 1
    fi
done

An invocation of test.sh should run indefinitely, but fails very quickly on one of my machines:

$ ./test.sh
test...
test...
exiting...

$ ./test.sh
test...
test...
test...
test...
exiting...

$ ./test.sh
test...
exiting...
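For anyone who wants to compare machines, a variant of the loop that reports how many iterations pass before the unexpected success may be handy. This is only a sketch and not part of the test case above: the function name count_until_success and its arguments are made up for illustration, and it is assumed to run under the same /bin/sh as test.sh.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command
# repeatedly and report the first iteration at which it returns exit
# status 0, i.e. the point where test.sh would print "exiting...".
#
# usage: count_until_success MAX_ITERATIONS COMMAND [ARGS...]
# Prints the 1-based iteration number of the first success and returns 0.
# Prints nothing and returns 1 if the command failed on every iteration.
count_until_success() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

With the function defined, something like

    count_until_success 100000 cmd /c "false"

runs the same command as test.sh, but bounded, and prints the iteration number at which the exit status was unexpectedly 0. The bound of 100000 is arbitrary.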
There are several high-level possibilities for what is going wrong:

1) cmd.exe is failing to retrieve the correct exit code for the invocation of false.exe (a Cygwin process).
2) cmd.exe is failing to return the (correct) exit code it received for the invocation of false.exe.
3) bash.exe (a Cygwin process) is failing to retrieve the correct exit code for the invocation of cmd.exe.

It is possible that other software installed on the machines I've witnessed this on is contributing to the problem (à la http://cygwin.com/faq/faq.using.html#faq.using.bloda). If so, such software would be a contributing factor to one of the explanations above, but that would not necessarily mean that there is no defect in Cygwin (or CreateProcess, WaitForSingleObject, or GetExitCodeProcess). I have not yet seen a similar case that does not involve Cygwin, so at present I suspect a defect in Cygwin, but possibly one that produces no negative symptoms in isolation.

I've reproduced this issue with both the 32-bit and 64-bit versions of cmd.exe. I've also reproduced it by replacing cmd.exe with a C program that itself calls CreateProcess for Cygwin's false.exe. The issue reproduces whether that C file is compiled with Cygwin gcc, MinGW gcc (32-bit and 64-bit), or MSVC (32-bit and 64-bit). So, substitute what you like for 'cmd.exe' in the above.

Likewise, I've reproduced this issue by replacing false.exe in the test above with a custom myfalse.exe (a C program that just returns 1). The issue reproduces whether myfalse.exe is compiled with Cygwin gcc, MinGW gcc (32-bit and 64-bit), or MSVC (32-bit and 64-bit). So, substitute what you like for 'false.exe' in the above.

I am not able to reproduce the problem if I elide the invocation of false.exe (i.e., if the cmd.exe invocation is 'cmd /c "exit /B 1"' or if my replacement for cmd.exe just returns 1).

The problem feels like a race condition in retrieving process exit codes.
Further, it seems that it may only occur when two related processes exit in quick succession.

I've been granted several weeks in the near future to work exclusively on this issue. Before I start working on it though, I'd like to hear from other community members who have experienced this and tried to debug it. What is and is not known about the issue? What workarounds have been tried (especially any that were found to be successful)? Are there specific parts of the Cygwin (or bash) code that you recommend starting with?

The machine that I've been running the above script on is 64-bit Windows 7 Professional SP1 running under VMware Workstation 8, which is running on Kubuntu 12.04. Relevant parts of 'cygcheck -s' are:

Windows 7 Professional N Ver 6.1 Build 7601 Service Pack 1

Running under WOW64 on AMD64

    Cygwin DLL version info:
        DLL version: 1.7.16
        DLL epoch: 19
        DLL old termios: 5
        DLL malloc env: 28
        Cygwin conv: 181
        API major: 0
        API minor: 262
        Shared data: 5
        DLL identifier: cygwin1
        Mount registry: 3
        Cygwin registry name: Cygwin
        Program options name: Program Options
        Installations name: Installations
        Cygdrive default prefix:
        Build date:
        Shared id: cygwin1S5

Potential app conflicts:

ByteMobile laptop optimization client.

No Cygwin services found.

Cygwin Package Information
Package              Version            Status
bash                 4.1.10-4           OK
cygwin               1.7.16-1           OK

Tom.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple