From: Tom Honermann
Date: Wed, 02 Jan 2013 19:15:00 -0000
Subject: Re: Intermittent failures retrieving process exit codes - snapshot test requested
Message-ID: <50E48743.5040401@coverity.com>
In-Reply-To: <20130101053606.GB18911@ednor.casa.cgf.cx>

On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS. The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself. I have no way to know how
> you are building this. Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to
specify debug source paths and such. I believe my builds are not
unconventional. I used options that disabled frame pointer omission so
that the resulting binaries could be debugged with non-gcc debuggers.
$ mkdir build
$ cd build
$ ../src/configure \
    CFLAGS="-g" \
    CXXFLAGS="-g" \
    CFLAGS_FOR_TARGET="-g" \
    CXXFLAGS_FOR_TARGET="-g" \
    --enable-debugging \
    --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file. Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really
> useful. If you are not going to dive in and attempt to fix code
> yourself then all we normally need is a simple test case. WinDbg
> is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough.
Lines with frame numbers come directly from WinDbg. Since WinDbg is unable
to use gcc-generated debug info to resolve symbols, the symbol references
within the cygwin1 module are incorrect. In those cases, I manually
resolved the instruction pointer address by taking the RetAddr value from
the prior frame and searching the linker-generated cygwin1.map file. I
then pasted the mangled name on a line following the WinDbg line (the one
with the incorrect symbol name) and, if the symbol is a C++ one, the
unmangled name on an additional line. For the stack fragment below,
address 610f1553 == strtosigno+0x357 == __ZN4muto7acquireEm ==
muto::acquire(unsigned long).
I did not translate offsets for the functions as I resolved them, nor did
I try to verify that they are correct (i.e., that the return address does
not fall within a static function that is absent from the .map file).

>> # ChildEBP RetAddr
>> 00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>> 01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>> 02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>> 03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>> 04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>>    __ZN4muto7acquireEm
>>    muto::acquire(unsigned long)
>> [snip]

The reason for using WinDbg is that, from what I understand, gdb is unable
to produce accurate stack traces when the call stack includes frames for
functions that omit the frame pointer and do not have debug info that gdb
can process. I believe many Microsoft-provided functions in ntdll,
kernel32, kernelbase, etc. do omit the frame pointer and only provide
debug info in the PDB format, which gdb is unable to use. Compiling Cygwin
without frame pointer omission and using WinDbg therefore provides the
most accurate stack trace. If I am incorrect about any of this, I would
very much appreciate a correction and/or explanation.

I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able
to reproduce several issues, described below. All of these issues occur
when using ctrl-c to interrupt the infinite loop in the test case(s) I've
been using to debug inconsistent exit codes. When ctrl-c is pressed, I've
observed the following:

1) Programs are (generally) terminated as expected. cmd.exe prompts to
   "Terminate batch job" as expected.
2) An access violation occurs and a processor context is dumped to the
   console. I do not yet have stack traces for these cases.
3) One of the processes hangs.

Access violations occur in ~20% of test runs. Hangs occur in ~5% of test
runs. I did not provide a test case previously because I don't have an
automated reproducer at present.
All sources needed to reproduce the issues are below. The test case uses
a .bat file to avoid dependencies on bash, so that the problem is
isolated as narrowly as possible. To reproduce the issues, copy test.bat,
false-cygwin32.exe, and expect-false-execve-cygwin32.exe to a Cygwin bin
directory and run test.bat from a cmd.exe console. Press ctrl-c to
interrupt the test. Repeat until problems are observed. I have not been
able to reproduce these symptoms when running the test via a MinTTY
console.

I have been unable to get useful stack traces from hung processes using
gdb. gdb reports that the debug information in cygwin1-20130102.dbg.bz2
does not match (CRC mismatch) the cygwin1.dll module in
cygwin-inst-20130102.tar.bz2.

$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t child_pid, wait_pid;
    int result, child_status;

    if (argc != 2) {
        fprintf(stderr, "expect-false: Missing or too many arguments\n");
        return 127;
    }

    child_pid = fork();
    if (child_pid == -1) {
        fprintf(stderr, "expect-false: fork failed. errno=%d\n", errno);
        return 127;
    } else if (child_pid == 0) {
        /* Child: replace this process with the program named in argv[1]. */
        result = execlp(argv[1], argv[1], (char *)NULL);
        if (result == -1) {
            fprintf(stderr, "expect-false: execlp failed. errno=%d\n", errno);
        }
        _exit(127);
    }

    /* Parent: retry if interrupted by a signal, and keep waiting until
       the child has actually exited or been killed by a signal. */
    do {
        wait_pid = waitpid(child_pid, &child_status, 0);
    } while ((wait_pid == -1 && errno == EINTR) ||
             (wait_pid == child_pid &&
              !(WIFEXITED(child_status) || WIFSIGNALED(child_status))));

    if (wait_pid == -1) {
        fprintf(stderr, "expect-false: waitpid failed. errno=%d\n", errno);
        return 127;
    }
    if (!WIFEXITED(child_status)) {
        fprintf(stderr, "expect-false: child process did not exit normally\n");
        return 127;
    }
    if (WEXITSTATUS(child_status) != 1) {
        /* Report the decoded exit code, not the raw wait status. */
        fprintf(stderr, "expect-false: unexpected exit code: %d\n",
                WEXITSTATUS(child_status));
    }
    return WEXITSTATUS(child_status);
}

$ cat false.c
#include <stdio.h>

int main(void)
{
    printf("myfalse\n");
    return 1;
}

$ cat test.bat
@echo off
setlocal
set PATH=%CD%;%PATH%
:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop

$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

From a cmd.exe console (press ctrl-c once the test is running):

C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...

Tom.