Dear Cygwin developers,

While testing Cygwin on Wine, I ran into a random crash that puzzled me for a long time. The easiest way to reproduce it on my machine is:

1. Install the latest Wine (staging version) from http://www.wine-staging.com/

2. Install the latest Cygwin on Wine:

   $ uname -a
   CYGWIN_NT-5.1 2.2.1(0.289/5/3) 2015-08-20 11:40 i686 Cygwin

3. Run curl to fetch some nonexistent URL, for example:

   $ curl 127.0.0.2

This reproduces the crash almost 100% of the time. gdb provides a pretty good backtrace:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/curl 127.0.0.2
[New Thread 209146.0x330fb]
[New Thread 209146.0x330fc]
[New Thread 209146.0x330fd]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 209146.0x330fd]
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x6100609b in _cygtls::call2(unsigned long (*)(void*, void*), void*, void*)@16 (this=0xa2ce64, func=func@entry=0x0, arg=arg@entry=0x611b6c70 , buf=buf@entry=0xa2b824) at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/cygtls.cc:111
#2 0x61006151 in _cygtls::call (func=0x0, arg=0x611b6c70 ) at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/cygtls.cc:30
#3 0x61088a58 in threadfunc_fe (arg=) at /usr/src/debug/cygwin-2.2.1-1/winsup/cygwin/init.cc:32
#4 0x7bc90da0 in call_thread_func_wrapper+0xc in ntdll (1)
#5 0x7bc90e1c in call_thread_func+0x72 [/home/fracting/src/wine-patched/dlls/ntdll/signal_i386.c:3017] in ntdll (1)
#6 0x7bc90d7e in call_thread_entry_point+0x12 in ntdll (1)
#7 0x7bc9938a in ?? ()RtlCreateUserThread [/home/fracting/src/wine-patched/dlls/ntdll/thread.c:480] in ntdll (1)
#8 0xb7525f16 in ?? ()
#9 0xb745c11e in ?? ()
(gdb)

The hard way to reproduce it on my machine is to build some large project (like Cygwin itself) on Cygwin on top of Wine. Some process will then crash at random. The failure rate is very low, which puzzled me a lot. (I verified that these are the same bug by testing several different workarounds.)

However, not every Wine user can reproduce this bug, even with the "easy way". I also can't reproduce it when running under strace.

After investigation, we found that the problem is related to munge_threadfunc:

1. When a new Cygwin thread is created, init.cc:dll_entry() is called with DLL_THREAD_ATTACH, which calls munge_threadfunc().

2. Inside munge_threadfunc(), Cygwin searches the stack for the address of cygthread::stub() in order to locate the thread entry point, and then tries to patch the thread entry point and its copies in the stack frame so that they point to a wrapper function called threadfunc_fe().

3. According to my tests, the search result on Windows is always as expected.

4. On Wine, however, the search result is not reliable. Sometimes threadfunc_ix[0] points into Wine's mod_name buffer inside dlls/ntdll/loader.c:MODULE_InitDLL(), which is called every time a thread is initialized. This is unexpected, since the purpose of munge_threadfunc() is to find the thread entry point. On Wine, some garbage data in memory happens to be equal to the address of the thread entry point, so munge_threadfunc() finds the wrong location and stores the wrong offset in threadfunc_ix[].

5. Since the offset might be wrong, ebp[threadfunc_ix[0]] sometimes holds unexpected data on Wine, so "TlsSetValue (_my_oldfunc, threadfunc)" stores the wrong data and "TlsGetValue (_my_oldfunc)" gets the wrong data back, which makes Cygwin crash randomly.

We have a simple hack on the Wine side which makes Cygwin happy, attached as 0001-ntdll-Initialize-mode_name-to-zero.txt.
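To make the failure mode in steps 2 to 5 concrete, here is a minimal, portable sketch in C. It is not the actual Cygwin or Wine code: the function names (find_first_match, simulate_scan), the slot layout, and the addresses are all made up for illustration, and the scan is reduced to "return the first matching word". It only models the idea that a stale copy of the entry-point address in an uninitialized buffer can be found before the genuine one, and that zero-filling the buffer avoids the false match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a small fake stack region, in machine words. */
enum {
    FRAME_WORDS   = 16,
    SCRATCH_BEGIN = 2,   /* start of an uninitialized scratch buffer    */
    SCRATCH_END   = 6,   /* one past its end                            */
    STALE_SLOT    = 3,   /* leftover word inside the scratch buffer     */
    REAL_SLOT     = 10   /* where the genuine entry point is saved      */
};

/* Simplified stand-in for the stack search: return the index of the
   first word equal to `entry`, or -1 if no word matches. */
static int find_first_match(const uintptr_t *frame, size_t n, uintptr_t entry)
{
    assert(frame != NULL);
    for (size_t i = 0; i < n; i++)
        if (frame[i] == entry)
            return (int)i;
    return -1;
}

/* Build the fake stack and run the search.
   zero_scratch == 0: the scratch buffer keeps its stale contents.
   zero_scratch != 0: the scratch buffer is zero-filled first. */
static int simulate_scan(int zero_scratch)
{
    const uintptr_t entry = (uintptr_t)0x61001234u;  /* made-up address */
    uintptr_t frame[FRAME_WORDS];

    for (size_t i = 0; i < FRAME_WORDS; i++)
        frame[i] = (uintptr_t)0xdead0000u + i;       /* unrelated filler */

    frame[STALE_SLOT] = entry;  /* garbage that happens to equal entry */
    frame[REAL_SLOT]  = entry;  /* the real saved entry point          */

    if (zero_scratch)
        memset(&frame[SCRATCH_BEGIN], 0,
               (SCRATCH_END - SCRATCH_BEGIN) * sizeof frame[0]);

    return find_first_match(frame, FRAME_WORDS, entry);
}
```

In this model, simulate_scan(0) returns 3 (the stale slot, i.e. the wrong offset), while simulate_scan(1) returns 10 (the genuine slot). The sketch also shows why zero-filling is fragile: it only helps as long as the stale word lies inside the buffer that gets cleared.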
The Wine-side hack works because zero-filling the mod_name array ensures that it no longer contains garbage data that can confuse munge_threadfunc(). However, this hack is ugly for Wine and not reliable across all compilers or compiler options. When people build Wine with a newer or older compiler, or with a different optimization level, the offsets might be slightly different again, and the problem would reappear.

Alternatively, we also tried a hack on the Cygwin side, which uses ThreadQuerySetWin32StartAddress to query the thread entry point, as shown in 0001-hack-use-ThreadQuerySetWin32StartAddress.txt. I tested this hack against a recent Cygwin git checkout and confirmed that it works for me (without the hack on the Wine side). I also tested my own Cygwin build with this hack on Windows to confirm that it doesn't break anything.

Would this approach be acceptable for Cygwin? I understand that nobody likes changing code that works (on Windows), but using ThreadQuerySetWin32StartAddress seems like an improvement over relying on the result of searching stack memory. It would be great if we could find a solution that makes both Cygwin and Wine happy.

MSDN says, "Note that on versions of Windows prior to Windows Vista, the returned start address is only reliable before the thread starts running.". In practice, I tested my build on Windows XP SP2 and it works for me. Additionally, since Cygwin is approaching the end of its Windows XP support, maybe this is the right time to make this change.

Any comments are greatly appreciated!

cross-reference: https://bugs.wine-staging.com/show_bug.cgi?id=561

[1] https://github.com/wine-compholio/wine-patched/blob/5dee89ca82c36bf191ce3e26011b82dc87a42d4a/dlls/ntdll/loader.c#L1150

--
Regards,
Qian Hong

-
http://www.winehq.org