public inbox for cygwin@cygwin.com
* Threads
@ 2014-10-20 13:04 Ken Brown
  2014-10-20 16:43 ` Threads Corinna Vinschen
  2014-10-23 11:31 ` Threads Jon TURNEY
  0 siblings, 2 replies; 23+ messages in thread
From: Ken Brown @ 2014-10-20 13:04 UTC
  To: cygwin

When trying to debug emacs in gdb, I see several threads, but it's not always 
clear who created those threads and what they're doing.  As an example, I 
attached gdb to an emacs-X11 process (running under X) shortly after starting 
it, and I obtained the backtrace appended at the end of this message.

I assume Thread 12 was created by gdb.  Thread 6 appears to be a timer thread 
and Thread 2 appears to be a signal thread; I assume both of these were created 
by the Cygwin DLL.  And Thread 1 is the main thread.  I don't have any idea 
where the other threads came from.  Presumably at least one of them was created 
by Glib.

The situation is similar with emacs-w32 and emacs-nox, but with fewer threads.

In general, is there a way I can understand where all the threads come from?  My 
reason for asking is that we're still getting emacs bug reports on 64-bit 
Cygwin, with random crashes or assertion violations that are "impossible" 
according to the gdb backtraces. [*]  So I'm wondering whether they're caused by 
interference from other threads.

Or is there some other plausible explanation for "impossible" crashes?  This 
can't just be a result of a gdb bug, because in at least one case the assertion 
can be shown to be valid by using printf instead of gdb.

Ken

[*] By "impossible" I mean that examination of the relevant variables in gdb 
shows that the assertions are in fact true.  Two ongoing examples are

    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769


=============thread apply all bt==============================

Thread 12 (Thread 6288.0x554):
#0  0x0000000077b50591 in ntdll!DbgBreakPoint ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x0000000077bf7f48 in ntdll!DbgUiRemoteBreakin ()
     at /c/Windows/SYSTEM32/ntdll.dll
#2  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#3  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#4  0x0000000000000000 in  ()

Thread 11 (Thread 6288.0x2280):
#0  0x0000000077b52bba in ntdll!ZwWaitForWorkViaWorkerFactory ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x0000000077b1fe3b in ntdll!RtlValidateHeap ()
     at /c/Windows/SYSTEM32/ntdll.dll
#2  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x2000002, func=0x0, arg=0x5e82d0, buf=buf@entry=0x47bcd50)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#3  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#4  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#5  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#6  0x0000000000000000 in  ()

Thread 10 (Thread 6288.0x22c0):
#0  0x0000000077b52bba in ntdll!ZwWaitForWorkViaWorkerFactory ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x0000000077b1fe3b in ntdll!RtlValidateHeap ()
     at /c/Windows/SYSTEM32/ntdll.dll
#2  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x2000002, func=0x29000029, arg=0x5e82d0, buf=buf@entry=0x4bbcd50)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#3  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#4  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#5  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#6  0x0000000000000000 in  ()

Thread 9 (Thread 6288.0x1e98):
#0  0x0000000077b515fa in ntdll!ZwDelayExecution ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11203 in SleepEx () at /c/Windows/system32/KERNELBASE.dll
#2  0x000000018010d970 in thread_pipe(void*) (arg=0x600d2bfe0)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/select.cc:690
#3  0x0000000180044fc5 in cygthread::callfunc(bool) (this=this@entry=0x1801d0500 
<threads+352>, issimplestub=issimplestub@entry=false)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:51
#4  0x000000018004552a in cygthread::stub(void*) (arg=arg@entry=0x1801d0500 
<threads+352>) at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:93
#5  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x43bce00, func=
     0x1800454d0 <cygthread::stub(void*)>, arg=0x1801d0500 <threads+352>, 
buf=buf@entry=0x43bcd50) at 
/usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#6  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#7  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#8  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in  ()

Thread 8 (Thread 6288.0x1ae4):
#0  0x0000000077b5186a in ntdll!ZwWaitForMultipleObjects ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11430 in KERNELBASE!GetCurrentProcess ()
     at /c/Windows/system32/KERNELBASE.dll
#2  0x0000000000000000 in  ()

Thread 7 (Thread 6288.0x1bac):
#0  0x0000000077b5134a in ntdll!ZwRemoveIoCompletion ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefd095941 in  () at /c/Windows/System32/mswsock.dll
#2  0x0000000000000000 in  ()

Thread 6 (Thread 6288.0xf40):
#0  0x0000000077b512fa in ntdll!ZwWaitForSingleObject ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda110dc in WaitForSingleObjectEx ()
     at /c/Windows/system32/KERNELBASE.dll
#2  0x000000018013db94 in timer_thread(void*) (x=0x37ba9d8)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/timer.cc:145
#3  0x0000000180044fc5 in cygthread::callfunc(bool) (this=this@entry=0x1801d0450 
<threads+176>, issimplestub=issimplestub@entry=false)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:51
#4  0x000000018004552a in cygthread::stub(void*) (arg=arg@entry=0x1801d0450 
<threads+176>) at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:93
#5  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x37bce00, func=
     0x1800454d0 <cygthread::stub(void*)>, arg=0x1801d0450 <threads+176>, 
buf=buf@entry=0x37bcd50) at 
/usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#6  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#7  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#8  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in  ()

Thread 5 (Thread 6288.0x1cc0):
#0  0x0000000077b515fa in ntdll!ZwDelayExecution ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11203 in SleepEx () at /c/Windows/system32/KERNELBASE.dll
#2  0x000000018010d970 in thread_pipe(void*) (arg=0x600045f20)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/select.cc:690
#3  0x0000000180044fc5 in cygthread::callfunc(bool) (this=this@entry=0x1801d03f8 
<threads+88>, issimplestub=issimplestub@entry=false)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:51
#4  0x000000018004552a in cygthread::stub(void*) (arg=arg@entry=0x1801d03f8 
<threads+88>) at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:93
#5  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x33bce00, func=
     0x1800454d0 <cygthread::stub(void*)>, arg=0x1801d03f8 <threads+88>, 
buf=buf@entry=0x33bcd50) at 
/usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#6  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#7  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#8  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#9  0x0000000000000000 in  ()

Thread 4 (Thread 6288.0x620):
#0  0x0000000077b5186a in ntdll!ZwWaitForMultipleObjects ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11430 in KERNELBASE!GetCurrentProcess ()
     at /c/Windows/system32/KERNELBASE.dll
#2  0x0000000000000000 in  ()

Thread 3 (Thread 6288.0x50c):
#0  0x0000000077b5186a in ntdll!ZwWaitForMultipleObjects ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x0000000077b1b037 in ntdll!TpIsTimerSet () at /c/Windows/SYSTEM32/ntdll.dll
#2  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#3  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#4  0x0000000000000000 in  ()

Thread 2 (Thread 6288.0x4ec):
#0  0x0000000077b5131a in ntdll!ZwReadFile () at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11a7a in ReadFile () at /c/Windows/system32/KERNELBASE.dll
#2  0x00000000779f0a19 in ReadFile () at /c/Windows/system32/kernel32.dll
#3  0x00000001801197c2 in wait_sig(void*) ()
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/sigproc.cc:1239
#4  0x0000000180044fc5 in cygthread::callfunc(bool) (this=this@entry=0x1801d03a0 
<threads>, issimplestub=issimplestub@entry=false)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:51
#5  0x000000018004552a in cygthread::stub(void*) (arg=arg@entry=0x1801d03a0 
<threads>) at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygthread.cc:93
#6  0x000000018004619b in _cygtls::call2(unsigned int (*)(void*, void*), void*, 
void*) (this=0x23ace00, func=
     0x1800454d0 <cygthread::stub(void*)>, arg=0x1801d03a0 <threads>, 
buf=buf@entry=0x23acd50) at 
/usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:100
#7  0x00000001800462f4 in _cygtls::call(unsigned int (*)(void*, void*), void*) 
(func=<optimized out>, arg=<optimized out>)
     at /usr/src/debug/cygwin-1.7.32-1/winsup/cygwin/cygtls.cc:30
#8  0x00000000779f59ed in KERNEL32!BaseThreadInitThunk ()
     at /c/Windows/system32/kernel32.dll
#9  0x0000000077b2c541 in ntdll!RtlUserThreadStart ()
     at /c/Windows/SYSTEM32/ntdll.dll
#10 0x0000000000000000 in  ()

Thread 1 (Thread 6288.0x2304):
#0  0x0000000077b5186a in ntdll!ZwWaitForMultipleObjects ()
     at /c/Windows/SYSTEM32/ntdll.dll
#1  0x000007fefda11430 in KERNELBASE!GetCurrentProcess ()
     at /c/Windows/system32/KERNELBASE.dll
#2  0x0000000000000001 in  ()
Continuing.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* Re: Threads
  2014-10-20 13:04 Threads Ken Brown
@ 2014-10-20 16:43 ` Corinna Vinschen
  2014-10-20 19:03   ` Threads Corinna Vinschen
  2014-10-23 11:31 ` Threads Jon TURNEY
  1 sibling, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-20 16:43 UTC
  To: cygwin

On Oct 20 09:03, Ken Brown wrote:
> When trying to debug emacs in gdb, I see several threads, but it's not
> always clear who created those threads and what they're doing.  As an
> example, I attached gdb to an emacs-X11 process (running under X) shortly
> after starting it, and I obtained the backtrace appended at the end of this
> message.
> 
> I assume Thread 12 was created by gdb.

Yes, that's the debugging thread created by the OS when a debugger
connects.

Threads 11 and 10 seem to be dead threads which have been added to the
thread pool?!?

Thread 9 is a worker thread for select on a pipe.

Threads 8 and 7 are unrecognizable; I'd bet on a select call in at
least one of them.

> Thread 6 appears to be a timer thread 

Right, thread 6 is a worker thread for a timer_settime call (also 
called from setitimer, alarm, ualarm).

Thread 5 is another select on a pipe; threads 4 and 3 are again not
recognizable.

> and Thread 2 appears to be a signal thread;

Right.

> I assume both of these
> were created by the Cygwin DLL.  And Thread 1 is the main thread.

Right.  Threads 2, 5, 6 and 9 are Cygwin-created threads.

Threads 3, 4, 7 and 8 appear to be application-created threads.  At
least one of them is probably waiting in a select call: either one call
waiting for two pipe handles (served by the pipe threads 5 and 9), or two
calls waiting for one each.  Select itself starts helper threads a lot.

Threads 10 and 11 seem to be ignorable, but I have never seen threads
waiting in WaitForWorkViaWorkerFactory myself.  Cygwin does not use
the OS thread pools itself; rather, it implements its own.

> I don't
> have any idea where the other threads came from.  Presumably at least one of
> them was created by Glib.
> 
> The situation is similar with emacs-w32 and emacs-nox, but with fewer threads.
> 
> In general, is there a way I can understand where all the threads come from?

There's no simple generic way to do that, afaik.

One big problem is having all the symbols.  You should definitely
install the debuginfo packages of all potentially affected packages, not
only cygwin-debuginfo.  If you want to find out where the application
creates its threads, you might get a clue by running emacs under GDB and
setting a breakpoint on pthread_create.

That won't help, of course, if a component starts threads by calling
the Windows function CreateThread directly.  There's no guarantee that
Cygwin's thread
handling will catch these, even though it tries.
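If you control the code that starts the threads, a complementary low-tech trick is to route thread creation through a logging wrapper.  The sketch below is hypothetical (`traced_pthread_create` and `example_worker` are illustrative names, not part of Cygwin or Emacs) and relies on the GCC builtin `__builtin_return_address`.  It only sees threads started through the wrapper, so threads created inside libraries such as Glib, or via CreateThread, stay invisible, which is the same gap as with the pthread_create breakpoint.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical logging wrapper around pthread_create.  It logs the return
   address of the call site so the creator can be resolved later, e.g. with
   gdb's "info symbol ADDR" or with addr2line.  */
int
traced_pthread_create (pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start) (void *), void *arg)
{
  void *caller = __builtin_return_address (0);
  int rc = pthread_create (thread, attr, start, arg);

  fprintf (stderr, "pthread_create called from %p, rc = %d\n", caller, rc);
  return rc;
}

/* Trivial start routine for trying the wrapper out: increments *ARG.  */
void *
example_worker (void *arg)
{
  ++*(int *) arg;
  return NULL;
}
```

Calling `traced_pthread_create (&t, NULL, example_worker, &counter)` followed by `pthread_join (t, NULL)` should leave `counter` at 1 and print one log line to stderr.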

> My reason for asking is that we're still getting emacs bug reports on 64-bit
> Cygwin, with random crashes or assertion violations that are "impossible"
> according to the gdb backtraces. [*]  So I'm wondering whether they're
> caused by interference from other threads.
> 
> Or is there some other plausible explanation for "impossible" crashes?  This
> can't just be a result of a gdb bug, because in at least one case the
> assertion can be shown to be valid by using printf instead of gdb.

One of the headaches when porting is sometimes the ABI.  While on Linux
the first 6 arguments to a function are passed in registers, on Windows
only 4 are.  This can result in bugs when calling functions with more
than 4 parameters which are invisible on Linux, due to the way 32-bit
parameters are stored in registers on x86_64.  This has already happened
to us with at least one package.

Other than that, it could be a bug in any of the affected components,
including Cygwin.  I'm sorry, but I don't have even the faintest hunch,
even after reading the emacs bug report entries :(


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

* Re: Threads
  2014-10-20 16:43 ` Threads Corinna Vinschen
@ 2014-10-20 19:03   ` Corinna Vinschen
  2014-10-20 19:58     ` Threads Ken Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-20 19:03 UTC
  To: cygwin

On Oct 20 18:43, Corinna Vinschen wrote:
> On Oct 20 09:03, Ken Brown wrote:
> > When trying to debug emacs in gdb, I see several threads, but it's not
> > always clear who created those threads and what they're doing.  As an
> > example, I attached gdb to an emacs-X11 process (running under X) shortly
> > after starting it, and I obtained the backtrace appended at the end of this
> > message.
> > 
> > I assume Thread 12 was created by gdb.
> [...]
> > I don't
> > have any idea where the other threads came from.  Presumably at least one of
> > them was created by Glib.
> > 
> > The situation is similar with emacs-w32 and emacs-nox, but with fewer threads.
> > 
> > In general, is there a way I can understand where all the threads come from?
> 
> There's no simple generic way to do that, afaik.
> 
> One big problem is to have all the symbols.  You should definitely
> install the debuginfo packages of all potentially affected packages, not
> only cygwin-debuginfo.  If you want to find out where threads are called
> from the application, you might get a clue by running emacs under GDB and
> set a breakpoint to pthread_create.

Btw., I don't know if that helps, but the function names of native
Windows functions given in the GDB backtrace may be off because GDB
doesn't have access to the Windows DLL symbol tables.  If you want to
analyze the stacks from that side, you should install WinDbg(*) and the
symbol files for your OS(**).  If you start the process from WinDbg, you
can better see the Windows functions called from these threads, while
not getting any info about the functions from inside the application
or the Cygwin DLLs.


HTH,
Corinna


(*)  http://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx
(**) http://msdn.microsoft.com/en-us/windows/hardware/gg463028.aspx

* Re: Threads
  2014-10-20 19:03   ` Threads Corinna Vinschen
@ 2014-10-20 19:58     ` Ken Brown
  2014-10-21 11:17       ` Threads Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Ken Brown @ 2014-10-20 19:58 UTC
  To: cygwin

On 10/20/2014 3:03 PM, Corinna Vinschen wrote:
> On Oct 20 18:43, Corinna Vinschen wrote:
>> On Oct 20 09:03, Ken Brown wrote:
>>> When trying to debug emacs in gdb, I see several threads, but it's not
>>> always clear who created those threads and what they're doing.  As an
>>> example, I attached gdb to an emacs-X11 process (running under X) shortly
>>> after starting it, and I obtained the backtrace appended at the end of this
>>> message.
>>>
>>> I assume Thread 12 was created by gdb.
>> [...]
>>> I don't
>>> have any idea where the other threads came from.  Presumably at least one of
>>> them was created by Glib.
>>>
>>> The situation is similar with emacs-w32 and emacs-nox, but with fewer threads.
>>>
>>> In general, is there a way I can understand where all the threads come from?
>>
>> There's no simple generic way to do that, afaik.
>>
>> One big problem is to have all the symbols.  You should definitely
>> install the debuginfo packages of all potentially affected packages, not
>> only cygwin-debuginfo.  If you want to find out where threads are called
>> from the application, you might get a clue by running emacs under GDB and
>> set a breakpoint to pthread_create.
>
> Btw., I don't know if that helps, but the function names of native
> Windows functions given in the GDB backtrace may be off because GDB
> doesn't have access to the Windows DLL symbol tables.  If you want to
> analyze the stacks from that side, you should install WinDbg(*) and the
> symbol files for your OS(**).  If you start the process from WinDbg, you
> can better see the Windows functions called from these threads, while
> not getting any info about the functions from inside the application
> or the Cygwin DLLs.

Thanks, I'll give that a try.

> One of the headaches when porting is sometimes the ABI.  While on Linux
> the first 6 arguments to a function are given in registers, on Windows
> only 4 args are in registers.  This can result in bugs when calling
> functions with more than 4 parameters, which are invisible on Linux, due
> to the way 32 bit parameter are stored in registers on x86_64.  This
> happened to us already for at least one package.

Am I right in thinking this can only be an issue if the source includes 
assembler code?

Ken

* Re: Threads
  2014-10-20 19:58     ` Threads Ken Brown
@ 2014-10-21 11:17       ` Corinna Vinschen
  2014-10-21 12:27         ` Threads Ken Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-21 11:17 UTC
  To: cygwin

On Oct 20 15:58, Ken Brown wrote:
> On 10/20/2014 3:03 PM, Corinna Vinschen wrote:
> >One of the headaches when porting is sometimes the ABI.  While on Linux
> >the first 6 arguments to a function are given in registers, on Windows
> >only 4 args are in registers.  This can result in bugs when calling
> >functions with more than 4 parameters, which are invisible on Linux, due
> >to the way 32 bit parameter are stored in registers on x86_64.  This
> >happened to us already for at least one package.
> 
> Am I right in thinking this can only be an issue if the source includes
> assembler code?

No.  This can easily be triggered by a bug in C code.  What happens is
this:

The 64-bit ABI is defined so that the first few function arguments are
passed to the called function in CPU registers.  On Windows the ABI uses
4 such registers(*), on Linux 6(**).  All following arguments are passed
on the stack.

The AMD64 CPUs introduced the following behaviour:  If a 32-bit value
(for instance, an int in C) is written to a register, the CPU
automatically clears the upper 32 bits of the register.  For instance:

  %rdx == 0x0123456789abcdef

  movl $0x42,%edx		<- This is a 32-bit mov!

  ==> %rdx == 0x0000000000000042

  No sign extension:

  movl $0xffffffff,%edx

  ==> %rdx == 0x00000000ffffffff

Now consider what happens if, for instance, the 5th argument to a
variadic (stdarg) function is expected to be a pointer value.  The caller
calls the function like this:

  foo (a, b, c, d, 0);

The 0 is an int; it's not extended to 64 bits.  On Linux, nothing bad
happens, because the 0 is passed to foo in register R8, so the upper 32
bits are cleared.  On Cygwin, the 5th parameter is passed on the stack,
in a 64-bit aligned slot.  Only the lower 32 bits of that slot are
written; the upper 32 bits keep whatever happened to be there before.
foo doesn't get a NULL pointer, but something like 0xdeadbeef00000000.
Here's an example:

  $ cat > p.c <<EOF
  #include <stdio.h>

  int
  main ()
  {
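    /* Fill the argument stack slots with recognizable 64-bit values.  */
    printf ("prepare stack:\n%p\n%p\n%p\n%p\n%p\n%p\n",
	    0x1111111111111111UL, 0x2222222222222222UL, 0x3333333333333333UL,
	    0x4444444444444444UL, 0x5555555555555555UL, 0x6666666666666666UL);
    /* Bug: each 0 is a 32-bit int, but %p reads a 64-bit pointer.  On
       Cygwin the stack-passed ones may print with stale upper halves.  */
    printf ("\nprint null ptr:\n%p\n%p\n%p\n%p\n%p\n%p\n", 0, 0, 0, 0, 0, 0);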
  }
  EOF
  $ gcc -g -o p p.c
  $ ./p
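For contrast, here is a sketch of the correct pattern.  `count_null_ptrs` is a made-up variadic function that reads each argument with `va_arg (ap, void *)`, i.e. as a full 64-bit pointer, so callers must pass real pointers such as `(void *) 0` rather than a bare `0`, which is only an int.  Plain `NULL` is not guaranteed to have pointer type in C (it may be defined as `0`), so when in doubt cast it, as is customary for the terminating argument of execl.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic function: count how many of the N trailing
   arguments are null pointers.  Each one is fetched as a 64-bit void *,
   so every caller must pass an actual pointer, not an int.  */
int
count_null_ptrs (int n, ...)
{
  va_list ap;
  int nulls = 0;

  va_start (ap, n);
  for (int i = 0; i < n; i++)
    if (va_arg (ap, void *) == NULL)
      nulls++;
  va_end (ap);
  return nulls;
}
```

`count_null_ptrs (6, (void *) 0, (void *) 0, (void *) 0, (void *) 0, (void *) 0, (void *) 0)` returns 6 under either ABI, while the buggy `count_null_ptrs (6, 0, 0, 0, 0, 0, 0)` is undefined behavior.  For printf-style functions like the one in p.c, gcc's `-Wformat` (enabled by `-Wall`) warns about the int/pointer mismatch; for other variadic functions there is no such check.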

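The same problem might occur if some code calls a function without a
prototype in scope: the compiler then assumes the function returns int,
so a 64-bit pointer result is truncated to 32 bits.  My favorite example:

  /* Don't include string.h */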
  printf ("Error message is: %s\n", strerror (errno));
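The fix is simply to include the header that declares the function.  A minimal sketch, with `format_error` as a hypothetical helper: once `<string.h>` is included, strerror's `char *` result is passed on as a full 64-bit pointer.  Compiling with `-Wall`, which enables `-Wimplicit-function-declaration`, or with `-Werror=implicit-function-declaration`, makes gcc flag the buggy version.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>   /* declares char *strerror (int) */

/* Hypothetical helper: format the message for error code ERR into BUF
   and return snprintf's result.  With <string.h> included, strerror's
   return value is treated as a pointer, not as a truncated int.  */
int
format_error (char *buf, size_t len, int err)
{
  return snprintf (buf, len, "Error message is: %s\n", strerror (err));
}
```

After a failing call, `format_error (buf, sizeof buf, errno)` fills `buf` with the same text the original printf line was meant to produce.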

Long story short, I have no idea if that's your problem at all, but I
thought I should at least mention it.


Corinna

(*) http://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention
(**) http://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI


* Re: Threads
  2014-10-21 11:17       ` Threads Corinna Vinschen
@ 2014-10-21 12:27         ` Ken Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Ken Brown @ 2014-10-21 12:27 UTC
  To: cygwin

On 10/21/2014 7:17 AM, Corinna Vinschen wrote:
> On Oct 20 15:58, Ken Brown wrote:
> [...]
> Long story short, I have no idea if that's your problem at all, but I
> thought I should at least mention it.

Thanks for the explanation!

Ken

* Re: Threads
  2014-10-20 13:04 Threads Ken Brown
  2014-10-20 16:43 ` Threads Corinna Vinschen
@ 2014-10-23 11:31 ` Jon TURNEY
  2014-10-23 12:04   ` Threads Ken Brown
  1 sibling, 1 reply; 23+ messages in thread
From: Jon TURNEY @ 2014-10-23 11:31 UTC
  To: Ken Brown, cygwin

On 20/10/2014 14:03, Ken Brown wrote:
> Or is there some other plausible explanation for "impossible" crashes?
> This can't just be a result of a gdb bug, because in at least one case
> the assertion can be shown to be valid by using printf instead of gdb.
>
> [*] By "impossible" I mean that examination of the relevant variables in
> gdb shows that the assertions are in fact true.  Two ongoing examples are
>
>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769

As a suggestion, you might also want to take a careful look at how
signal delivery is implemented in Cygwin on x86_64.

I had a vague idea that there was, at some time in the past, a fix made
for register corruption on x86_64 after a signal was handled, but I
can't find it now, so maybe I imagined it.  But if, e.g., the flags
register was getting corrupted when a signal interrupts the main thread,
that could perhaps also explain what is being seen.

(More generally, it doesn't have to be another thread which is causing
these problems; it could be some form of interrupt.)


* Re: Threads
  2014-10-23 11:31 ` Threads Jon TURNEY
@ 2014-10-23 12:04   ` Ken Brown
  2014-10-23 15:37     ` Threads Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Ken Brown @ 2014-10-23 12:04 UTC
  To: cygwin

On 10/23/2014 7:31 AM, Jon TURNEY wrote:
> On 20/10/2014 14:03, Ken Brown wrote:
>> Or is there some other plausible explanation for "impossible" crashes?
>> This can't just be a result of a gdb bug, because in at least one case
>> the assertion can be shown to be valid by using printf instead of gdb.
>>
>> [*] By "impossible" I mean that examination of the relevant variables in
>> gdb shows that the assertions are in fact true.  Two ongoing examples are
>>
>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
>
> As a suggestion, you might want to also take a careful look at how signal
> delivery is implemented in cygwin on x86_64
>
> I had a vague idea that there was, at some time in the past, a fix made for
> register corruption on x86_64 after a signal was handled, but I can't find it
> now, so maybe I imagined it.

Is this what you're thinking of?

   https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html

> But if for e.g. the flags register was getting
> corrupted when a signal interrupts the main thread, that could perhaps also
> explain what is being seen.

Yes, flags register corruption is exactly what Eli suggested in the other bug 
report I cited.

> (More generally, it doesn't have to be another thread which is causing these
> problems, it could be some form of interrupt)
>

* Re: Threads
  2014-10-23 12:04   ` Threads Ken Brown
@ 2014-10-23 15:37     ` Corinna Vinschen
  2014-10-23 18:07       ` Threads Achim Gratz
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-23 15:37 UTC
  To: cygwin

On Oct 23 08:04, Ken Brown wrote:
> On 10/23/2014 7:31 AM, Jon TURNEY wrote:
> >On 20/10/2014 14:03, Ken Brown wrote:
> >>Or is there some other plausible explanation for "impossible" crashes?
> >>This can't just be a result of a gdb bug, because in at least one case
> >>the assertion can be shown to be valid by using printf instead of gdb.
> >>
> >>[*] By "impossible" I mean that examination of the relevant variables in
> >>gdb shows that the assertions are in fact true.  Two ongoing examples are
> >>
> >>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
> >>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
> >
> >As a suggestion, you might want to also take a careful look at how signal
> >delivery is implemented in cygwin on x86_64
> >
> >I had a vague idea that there was, at some time in the past, a fix made for
> >register corruption on x86_64 after a signal was handled, but I can't find it
> >now, so maybe I imagined it.
> 
> Is this what you're thinking of?
> 
>   https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html
> 
> >But if, for example, the flags register was getting
> >corrupted when a signal interrupts the main thread, that could perhaps also
> >explain what is being seen.
> 
> Yes, flags register corruption is exactly what Eli suggested in the other
> bug report I cited.

The aforementioned patch was supposed to fix this problem and it is
definitely in the current 1.7.32 release...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-23 15:37     ` Threads Corinna Vinschen
@ 2014-10-23 18:07       ` Achim Gratz
  2014-10-23 20:32       ` Threads Ken Brown
  2014-10-24 11:05       ` Threads Jon TURNEY
  2 siblings, 0 replies; 23+ messages in thread
From: Achim Gratz @ 2014-10-23 18:07 UTC (permalink / raw)
  To: cygwin

Corinna Vinschen writes:
>> Yes, flags register corruption is exactly what Eli suggested in the other
>> bug report I cited.
>
> The aforementioned patch was supposed to fix this problem and it is
> definitely in the current 1.7.32 release...

Emacs uses a bunch of libraries and itself messes extensively with
signals, so that may have nothing to do with Cygwin.  I don't think
there are all that many non-Cygwin 64-bit Emacsen in use.  The
underlying timer-triggered errors seem to happen (very rarely) in the
32-bit version as well, but I've never seen them crash Emacs there.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-23 15:37     ` Threads Corinna Vinschen
  2014-10-23 18:07       ` Threads Achim Gratz
@ 2014-10-23 20:32       ` Ken Brown
  2014-10-24  1:07         ` Threads Ken Brown
  2014-10-24 11:05       ` Threads Jon TURNEY
  2 siblings, 1 reply; 23+ messages in thread
From: Ken Brown @ 2014-10-23 20:32 UTC (permalink / raw)
  To: cygwin

On 10/23/2014 11:37 AM, Corinna Vinschen wrote:
> On Oct 23 08:04, Ken Brown wrote:
>> On 10/23/2014 7:31 AM, Jon TURNEY wrote:
>>> On 20/10/2014 14:03, Ken Brown wrote:
>>>> Or is there some other plausible explanation for "impossible" crashes?
>>>> This can't just be a result of a gdb bug, because in at least one case
>>>> the assertion can be shown to be valid by using printf instead of gdb.
>>>>
>>>> [*] By "impossible" I mean that examination of the relevant variables in
>>>> gdb shows that the assertions are in fact true.  Two ongoing examples are
>>>>
>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
>>>
>>> As a suggestion, you might want to also take a careful look at how signal
>>> delivery is implemented in cygwin on x86_64
>>>
>>> I had a vague idea that there was, at some time in the past, a fix made for
>>> register corruption on x86_64 after a signal was handled, but I can't find it
>>> now, so maybe I imagined it.
>>
>> Is this what you're thinking of?
>>
>>    https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html
>>
>>> But if, for example, the flags register was getting
>>> corrupted when a signal interrupts the main thread, that could perhaps also
>>> explain what is being seen.
>>
>> Yes, flags register corruption is exactly what Eli suggested in the other
>> bug report I cited.
>
> The aforementioned patch was supposed to fix this problem and it is
> definitely in the current 1.7.32 release...

The ChangeLog entry just mentions the FPU control word and the XMM 
registers, but not the ordinary FLAGS register (or rather EFLAGS for x86 
and RFLAGS for x86_64, if I'm understanding correctly what I find in 
Wikipedia).  Did the patch also take care of that?

Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-23 20:32       ` Threads Ken Brown
@ 2014-10-24  1:07         ` Ken Brown
  2014-10-24  9:46           ` Threads Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Ken Brown @ 2014-10-24  1:07 UTC (permalink / raw)
  To: cygwin

On 10/23/2014 4:32 PM, Ken Brown wrote:
> On 10/23/2014 11:37 AM, Corinna Vinschen wrote:
>> On Oct 23 08:04, Ken Brown wrote:
>>> On 10/23/2014 7:31 AM, Jon TURNEY wrote:
>>>> On 20/10/2014 14:03, Ken Brown wrote:
>>>>> Or is there some other plausible explanation for "impossible" crashes?
>>>>> This can't just be a result of a gdb bug, because in at least one case
>>>>> the assertion can be shown to be valid by using printf instead of gdb.
>>>>>
>>>>> [*] By "impossible" I mean that examination of the relevant variables in
>>>>> gdb shows that the assertions are in fact true.  Two ongoing examples are
>>>>>
>>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
>>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
>>>>
>>>> As a suggestion, you might want to also take a careful look at how signal
>>>> delivery is implemented in cygwin on x86_64
>>>>
>>>> I had a vague idea that there was, at some time in the past, a fix made for
>>>> register corruption on x86_64 after a signal was handled, but I can't find it
>>>> now, so maybe I imagined it.
>>>
>>> Is this what you're thinking of?
>>>
>>>    https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html
>>>
>>>> But if, for example, the flags register was getting
>>>> corrupted when a signal interrupts the main thread, that could perhaps also
>>>> explain what is being seen.
>>>
>>> Yes, flags register corruption is exactly what Eli suggested in the other
>>> bug report I cited.
>>
>> The aforementioned patch was supposed to fix this problem and it is
>> definitely in the current 1.7.32 release...
>
> The ChangeLog entry just mentions the FPU control word and the XMM
> registers, but not the ordinary FLAGS register (or rather EFLAGS for x86
> and RFLAGS for x86_64, if I'm understanding correctly what I find in
> Wikipedia).  Did the patch also take care of that?

Never mind, it looks like that was already OK before the patch.  I see 
that there are pushf and popf instructions in gendef.

Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-24  1:07         ` Threads Ken Brown
@ 2014-10-24  9:46           ` Corinna Vinschen
  0 siblings, 0 replies; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-24  9:46 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2522 bytes --]

On Oct 23 21:07, Ken Brown wrote:
> On 10/23/2014 4:32 PM, Ken Brown wrote:
> >On 10/23/2014 11:37 AM, Corinna Vinschen wrote:
> >>On Oct 23 08:04, Ken Brown wrote:
> >>>On 10/23/2014 7:31 AM, Jon TURNEY wrote:
> >>>>On 20/10/2014 14:03, Ken Brown wrote:
> >>>>>Or is there some other plausible explanation for "impossible" crashes?
> >>>>>This can't just be a result of a gdb bug, because in at least one case
> >>>>>the assertion can be shown to be valid by using printf instead of gdb.
> >>>>>
> >>>>>[*] By "impossible" I mean that examination of the relevant variables in
> >>>>>gdb shows that the assertions are in fact true.  Two ongoing examples are
> >>>>>
> >>>>>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
> >>>>>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
> >>>>
> >>>>As a suggestion, you might want to also take a careful look at how signal
> >>>>delivery is implemented in cygwin on x86_64
> >>>>
> >>>>I had a vague idea that there was, at some time in the past, a fix made for
> >>>>register corruption on x86_64 after a signal was handled, but I can't find it
> >>>>now, so maybe I imagined it.
> >>>
> >>>Is this what you're thinking of?
> >>>
> >>>   https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html
> >>>
> >>>>But if, for example, the flags register was getting
> >>>>corrupted when a signal interrupts the main thread, that could perhaps also
> >>>>explain what is being seen.
> >>>
> >>>Yes, flags register corruption is exactly what Eli suggested in the other
> >>>bug report I cited.
> >>
> >>The aforementioned patch was supposed to fix this problem and it is
> >>definitely in the current 1.7.32 release...
> >
> >The ChangeLog entry just mentions the FPU control word and the XMM
> >registers, but not the ordinary FLAGS register (or rather EFLAGS for x86
> >and RFLAGS for x86_64, if I'm understanding correctly what I find in
> >Wikipedia).  Did the patch also take care of that?
> 
> Never mind, it looks like that was already OK before the patch.  I see that
> there are pushf and popf instructions in gendef.

Right.  If there's any indication that this isn't sufficient, please
tell me.  Maybe there's some other kind of CPU state information which
needs to be saved and restored for signal handling?!?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-23 15:37     ` Threads Corinna Vinschen
  2014-10-23 18:07       ` Threads Achim Gratz
  2014-10-23 20:32       ` Threads Ken Brown
@ 2014-10-24 11:05       ` Jon TURNEY
  2014-10-24 12:54         ` Threads Corinna Vinschen
  2 siblings, 1 reply; 23+ messages in thread
From: Jon TURNEY @ 2014-10-24 11:05 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2036 bytes --]

On 23/10/2014 16:37, Corinna Vinschen wrote:
> On Oct 23 08:04, Ken Brown wrote:
>> On 10/23/2014 7:31 AM, Jon TURNEY wrote:
>>> On 20/10/2014 14:03, Ken Brown wrote:
>>>> Or is there some other plausible explanation for "impossible" crashes?
>>>> This can't just be a result of a gdb bug, because in at least one case
>>>> the assertion can be shown to be valid by using printf instead of gdb.
>>>>
>>>> [*] By "impossible" I mean that examination of the relevant variables in
>>>> gdb shows that the assertions are in fact true.  Two ongoing examples are
>>>>
>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18438
>>>>     http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18769
>>>
>>> As a suggestion, you might want to also take a careful look at how signal
>>> delivery is implemented in cygwin on x86_64
>>>
>>> I had a vague idea that there was, at some time in the past, a fix made for
>>> register corruption on x86_64 after a signal was handled, but I can't find it
>>> now, so maybe I imagined it.
>>
>> Is this what you're thinking of?
>>
>>    https://cygwin.com/ml/cygwin-cvs/2014-q1/msg00020.html
>>
>>> But if, for example, the flags register was getting
>>> corrupted when a signal interrupts the main thread, that could perhaps also
>>> explain what is being seen.
>>
>> Yes, flags register corruption is exactly what Eli suggested in the other
>> bug report I cited.
>
> The aforementioned patch was supposed to fix this problem and it is
> definitely in the current 1.7.32 release...

I didn't mean to suggest otherwise, just that perhaps a similar problem 
exists now.

So I made the attached test case to explore that.  Maybe I've made an 
obvious mistake with it, but on the face of it, it seems to demonstrate 
something...

jon@tambora /
$ gcc signal-stress.c  -Wall -O0 -g

jon@tambora /
$ ./a
failed: 2144210386 isn't equal to 2144210386, apparently

Note there is some odd load dependency. For me, it works fine when it's 
the only thing running, but when I start up something CPU intensive, it 
often fails...

[-- Attachment #2: signal-stress.c --]
[-- Type: text/plain, Size: 1208 bytes --]


#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

long SmartScheduleInterval = 1; /* ms */
volatile long SmartScheduleTime = 0;  /* modified by the signal handler */

/* SIGALRM handler: does a trivial amount of work, so that the main loop
   is simply interrupted as often as possible. */
static void
SmartScheduleTimer(int sig)
{
    if (sig != 0)
        SmartScheduleTime += SmartScheduleInterval;
}

/* Arm a periodic ITIMER_REAL timer firing every SmartScheduleInterval ms. */
void
SmartScheduleStartTimer(void)
{
    struct itimerval timer;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = SmartScheduleInterval * 1000;
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = SmartScheduleInterval * 1000;
    if (setitimer(ITIMER_REAL, &timer, 0) < 0)
        perror("setitimer failed");
}

int main(void)
{
    /* Set up the timer signal function */
    struct sigaction act;
    act.sa_handler = SmartScheduleTimer;
    act.sa_flags = 0;                 /* no special flags */
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGALRM);
    if (sigaction(SIGALRM, &act, 0) < 0) {
        perror("sigaction failed");
        return -1;
    }

    /* start timer */
    SmartScheduleStartTimer();

    /* Loop forever, doing tests which should always succeed, with lots of
       signals.  Built with -O0, x, i and j stay in memory, so the
       comparison really is performed on every iteration. */
    int x = 0;
    int i = 0;
    while (1) {
        x = i;
        int j = x;
        if (j != i) {
            printf("failed: %d isn't equal to %d, apparently\n", i, j);
            break;
        }
        i++;
    }
    return 0;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-24 11:05       ` Threads Jon TURNEY
@ 2014-10-24 12:54         ` Corinna Vinschen
  2014-10-24 13:52           ` Threads Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-24 12:54 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

On Oct 24 12:05, Jon TURNEY wrote:
> On 23/10/2014 16:37, Corinna Vinschen wrote:
> >On Oct 23 08:04, Ken Brown wrote:
> >>Yes, flags register corruption is exactly what Eli suggested in the other
> >>bug report I cited.
> >
> >The aforementioned patch was supposed to fix this problem and it is
> >definitely in the current 1.7.32 release...
> 
> I didn't mean to suggest otherwise, just that perhaps a similar problem
> exists now.
> 
> So I made the attached test case to explore that.  Maybe I've made an
> obvious mistake with it, but on the face of it, it seems to demonstrate
> something...
> 
> jon@tambora /
> $ gcc signal-stress.c  -Wall -O0 -g
> 
> jon@tambora /
> $ ./a
> failed: 2144210386 isn't equal to 2144210386, apparently

So it checks i and j for equality, fails, and then comes up with
"42 isn't equal to 42"?  This is weird...

> Note there is some odd load dependency. For me, it works fine when it's the
> only thing running, but when I start up something CPU intensive, it often
> fails...

That's... interesting.  I wonder if that only occurs in multi-core or
multi-CPU environments.  The fact that i and j are not the same when
testing, but then are the same when printf is called looks like an
out-of-order execution problem.

Is it possible that we have to add CPU memory barriers to the sigdelayed
function to avoid stuff like this?


Corinna


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-24 12:54         ` Threads Corinna Vinschen
@ 2014-10-24 13:52           ` Corinna Vinschen
  2014-10-26 11:58             ` Threads Ken Brown
  2014-10-28 10:44             ` Threads Jon TURNEY
  0 siblings, 2 replies; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-24 13:52 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2302 bytes --]

On Oct 24 14:54, Corinna Vinschen wrote:
> On Oct 24 12:05, Jon TURNEY wrote:
> > On 23/10/2014 16:37, Corinna Vinschen wrote:
> > >On Oct 23 08:04, Ken Brown wrote:
> > >>Yes, flags register corruption is exactly what Eli suggested in the other
> > >>bug report I cited.
> > >
> > >The aforementioned patch was supposed to fix this problem and it is
> > >definitely in the current 1.7.32 release...
> > 
> > I didn't mean to suggest otherwise, just that perhaps a similar problem
> > exists now.
> > 
> > So I made the attached test case to explore that.  Maybe I've made an
> > obvious mistake with it, but on the face of it, it seems to demonstrate
> > something...
> > 
> > jon@tambora /
> > $ gcc signal-stress.c  -Wall -O0 -g
> > 
> > jon@tambora /
> > $ ./a
> > failed: 2144210386 isn't equal to 2144210386, apparently
> 
> So it checks i and j for equality, fails, and then comes up with
> "42 isn't equal to 42"?  This is weird...
> 
> > Note there is some odd load dependency. For me, it works fine when it's the
> > only thing running, but when I start up something CPU intensive, it often
> > fails...
> 
> That's... interesting.  I wonder if that only occurs in multi-core or
> multi-CPU environments.  The fact that i and j are not the same when
> testing, but then are the same when printf is called looks like an
> out-of-order execution problem.
> 
> Is it possible that we have to add CPU memory barriers to the sigdelayed
> function to avoid stuff like this?

I discussed this with my colleague Kai Tietz (many thanks to him from
here), and we came up with a problem in sigdelayed in the 64-bit case:
pushf is called *after* aligning the stack with andq.  Since andq
modifies the CPU flags, the flags that get restored are not necessarily
the flags that were in effect on entering sigdelayed.
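The ordering problem described above can be illustrated outside Cygwin with a small sketch. It assumes GCC or Clang inline assembly on x86_64 and is not the actual sigdelayed code from gendef; the helper name `zf_seen` is hypothetical. It sets ZF with a `cmp`, then captures ZF either before or after an `andq $-16` of the kind used to align a stack pointer. It uses `setz` instead of `pushf`/`popf` so the inline assembly never touches the real stack.

```c
#include <stdint.h>

/*
 * Hypothetical demo (x86_64, GCC/Clang inline asm only): set ZF = 1 with a
 * comparison, then capture ZF either before or after an `andq $-16`.
 * Returns the captured ZF value (1 or 0).
 */
static unsigned char zf_seen(int and_before_capture)
{
    unsigned char zf;
    uint64_t v = 0x1008;            /* stands in for an unaligned %rsp value */

    if (and_before_capture)
        __asm__ volatile("cmp %1, %1\n\t"     /* equal, so ZF = 1          */
                         "andq $-16, %1\n\t"  /* 0x1000 != 0, so ZF = 0    */
                         "setz %0"            /* capture ZF after the and  */
                         : "=q"(zf), "+r"(v)
                         :
                         : "cc");
    else
        __asm__ volatile("cmp %1, %1\n\t"     /* equal, so ZF = 1          */
                         "setz %0\n\t"        /* capture ZF before the and */
                         "andq $-16, %1"      /* alignment happens later   */
                         : "=q"(zf), "+r"(v)
                         :
                         : "cc");
    return zf;
}
```

Here `zf_seen(0)` returns 1 and `zf_seen(1)` returns 0: once `andq` has executed, the captured flags describe the alignment result rather than the comparison, which is the same situation as a `pushf` placed after the alignment.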

I just applied a patch and created new snapshots on
https://cygwin.com/snapshots/

I couldn't reproduce the problem locally, so I'd be grateful if you
could test if that fixes the problem in your testcase, Jon.

Ken, can you check if this snapshot helps emacs along, too?


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-24 13:52           ` Threads Corinna Vinschen
@ 2014-10-26 11:58             ` Ken Brown
  2014-10-28 10:44             ` Threads Jon TURNEY
  1 sibling, 0 replies; 23+ messages in thread
From: Ken Brown @ 2014-10-26 11:58 UTC (permalink / raw)
  To: cygwin

On 10/24/2014 9:52 AM, Corinna Vinschen wrote:
> On Oct 24 14:54, Corinna Vinschen wrote:
>> On Oct 24 12:05, Jon TURNEY wrote:
>>> On 23/10/2014 16:37, Corinna Vinschen wrote:
>>>> On Oct 23 08:04, Ken Brown wrote:
>>>>> Yes, flags register corruption is exactly what Eli suggested in the other
>>>>> bug report I cited.
>>>>
>>>> The aforementioned patch was supposed to fix this problem and it is
>>>> definitely in the current 1.7.32 release...
>>>
>>> I didn't mean to suggest otherwise, just that perhaps a similar problem
>>> exists now.
>>>
>>> So I made the attached test case to explore that.  Maybe I've made an
>>> obvious mistake with it, but on the face of it, it seems to demonstrate
>>> something...
>>>
>>> jon@tambora /
>>> $ gcc signal-stress.c  -Wall -O0 -g
>>>
>>> jon@tambora /
>>> $ ./a
>>> failed: 2144210386 isn't equal to 2144210386, apparently
>>
>> So it checks i and j for equality, fails, and then comes up with
>> "42 isn't equal to 42"?  This is weird...
>>
>>> Note there is some odd load dependency. For me, it works fine when it's the
>>> only thing running, but when I start up something CPU intensive, it often
>>> fails...
>>
>> That's... interesting.  I wonder if that only occurs in multi-core or
>> multi-CPU environments.  The fact that i and j are not the same when
>> testing, but then are the same when printf is called looks like an
>> out-of-order execution problem.
>>
>> Is it possible that we have to add CPU memory barriers to the sigdelayed
>> function to avoid stuff like this?
>
> I discussed this with my colleague Kai Tietz (many thanks to him from
> here), and we came up with a problem in sigdelayed in the 64-bit case:
> pushf is called *after* aligning the stack with andq.  Since andq
> modifies the CPU flags, the flags that get restored are not necessarily
> the flags that were in effect on entering sigdelayed.
>
> I just applied a patch and created new snapshots on
> https://cygwin.com/snapshots/
>
> I couldn't reproduce the problem locally, so I'd be grateful if you
> could test if that fixes the problem in your testcase, Jon.

I tried Jon's testcase.  With cygwin-1.7.33-0.1, it failed within a few minutes.
With cygwin-1.7.33-0.2, I ran it for over an hour with no problem, with the
system heavily loaded.  So it looks good so far.
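To rerun Jon's test against a snapshot unattended, a bounded variant that stops after a fixed number of timer ticks may be handier than an endless loop. This is a hypothetical sketch (the names `run_signal_stress` and `ticks` are not from the thread). It marks the intermediate copy `volatile` so the comparison survives optimization, instead of relying on `-O0` as the original presumably does: with optimization on, a compiler may prove `j == i` and drop the check entirely.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction/setitimer under strict -std modes */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Incremented by the SIGALRM handler: one tick per handled signal. */
static volatile sig_atomic_t ticks = 0;

static void on_alarm(int sig)
{
    (void)sig;
    ticks++;
}

/*
 * Hypothetical bounded version of signal-stress.c: runs the same
 * "copy and compare" loop under a 1 ms ITIMER_REAL until max_ticks
 * SIGALRMs have been handled.  Returns 0 if no mismatch was seen,
 * 1 on a mismatch, and -1 if the signal or timer setup failed.
 */
int run_signal_stress(int max_ticks)
{
    ticks = 0;

    struct sigaction act;
    memset(&act, 0, sizeof act);          /* also zeroes sa_flags */
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGALRM, &act, NULL) < 0) {
        perror("sigaction failed");
        return -1;
    }

    struct itimerval timer;
    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_usec = 1000;     /* fire every 1 ms ...  */
    timer.it_value.tv_usec = 1000;        /* ... starting in 1 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) < 0) {
        perror("setitimer failed");
        return -1;
    }

    int result = 0;
    unsigned int i = 0;                   /* unsigned: wraparound is well defined */
    while (ticks < max_ticks) {
        volatile unsigned int x = i;      /* force a store and a reload */
        unsigned int j = x;
        if (j != i) {
            printf("failed: %u isn't equal to %u, apparently\n", i, j);
            result = 1;
            break;
        }
        i++;
    }

    memset(&timer, 0, sizeof timer);      /* disarm the timer */
    setitimer(ITIMER_REAL, &timer, NULL);
    return result;
}
```

For example, `run_signal_stress(3600000)` asks for roughly an hour's worth of 1 ms ticks. The effective tick rate depends on the platform's timer resolution, so the argument is better read as a count of handled signals than as a duration.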

> Ken, can you check if this snapshot helps emacs along, too?

The people who have been reporting frequent crashes are aware of the fix.  Now I 
just have to wait and hope I don't hear from them for a few days.

Ken

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-24 13:52           ` Threads Corinna Vinschen
  2014-10-26 11:58             ` Threads Ken Brown
@ 2014-10-28 10:44             ` Jon TURNEY
  2014-10-28 11:40               ` Threads Corinna Vinschen
  1 sibling, 1 reply; 23+ messages in thread
From: Jon TURNEY @ 2014-10-28 10:44 UTC (permalink / raw)
  To: cygwin

On 24/10/2014 14:52, Corinna Vinschen wrote:
> I discussed this with my colleague Kai Tietz (many thanks to him from
> here), and we came up with a problem in sigdelayed in the 64-bit case:
> pushf is called *after* aligning the stack with andq.  Since andq
> modifies the CPU flags, the flags that get restored are not necessarily
> the flags that were in effect on entering sigdelayed.
>
> I just applied a patch and created new snapshots on
> https://cygwin.com/snapshots/
>
> I couldn't reproduce the problem locally, so I'd be grateful if you
> could test if that fixes the problem in your testcase, Jon.

I've tried that snapshot with both my testcase and emacs, without problems.

Thanks very much for fixing this!


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-28 10:44             ` Threads Jon TURNEY
@ 2014-10-28 11:40               ` Corinna Vinschen
  2014-10-28 13:47                 ` Threads Ken Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-28 11:40 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

On Oct 28 10:43, Jon TURNEY wrote:
> On 24/10/2014 14:52, Corinna Vinschen wrote:
> >I discussed this with my colleague Kai Tietz (many thanks to him from
> >here), and we came up with a problem in sigdelayed in the 64-bit case:
> >pushf is called *after* aligning the stack with andq.  Since andq
> >modifies the CPU flags, the flags that get restored are not necessarily
> >the flags that were in effect on entering sigdelayed.
> >
> >I just applied a patch and created new snapshots on
> >https://cygwin.com/snapshots/
> >
> >I couldn't reproduce the problem locally, so I'd be grateful if you
> >could test if that fixes the problem in your testcase, Jon.
> 
> I've tried that snapshot with both my testcase and emacs, without problems.
> 
> Thanks very much for fixing this!

Thanks again to Kai Tietz for helping me with this.

Let's just hope this was the actual problem... :}


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Threads
  2014-10-28 11:40               ` Threads Corinna Vinschen
@ 2014-10-28 13:47                 ` Ken Brown
  2014-10-28 14:19                   ` [GOLDSTARS] Threads Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Ken Brown @ 2014-10-28 13:47 UTC (permalink / raw)
  To: cygwin; +Cc: Eli Zaretskii

On 10/28/2014 7:40 AM, Corinna Vinschen wrote:
> On Oct 28 10:43, Jon TURNEY wrote:
>> On 24/10/2014 14:52, Corinna Vinschen wrote:
>>> I discussed this with my colleague Kai Tietz (many thanks to him from
>>> here), and we came up with a problem in sigdelayed in the 64-bit case:
>>> pushf is called *after* aligning the stack with andq.  Since andq
>>> modifies the CPU flags, the flags that get restored are not necessarily
>>> the flags that were in effect on entering sigdelayed.
>>>
>>> I just applied a patch and created new snapshots on
>>> https://cygwin.com/snapshots/
>>>
>>> I couldn't reproduce the problem locally, so I'd be grateful if you
>>> could test if that fixes the problem in your testcase, Jon.
>>
>> I've tried that snapshot with both my testcase and emacs, without problems.
>>
>> Thanks very much for fixing this!
>
> Thanks again to Kai Tietz for helping me with this.

And thanks to Eli for suggesting that we look for corruption of the flags 
register, and to Jon for providing the test case that narrowed it down to signal 
handling.

Ken

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [GOLDSTARS] Re: Threads
  2014-10-28 13:47                 ` Threads Ken Brown
@ 2014-10-28 14:19                   ` Corinna Vinschen
  2014-10-28 17:39                     ` Andrew Schulman
  0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-28 14:19 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1492 bytes --]

On Oct 28 09:46, Ken Brown wrote:
> On 10/28/2014 7:40 AM, Corinna Vinschen wrote:
> >On Oct 28 10:43, Jon TURNEY wrote:
> >>On 24/10/2014 14:52, Corinna Vinschen wrote:
> >>>I discussed this with my colleague Kai Tietz (many thanks to him from
> >>>here), and we came up with a problem in sigdelayed in the 64-bit case:
> >>>pushf is called *after* aligning the stack with andq.  Since andq
> >>>modifies the CPU flags, the flags that get restored are not necessarily
> >>>the flags that were in effect on entering sigdelayed.
> >>>
> >>>I just applied a patch and created new snapshots on
> >>>https://cygwin.com/snapshots/
> >>>
> >>>I couldn't reproduce the problem locally, so I'd be grateful if you
> >>>could test if that fixes the problem in your testcase, Jon.
> >>
> >>I've tried that snapshot with both my testcase and emacs, without problems.
> >>
> >>Thanks very much for fixing this!
> >
> >Thanks again to Kai Tietz for helping me with this.
> 
> And thanks to Eli for suggesting that we look for corruption of the flags
> register, and to Jon for providing the test case that narrowed it down to
> signal handling.

Absolutely.  Since that's a serious but very subtle error, and you all
were very resourceful and diligent helping to fix it, I buy us all a
round of goldstars.


Thanks guys,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GOLDSTARS] Re: Threads
  2014-10-28 14:19                   ` [GOLDSTARS] Threads Corinna Vinschen
@ 2014-10-28 17:39                     ` Andrew Schulman
  2014-10-29 10:00                       ` Corinna Vinschen
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Schulman @ 2014-10-28 17:39 UTC (permalink / raw)
  To: cygwin

> Absolutely.  Since that's a serious but very subtle error, and you all
> were very resourceful and diligent helping to fix it, I buy us all a
> round of goldstars.

Whew.

http://cygwin.com/goldstars/#KB
http://cygwin.com/goldstars/#KT
http://cygwin.com/goldstars/#EZ
http://cygwin.com/goldstars/#JTy
http://cygwin.com/goldstars/#CV


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GOLDSTARS] Re: Threads
  2014-10-28 17:39                     ` Andrew Schulman
@ 2014-10-29 10:00                       ` Corinna Vinschen
  0 siblings, 0 replies; 23+ messages in thread
From: Corinna Vinschen @ 2014-10-29 10:00 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On Oct 28 13:38, Andrew Schulman wrote:
> > Absolutely.  Since that's a serious but very subtle error, and you all
> > were very resourceful and diligent helping to fix it, I buy us all a
> > round of goldstars.
> 
> Whew.
> 
> http://cygwin.com/goldstars/#KB
> http://cygwin.com/goldstars/#KT
> http://cygwin.com/goldstars/#EZ
> http://cygwin.com/goldstars/#JTy
> http://cygwin.com/goldstars/#CV

Cool, thank you!


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-10-29 10:00 UTC | newest]

Thread overview: 23+ messages
2014-10-20 13:04 Threads Ken Brown
2014-10-20 16:43 ` Threads Corinna Vinschen
2014-10-20 19:03   ` Threads Corinna Vinschen
2014-10-20 19:58     ` Threads Ken Brown
2014-10-21 11:17       ` Threads Corinna Vinschen
2014-10-21 12:27         ` Threads Ken Brown
2014-10-23 11:31 ` Threads Jon TURNEY
2014-10-23 12:04   ` Threads Ken Brown
2014-10-23 15:37     ` Threads Corinna Vinschen
2014-10-23 18:07       ` Threads Achim Gratz
2014-10-23 20:32       ` Threads Ken Brown
2014-10-24  1:07         ` Threads Ken Brown
2014-10-24  9:46           ` Threads Corinna Vinschen
2014-10-24 11:05       ` Threads Jon TURNEY
2014-10-24 12:54         ` Threads Corinna Vinschen
2014-10-24 13:52           ` Threads Corinna Vinschen
2014-10-26 11:58             ` Threads Ken Brown
2014-10-28 10:44             ` Threads Jon TURNEY
2014-10-28 11:40               ` Threads Corinna Vinschen
2014-10-28 13:47                 ` Threads Ken Brown
2014-10-28 14:19                   ` [GOLDSTARS] Threads Corinna Vinschen
2014-10-28 17:39                     ` Andrew Schulman
2014-10-29 10:00                       ` Corinna Vinschen
