[Please don't CC me, just send mail to the list.  Thank you]


On Nov 21 15:11, Mikulas Patocka wrote:
> > Do you use a DLL built with optimization by any chance?  I wouldn't take 
> > the backtraces too seriously in that case.  For debugging it helps a lot 
> > to use a Cygwin DLL built without -O2.
> 
> I use optimization. The stacktrace may contain some other functions that 
> already finished but left the address on the stack.

There may also be functions completely missing from the stack.  I still
suggest building with -g only.

> > Btw., are you testing on 32 or 64 bit?
> 
> On 32-bit. The rebuild of cygwin1.dll requires a large number of packages to 
> create the documentation (including tex and java) and I haven't bloated 

Java?!?

> the 64-bit cygwin installation with them yet. I wish it were possible to 
> build the library without documentation and without such big dependencies.

You don't have to build the docs to build the DLL.  The make process
continues even if building the docs fails.

> > I'm testing on 64 bit. I can't reproduce your backtrace, but I can 
> > reproduce another one, which is related to thread_exit.  At one point 
> > after a couple thousand runs through your testcase I have a variable 
> > number of threads hanging in thread_exit, and a timer thread which is 
> > unable to send its signal.  The other threads all hang in thread_exit, 
> > waiting for a muto which is taken by a thread which doesn't exist 
> > anymore.
> 
> So you can - just for debugging - add a counter to thread local storage 
> that is incremented when muto is taken and decremented when muto is 
> released. If the thread exists, test the counter, if it is non-zero, print 
> the backtrace or attach the debugger.

For instance.
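
A minimal sketch of such a debugging counter, purely for illustration.
It does not use the real muto class: debug_muto, held_mutos and
exiting_with_muto_held are hypothetical names, and std::mutex stands in
for the actual lock.  Note that the deliberate lock taken in exit_thread
would have to be excluded from the count, otherwise every thread exit
would trigger the check.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>

// Hypothetical stand-in for a Cygwin muto.  std::mutex is used only to
// keep the sketch self-contained; the real muto class looks different.
struct debug_muto
{
  std::mutex m;
  void acquire ();
  void release ();
};

// Per-thread count of mutos currently held (debugging aid only).
static thread_local int held_mutos = 0;

void
debug_muto::acquire ()
{
  m.lock ();
  ++held_mutos;
}

void
debug_muto::release ()
{
  assert (held_mutos > 0);	// releasing a muto this thread never took
  --held_mutos;
  m.unlock ();
}

// To be called on the thread exit path.  Returns true if the calling
// thread still holds at least one muto, which is the point where one
// would print a backtrace or attach the debugger.
bool
exiting_with_muto_held (const char *where)
{
  if (held_mutos == 0)
    return false;
  fprintf (stderr, "%s: thread exiting with %d muto(s) held\n",
	   where, held_mutos);
  return true;
}
```

Because the counter is thread-local, it only tells a thread about its
own locks.  That's sufficient here, since the question is whether the
exiting thread itself still holds the muto.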

> > That's a very serious downside of the muto implementation not 
> > being able to recognize being abandoned. I wonder if that shouldn't be 
> > using a real OS mutex.
> 
> That would hide the problem that a thread is exiting with locked muto, but 
> not fix it.

See exit_thread and cgf-000017 in DevNotes.  This setup deliberately
locks the muto and then calls ExitThread.  The signal handler is
supposed to unlock the muto when the __SIGTHREADEXIT signal comes in,
but for some reason that doesn't always happen.  It seems the problem
here is that the SIGALRM signals are filling up the signal pipe, so
the __SIGTHREADEXIT signal is not actually delivered.  I have a local
workaround, but it seems to open a can of worms.

I'm going to take a step back for now, and reevaluate what happens
before trying to apply even more hacks.  Ultimately the problem is that
the cygtls area is accessed from other threads (mainly the signal
thread) without locking, and worse, that the lock for the cygtls area is
a member of _cygtls itself.  The latter certainly needs a patch, and I'm
contemplating extending cygheap::threadlist to become a per-thread
structure containing the _cygtls pointer, the thread ID, the main thread
HANDLE, and the tls muto.  This should allow serializing access to the
cygtls area in a way which avoids the aforementioned problems without
a complete redesign.
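
To illustrate the idea, here is a rough sketch of what such a per-thread
entry could look like.  This is not a patch: fake_cygtls,
threadlist_entry, thread_registry and post_signal are made-up names,
std::mutex stands in for the muto, and a void pointer stands in for the
Windows HANDLE.  The only point is that the lock lives outside the TLS
area it protects.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for _cygtls.
struct fake_cygtls
{
  int pending_signal = 0;
};

// One entry per thread.  The lock is a member of the entry, not of the
// TLS area, so it stays valid even after the thread's TLS is gone.
struct threadlist_entry
{
  fake_cygtls *tls;		// pointer to the thread's TLS area
  uint32_t thread_id;		// thread ID
  void *thread_handle;		// stand-in for the thread HANDLE
  std::mutex lock;		// stand-in for the tls muto
};

class thread_registry
{
  std::mutex list_lock;		// protects the list itself
  std::vector<std::unique_ptr<threadlist_entry>> entries;

public:
  void add (fake_cygtls *tls, uint32_t id, void *handle)
  {
    std::lock_guard<std::mutex> guard (list_lock);
    std::unique_ptr<threadlist_entry> e (new threadlist_entry);
    e->tls = tls;
    e->thread_id = id;
    e->thread_handle = handle;
    entries.push_back (std::move (e));
  }

  // Called when a thread exits.  Taking the entry lock first waits for
  // anybody still accessing the TLS area through this entry.
  void remove (uint32_t id)
  {
    std::lock_guard<std::mutex> guard (list_lock);
    for (size_t i = 0; i < entries.size (); ++i)
      if (entries[i]->thread_id == id)
	{
	  {
	    std::lock_guard<std::mutex> tls_guard (entries[i]->lock);
	    entries[i]->tls = nullptr;
	  }
	  entries.erase (entries.begin () + i);
	  return;
	}
  }

  // What the signal thread would do: touch another thread's TLS area
  // only while holding that thread's entry lock.  Returns false if the
  // thread is not (or no longer) registered.
  bool post_signal (uint32_t id, int sig)
  {
    std::lock_guard<std::mutex> guard (list_lock);
    for (auto &e : entries)
      if (e->thread_id == id)
	{
	  std::lock_guard<std::mutex> tls_guard (e->lock);
	  if (!e->tls)
	    return false;
	  e->tls->pending_signal = sig;
	  return true;
	}
    return false;
  }
};
```

In this sketch the list lock is held across the entire access, which is
simple but coarse.  Whether that is acceptable for the signal thread is
an open question.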


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat