public inbox for pthreads-win32@sourceware.org
* Bug update
@ 2000-03-04 14:23 Dave Baggett
From: Dave Baggett @ 2000-03-04 14:23 UTC (permalink / raw)
  To: pthreads-win32

After several days of trial and error, I believe I've made some progress on this.

It seems that DllMain can get called with fdwReason == DLL_THREAD_DETACH
when a thread has threadH == 0. This seems to happen very, very rarely, which is
one reason this bug might not have revealed itself until now. Looking at the code,
I see no way that threadH could be zero for a thread that has been successfully
created, unless the call to _beginthread[ex] got far enough to create the thread,
but then failed for some other reason (something bad happening in pthread_threadStart,
for example). In any case, this code in dll.c causes some memory corruption:

  pthread_setspecific (_pthread_selfThreadKey, NULL);
  _pthread_threadDestroy (self);

I don't know yet which of these two calls actually causes the problem. I guarded
both like so:

    if (self->threadH) {
        pthread_setspecific (_pthread_selfThreadKey, NULL);
        _pthread_threadDestroy (self);
    }

and this seems to prevent the crashes. I've run my test program for 15 hours and it
hasn't crashed. However, it still (very suspiciously) leaks handles slowly. After 15
hours, it has lost 10847 of them. Of course, this program is creating millions of
threads, accelerating the handle leakage as much as possible. It may be very hard
to detect this handle leakage in any real-world programs. Also, since I'm not calling
_pthread_threadDestroy(self) when threadH == 0, one would expect some memory
leakage there. However, this seems to be a very rare occurrence, so in practice it
makes no difference. (The program's memory usage is still under 2MB.)
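
The trade-off described above (skipping cleanup when threadH == 0 leaks one
per-thread record each time) can be modelled in isolation. The following is only
a sketch with hypothetical stand-in names (model_thread, model_thread_detach,
model_skipped_cleanups); it does not reproduce the real pthreads-win32
structures or dll.c, and free() merely stands in for the two guarded calls. The
NULL check on self is an extra assumption that is not part of the patch above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the library's per-thread record; only the
   handle field matters for this sketch. */
struct model_thread {
    void *threadH;   /* NULL models the rare "no handle" case */
};

/* Number of detaches whose cleanup was skipped. Each skip corresponds to
   one leaked per-thread record, so this makes the cost of the guard
   measurable instead of guessed. */
static unsigned long skipped_cleanups = 0;

/* Models the guarded DLL_THREAD_DETACH path.
   Returns 1 if cleanup ran, 0 if it was skipped. */
int model_thread_detach(struct model_thread *self)
{
    if (self == NULL)
        return 0;               /* assumption: not in the original patch */

    if (self->threadH == NULL) {
        skipped_cleanups++;     /* record is deliberately left alone */
        return 0;
    }

    /* Stands in for pthread_setspecific(...) and _pthread_threadDestroy(self). */
    free(self);
    return 1;
}

/* Reports how many records the guard has left behind so far. */
unsigned long model_skipped_cleanups(void)
{
    return skipped_cleanups;
}
```

Logging a counter like this from the real detach path would show how often the
threadH == 0 case actually occurs over a long run.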

I have also run my (much more complicated) real application with the patched DLL,
and it too seems to run happily for many hours.

I don't yet fully understand what's going on in either pthread_setspecific or
pthread_callUserDestroyRoutines, so I don't know exactly who's to blame. It's also
quite possible that my 15 crash-free hours are simple luck. Until I understand the
bug fully, I won't rest easy. But this seems like something some of you who are
more familiar with the code might be able to work from.

BTW, I did verify that the program crashes on another NT machine, i.e., it's a real
bug, not a problem with my machine.
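
For anyone wanting to exercise the same code path, here is a minimal sketch of a
create/join stress loop of the general kind described above. It is not the
actual test program, which is not shown in this message; the names worker and
stress_create_join are hypothetical, and joining each thread (rather than
detaching it) is an assumption. It uses only the standard pthread_create and
pthread_join calls.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body: return immediately so that thread creation and
   teardown dominate the run. */
static void *worker(void *arg)
{
    return arg;
}

/* Create and join `iterations` threads, one at a time.
   Returns the number of threads successfully created and joined, so a
   short count signals that creation or joining started to fail. */
long stress_create_join(long iterations)
{
    long completed = 0;
    long i;

    for (i = 0; i < iterations; i++) {
        pthread_t t;
        void *result = NULL;

        if (pthread_create(&t, NULL, worker, &completed) != 0)
            break;              /* creation failed, e.g. resources exhausted */
        if (pthread_join(t, &result) != 0)
            break;              /* join failed */
        if (result != &completed)
            break;              /* worker did not hand its argument back */
        completed++;
    }
    return completed;
}
```

Calling stress_create_join with a count in the millions from main, while
watching the process's handle count in an external tool, is one way to look for
the slow leak.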

Dave
