From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David Baggett"
To: "Tristan Savatier" ,
Subject: Re: pthreads comments and suggestions [was: Re: Handle leak ?]
Date: Sun, 23 Jul 2000 18:01:00 -0000
Message-id: <022f01bff50a$5178c2d0$9f01a8c0@itasoftware.com>
References: <018401bff2a5$d78a1660$9f01a8c0@itasoftware.com> <3977F6E8.C0977C06@mpegtv.com> <01ed01bff357$a15db300$9f01a8c0@itasoftware.com>
X-SW-Source: 2000/msg00071.html

----- Original Message -----
From: "Tristan Savatier"
To: "David Baggett"
Sent: Friday, July 21, 2000 6:19 PM
Subject: Re: [pthread-win32] Re: pthreads comments and suggestions [was: Re: Handle leak ?]

I suspect that, for the most part, we are in violent agreement.

> Yes, but there is no way that threadH == 0 when pthread_create
> returns with success.

Right. I agree that, looking at the source, there is no obvious way
that could happen. Furthermore, I'm pretty sure I put asserts in
pthread_create to trap this case, and didn't see it happen even in
cases where the guard was triggered. While this doesn't exonerate
pthread_create, it certainly makes it unlikely that it is at fault.

> So this only thing that is possible is that it gets
> trashed later by the application (or by a bug in ANY
> library).

Since a tiny program with no other libraries or dependencies can
exhibit the bug, I don't really suspect anything but the DLL itself,
although I suppose it could be the compiler. I only tried this with
MSVC++5 and MSVC++6.

Let me be clear that it's quite possible that the bug is not *only*
the fault of the DLL. It could be the interaction of aspects of or
bugs in Windows NT with the DLL. (Remember that putting severe stress
on the O/S causes the bug to manifest itself much more frequently.)

I'll also point out two things that may or may not be relevant, but
which are certainly scary about NT:

1) Run a test program that creates lots of threads. Call up the task
manager.
While your program is running, grab a slider on a window and slide it
back and forth rapidly. Notice how thread creation drops to zero as
the system hiccups. What's up with that? I have no idea, but it
doesn't make me feel too warm and fuzzy.

2) If you grep Deja News at www.deja.com you will find messages from
knowledgeable people claiming that the NT scheduler has various
problems under high-load situations.

Obviously many multithreaded apps do work under NT in practice. But
perhaps the pthreads DLL is doing something that typical applications
do not, which is getting NT into states that have not been extensively
tested. I don't pretend to know what the bug might be; I'm just
pointing out that there are possible problems at many levels.

> If that happens, it is VERY BAD, and you little fix would probably
> not improve the situation significantly.

I agree with your sentiment that it looks like a memory corruption
bug, that my workaround is not something you want to put in place and
then forget about, and that it is at best ameliorating a VERY BAD bug.

However, the fact is that in practice the guard either stops or
significantly slows the handle leakage and crashes, which allowed me
to run my (larger) application for long periods of time without any of
the ill-effects seen previously, so in practice it does actually
improve things significantly. This could be because the memory
corruption is occurring very soon before the guarded code. Without
knowing the source of the bug, it is impossible to draw any sound
conclusions. At the least, it's a useful stopgap until the bug can be
found and fixed, and may provide valuable clues about where the bug
lies.

And as I said, I wish I'd had the time and wherewithal to actually
track the bug down and fix it, but I didn't. This was for a commercial
application so the schedule was too rigid to allow further debugging
(after two weeks of putting assertions in the code and trying random
things).
I moved the application to Linux rather than agonize further over it.

But I'm glad someone else is concerned about it. That may mean it will
get fixed, which would be a very good thing. You can grep the logs of
the mailing list for my email address to find my previous postings on
this topic, where I posted code and more detailed findings.

Dave
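
For reference, the kind of test program described in point 1 could look
something like the sketch below. It is not the code from the earlier
postings. It assumes only the standard pthread_create and pthread_join
calls, and the names worker, stress_threads, and BATCH are made up for
illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define BATCH 64  /* threads alive at once; an arbitrary choice */

/* Trivial thread body: hand the argument straight back so the joiner
   can check that it joined the thread it expected. */
static void *worker(void *arg)
{
    return arg;
}

/* Create and join `total` threads in batches of BATCH.
   Returns the number of create/join calls that failed or that gave
   back an unexpected value; 0 means every thread behaved. */
int stress_threads(int total)
{
    pthread_t tids[BATCH];
    int failures = 0;
    int done = 0;

    while (done < total) {
        int want = (total - done < BATCH) ? total - done : BATCH;
        int started = 0;
        int i;

        for (i = 0; i < want; i++) {
            /* pthread_create returns 0 on success, an error number
               otherwise; a failed slot is simply reused. */
            if (pthread_create(&tids[started], NULL, worker,
                               &tids[started]) == 0)
                started++;
            else
                failures++;
        }

        for (i = 0; i < started; i++) {
            void *result = NULL;
            /* The worker returns the address of its own slot. */
            if (pthread_join(tids[i], &result) != 0 ||
                result != (void *)&tids[i])
                failures++;
        }

        done += want;
    }
    return failures;
}
```

Calling stress_threads with a large count from main while the machine
is under load, and printing the return value, is one way to look for
trouble: a nonzero result means at least one create or join call
misbehaved.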