Hi,

I tried fixing that last problem (the semaphore leak when multiple threads
are cancelled and the initialization is never completed), but it appears
the reference-counting approach is inherently flawed, so I had to
re-implement the whole thing from scratch. This implementation uses MCS
locks, which are not only fast but also require no clean-up, thus avoiding
the problem above. The MCS lock implementation, which I had to build,
probably deserves a module (.c and .h) of its own, since we may want to
use it in other places. This implementation has passed all the tests
(again, on a single CPU only), and I hope it is finally bug-free now.
Please let me know what you think.

--vlad

PS. Sorry it takes me so many iterations. I feel like I have been spamming
you with too many versions of this routine.

> -----Original Message-----
> From: pthreads-win32-owner@sources.redhat.com [mailto:pthreads-win32-
> owner@sources.redhat.com] On Behalf Of Vladimir Kliatchko
> Sent: Saturday, May 28, 2005 10:30 AM
> To: 'Ross Johnson'
> Cc: 'Gottlob Frege'; pthreads-win32@sources.redhat.com
> Subject: RE: New pthread_once implementation
>
> What do you think of the attached implementation? I am still analyzing
> it, but it passes the tests and appears to be free of that problem. It
> does have one minor glitch, though: if two threads come in, the
> semaphore is created. If both are cancelled and no new calls are made to
> finish the job, the semaphore is never destroyed. I am not sure how big
> a deal this is.
>
> Re. optimizations: Great, I will try to do something.
>
> Thnx,
> --vlad
>
> > -----Original Message-----
> > From: pthreads-win32-owner@sources.redhat.com [mailto:pthreads-win32-
> > owner@sources.redhat.com] On Behalf Of Ross Johnson
> > Sent: Saturday, May 28, 2005 9:55 AM
> > To: Vladimir Kliatchko
> > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > Subject: RE: New pthread_once implementation
> >
> > On Sat, 2005-05-28 at 06:51 -0400, Vladimir Kliatchko wrote:
> > >
> > > > -----Original Message-----
> > > > From: pthreads-win32-owner@sources.redhat.com [mailto:pthreads-win32-
> > > > owner@sources.redhat.com] On Behalf Of Ross Johnson
> > > > Sent: Friday, May 27, 2005 11:48 PM
> > > > To: Vladimir Kliatchko
> > > > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > > > Subject: RE: New pthread_once implementation
> > > >
> > > > On Fri, 2005-05-27 at 21:30 -0400, Vladimir Kliatchko wrote:
> > > > > Nice catch. Let me see if I can fix it.
> > > > >
> > > > > Note that the same problem exists in the currently released
> > > > > event-based implementation (CVS version 1.16):
> > > > >
> > > > > thread1 comes in, starts initing
> > > > > thread2 creates event, starts waiting
> > > > > thread3 comes in, starts waiting
> > > > > thread1 is cancelled, signals event
> > > > > thread2 wakes up, proceeds to the point right before the ResetEvent
> > > > > thread3 wakes up, closes event handle
> > > > > thread2 resets closed handle
> > > >
> > > > Relies on HANDLE uniqueness and assumes that an error will result.
> > > > This is why the 2.6.0 version (and earlier) checks the return code
> > > > and restores the Win32 LastError if necessary - for GetLastError
> > > > transparency.
> > >
> > > Does Windows guarantee that the handles are not reused? What happens
> > > if a thread closes a handle while another thread is blocked on it?
> > > Is any of this in Microsoft documentation?
> > > Consider the following scenario for the event-based implementation:
> >
> > Well, apparently they're not unique when recycled, so there is a bug
> > here to fix in both versions:
> >
> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_handles1.asp
> >
> > [Under "Native Windows NT Objects"]
> > "Unlike the handles that are maintained by the Win32 USER and GDI
> > subsystem components, handles to native objects under Windows NT are
> > not unique; that is, upon destruction of an object, the corresponding
> > handle may be recycled and will look exactly like the handle to the
> > destroyed object."
> >
> > But they are local to the process, rather than system wide, if that
> > helps.
> >
> > > > > Also, regarding my previous comment to Ross about the very high
> > > > > cost of using InterlockedExchangeAdd for MBR:
> > > > > I did some simple benchmarking. Running pthread_once 50,000,000
> > > > > times on my pretty slow single-CPU machine takes about 2.1
> > > > > seconds. Replacing InterlockedExchangeAdd with a simple read
> > > > > brings it down to 0.6 seconds. This looks significant.
> > > >
> > > > Using the PTW32_INTERLOCKED_COMPARE_EXCHANGE macro as in your
> > > > latest (in CVS) version and building the library for inlined
> > > > functions (nmake VC-inlined) and x86 architecture causes
> > > > customised versions of InterlockedCompareExchange to be used, and
> > > > this results in inlined asm. Same for PTW32_INTERLOCKED_EXCHANGE.
> > > >
> > > > Also, on single-CPU x86, the library dynamically switches to using
> > > > 'cmpxchg' rather than 'lock cmpxchg' to avoid locking the bus.
> > > > This appears to match what the kernel32.dll versions do. On
> > > > non-x86 architectures the kernel32.dll versions are called, with
> > > > call overhead.
> > > >
> > > > PTW32_INTERLOCKED_EXCHANGE_ADD could be added, as could other
> > > > architectures. See ptw32_InterlockedCompareExchange.c.
> > >
> > > I have rerun my benchmark with VC-inlined. The difference is now
> > > less significant (0.9 vs 0.6) but still noticeable. I guess cmpxchg,
> > > even without locking, is quite expensive. On multi-CPU systems the
> > > difference should be much higher due to the time it takes to lock
> > > the bus and to the contention it may cause. It sounded as if you did
> > > not care much to try to optimize it. I did not mean to suggest that
> > > we have to do it right now either; I just wanted to get your opinion
> > > on whether we want to deal with this in the future.
> >
> > By all means include any optimisation you think is worthwhile. I was
> > just pointing out that the difference isn't necessarily 2.1 vs 0.6.
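
For readers following the MCS-lock remark at the top of this thread, here
is a minimal, generic MCS queue-lock sketch in C. It is not Vladimir's
actual implementation (that was attached to his message and is not
reproduced here); the ptw32_mcs_* names and the spin-then-yield waiting
strategy are illustrative assumptions. What it does show is why an MCS
lock needs no clean-up: each waiter's queue node lives on that waiter's
own stack, so no shared semaphore or event handle is ever created that
could leak if a thread is cancelled.

/* A minimal, generic MCS queue-lock sketch (not the library's code).
 * Each waiter links a node that lives on its own stack into a global
 * queue, so there is nothing shared to create or destroy. */

#include <windows.h>

typedef struct ptw32_mcs_node_t_ {
  struct ptw32_mcs_node_t_ * volatile next;  /* successor in the queue */
  volatile LONG locked;                      /* 1 while this waiter must wait */
} ptw32_mcs_node_t;

typedef ptw32_mcs_node_t * volatile ptw32_mcs_lock_t;  /* queue tail */

static void
ptw32_mcs_acquire (ptw32_mcs_lock_t * lock, ptw32_mcs_node_t * node)
{
  ptw32_mcs_node_t * pred;

  node->next = NULL;
  node->locked = 1;

  /* Atomically make ourselves the new tail and fetch the old tail. */
  pred = (ptw32_mcs_node_t *)
    InterlockedExchangePointer ((PVOID volatile *) lock, (PVOID) node);

  if (pred != NULL)
    {
      /* Queue was non-empty: link behind the predecessor and wait on our
         own flag until the predecessor hands the lock over. */
      pred->next = node;
      while (node->locked)
        Sleep (0);   /* yield; a real implementation might block instead */
    }
}

static void
ptw32_mcs_release (ptw32_mcs_lock_t * lock, ptw32_mcs_node_t * node)
{
  if (node->next == NULL)
    {
      /* No visible successor: try to swing the tail back to NULL. */
      if (InterlockedCompareExchangePointer ((PVOID volatile *) lock,
                                             NULL,
                                             (PVOID) node) == (PVOID) node)
        return;      /* queue is empty; nothing to hand over */

      /* A successor is in the middle of linking in; wait for it. */
      while (node->next == NULL)
        Sleep (0);
    }

  node->next->locked = 0;   /* pass the lock to the successor */
}

A pthread_once built on something like this would have each caller place a
ptw32_mcs_node_t on its own stack, acquire the lock, run (or skip) the init
routine, and release; an initializer that is cancelled simply releases the
lock during unwind, leaving no handle behind to destroy.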
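
On the benchmark exchange above (InterlockedExchangeAdd versus a simple
read): the gain comes from letting the common already-initialized case
avoid any interlocked instruction at all. The sketch below illustrates
only that fast-path idea; the my_once_t type, its fields, and the
busy-wait slow path are hypothetical, and it deliberately ignores the
cancellation handling that the rest of this thread is about, which a real
pthread_once must deal with.

/* Fast-path sketch: threads arriving after initialization has completed
 * take a plain volatile read instead of a bus-locked operation. */

#include <windows.h>

typedef struct {
  volatile LONG done;     /* 0 = not initialized, 1 = init_routine finished */
  volatile LONG started;  /* used to elect the single initializing thread   */
} my_once_t;

#define MY_ONCE_INIT { 0, 0 }

static int
my_once (my_once_t * once, void (*init_routine) (void))
{
  /* Fast path: a simple read, no 'lock'-prefixed instruction. On x86 this
   * suffices for the "already done" case; weaker architectures would need
   * an acquire barrier here. */
  if (once->done)
    return 0;

  /* Slow path: elect one initializer with an interlocked operation. */
  if (InterlockedExchange (&once->started, 1) == 0)
    {
      (*init_routine) ();
      InterlockedExchange (&once->done, 1);  /* publish completion */
    }
  else
    {
      /* Another thread is initializing; wait for it. A busy wait keeps
       * the sketch short -- the real code blocks, and must also cope with
       * cancellation of the initializer, which this sketch ignores. */
      while (!once->done)
        Sleep (0);
    }

  return 0;
}

Usage would look like "static my_once_t once = MY_ONCE_INIT;
my_once (&once, init);", with every call after the first completing via
the plain read alone, which is where the 2.1 s vs 0.6 s difference in the
quoted benchmark comes from.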