What do you think of the attached implementation? I am still analyzing it,
but it passes the tests and appears to be free of that problem. It does have
one minor glitch, though: if two threads come in, the semaphore is created.
If both are cancelled and no new calls are made to finish the job, the
semaphore is never destroyed. I am not sure how big a deal this is.

Re. optimizations: Great, I will try to do something.

Thnx,
--vlad

> -----Original Message-----
> From: pthreads-win32-owner@sources.redhat.com
> [mailto:pthreads-win32-owner@sources.redhat.com] On Behalf Of Ross Johnson
> Sent: Saturday, May 28, 2005 9:55 AM
> To: Vladimir Kliatchko
> Cc: 'Gottlob Frege'; Pthreads-Win32 list
> Subject: RE: New pthread_once implementation
>
> On Sat, 2005-05-28 at 06:51 -0400, Vladimir Kliatchko wrote:
> >
> > > -----Original Message-----
> > > From: pthreads-win32-owner@sources.redhat.com
> > > [mailto:pthreads-win32-owner@sources.redhat.com] On Behalf Of Ross Johnson
> > > Sent: Friday, May 27, 2005 11:48 PM
> > > To: Vladimir Kliatchko
> > > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > > Subject: RE: New pthread_once implementation
> > >
> > > On Fri, 2005-05-27 at 21:30 -0400, Vladimir Kliatchko wrote:
> > > > Nice catch. Let me see if I can fix it.
> > > >
> > > > Note that the same problem exists in the currently released
> > > > event-based implementation (cvs version 1.16):
> > > >
> > > > thread1 comes in, starts initing
> > > > thread2 creates event, starts waiting
> > > > thread3 comes in, starts waiting
> > > > thread1 is cancelled, signals event
> > > > thread2 wakes up, proceeds to the point right before the ResetEvent
> > > > thread3 wakes up, closes event handle
> > > > thread2 resets closed handle
> > >
> > > That relies on HANDLE uniqueness and assumes that an error will result.
> > > This is why the 2.6.0 version (and earlier) checks the return code and
> > > restores Win32 LastError if necessary - for GetLastError transparency.
> >
> > Does Windows guarantee that the handles are not reused? What happens if
> > a thread closes a handle while another thread is blocked on it? Is any
> > of this in Microsoft documentation? Consider the following scenario for
> > the event-based implementation:
>
> Well, apparently they're not unique when recycled, so there is a bug here
> to fix in both versions:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_handles1.asp
> [Under "Native Windows NT Objects"]
> "Unlike the handles that are maintained by the Win32 USER and GDI
> subsystem components, handles to native objects under Windows NT are not
> unique; that is, upon destruction of an object, the corresponding handle
> may be recycled and will look exactly like the handle to the destroyed
> object."
>
> But they are local to the process, rather than system-wide, if that
> helps.
>
> > > > Also, regarding my previous comment to Ross about the very high cost
> > > > of using InterlockedExchangeAdd for MBR:
> > > > I did some simple benchmarking. Running pthread_once 50,000,000 times
> > > > on my pretty slow single-CPU machine takes about 2.1 seconds.
> > > > Replacing InterlockedExchangeAdd with a simple read brings it down to
> > > > 0.6 seconds. This looks significant.
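For reference, a minimal sketch of the kind of measurement being described
above: call pthread_once() repeatedly against an already-completed once
control and time the loop. The harness below (GetTickCount timing, trivial
init routine) is only an illustration, not the actual benchmark code from
this thread.

#include <pthread.h>
#include <stdio.h>
#include <windows.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;

static void init_routine(void)
{
  /* one-time initialisation; the body is irrelevant to the measurement */
}

int main(void)
{
  const long iterations = 50000000L;   /* 50,000,000 calls, as above */
  long i;
  DWORD start, elapsed;

  start = GetTickCount();
  for (i = 0; i < iterations; i++)
    (void) pthread_once(&once, init_routine);
  elapsed = GetTickCount() - start;

  printf("%ld calls took %lu ms\n", iterations, (unsigned long) elapsed);
  return 0;
}

After the first call every iteration takes only the fast path of
pthread_once, so the loop mostly measures the cost of whatever plain read
or interlocked operation guards the "already done" check.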
> > > >
> > > Using the PTW32_INTERLOCKED_COMPARE_EXCHANGE macro as in your latest
> > > (in CVS) version and building the library for inlined functions
> > > (nmake VC-inlined) and x86 architecture causes customised versions of
> > > InterlockedCompareExchange to be used, and this results in inlined asm.
> > > Same for PTW32_INTERLOCKED_EXCHANGE.
> > >
> > > Also, on single-CPU x86, the library dynamically switches to using
> > > 'cmpxchg' rather than 'lock cmpxchg' to avoid locking the bus. This
> > > appears to match what the kernel32.dll versions do. On non-x86
> > > architectures the kernel32.dll versions are called, with call overhead.
> > >
> > > PTW32_INTERLOCKED_EXCHANGE_ADD could be added, as could other
> > > architectures. See ptw32_InterlockedCompareExchange.c
> >
> > I have rerun my benchmark with VC-inlined. The difference is now less
> > significant, 0.9 vs 0.6 seconds, but still noticeable. I guess cmpxchg,
> > even without locking, is quite expensive. On multi-CPU systems the
> > difference should be much higher due to the time it takes to lock the
> > bus and to the contention it may cause. It sounded as if you did not
> > care much to try to optimize it. I did not mean to suggest that we have
> > to do it right now either; I just wanted to get your opinion on whether
> > we want to deal with this in the future.
>
> By all means include any optimisation you think is worthwhile. I was
> just pointing out that the difference isn't necessarily 2.1 v 0.6.
>
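For what it's worth, here is a hedged sketch of the fast-path idea being
discussed: a plain read of the completion flag first, with interlocked
operations only until initialisation has been observed. The type, the state
encoding, and the spin-based slow path below are my own simplifications for
illustration - this is not the pthreads-win32 implementation, and it ignores
the semaphore and cancellation handling this thread is about.

#include <windows.h>

typedef struct {
  volatile LONG state;  /* 0 = not started, 1 = in progress, 2 = done */
} my_once_t;

void my_once(my_once_t *oc, void (*init)(void))
{
  /*
   * Fast path: a plain read. Once 'state' is 2 no cmpxchg (locked or
   * unlocked) is executed at all, which is where the 0.9 vs 0.6 second
   * difference comes from. On architectures with weaker ordering than
   * x86 a read barrier would be needed before trusting this flag.
   */
  if (oc->state == 2)
    return;

  /* Slow path: race to become the initialiser. */
  if (InterlockedCompareExchange(&oc->state, 1, 0) == 0)
  {
    init();
    InterlockedExchange(&oc->state, 2);  /* publish completion */
  }
  else
  {
    /* Someone else is initialising; a real implementation would block
       on the semaphore and cope with cancellation instead of spinning. */
    while (oc->state != 2)
      Sleep(0);
  }
}

On a multi-CPU build the slow path uses 'lock cmpxchg', so skipping it on
the fast path should matter even more there, which matches the contention
argument above.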