Subject: RE: New pthread_once implementation
From: Ross Johnson
To: Vladimir Kliatchko
Cc: 'Gottlob Frege', Pthreads-Win32 list
Date: Sat, 28 May 2005 13:54:00 -0000
In-Reply-To: <0IH70010T4SR3C@mta10.srv.hcvlny.cv.net>
Message-Id: <1117288359.787.143.camel@desk.home>

On Sat, 2005-05-28 at 06:51 -0400, Vladimir Kliatchko wrote:
> > -----Original Message-----
> > From: pthreads-win32-owner@sources.redhat.com
> > [mailto:pthreads-win32-owner@sources.redhat.com] On Behalf Of Ross Johnson
> > Sent: Friday, May 27, 2005 11:48 PM
> > To: Vladimir Kliatchko
> > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > Subject: RE: New pthread_once implementation
> >
> > On Fri, 2005-05-27 at 21:30 -0400, Vladimir Kliatchko wrote:
> > > Nice catch. Let me see if I can fix it.
> > >
> > > Note that the same problem exists in the currently released event-based
> > > implementation (cvs version 1.16):
> > >
> > > thread1 comes in, starts initing
> > > thread2 creates event, starts waiting
> > > thread3 comes in, starts waiting
> > > thread1 is cancelled, signals event
> > > thread2 wakes up, proceeds to the point right before the ResetEvent
> > > thread3 wakes up, closes event handle
> > > thread2 resets closed handle
> >
> > Relies on HANDLE uniqueness and assumes that an error will result. This
> > is why the 2.6.0 version (and earlier) checks the return code and
> > restores Win32 LastError if necessary - for GetLastError transparency.
>
> Does Windows guarantee that the handles are not reused? What happens if a
> thread closes a handle while another thread is blocked on it? Is any of
> this in Microsoft documentation? Consider the following scenario for the
> event-based implementation:

Well, apparently they're not unique when recycled, so there is a bug here
to fix in both versions:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_handles1.asp

[Under "Native Windows NT Objects"]

"Unlike the handles that are maintained by the Win32 USER and GDI
subsystem components, handles to native objects under Windows NT are not
unique; that is, upon destruction of an object, the corresponding handle
may be recycled and will look exactly like the handle to the destroyed
object."

But they are local to the process, rather than system-wide, if that helps.
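For reference, a minimal standalone snippet (a sketch only, not pthreads-win32
code) illustrating the hazard the MSDN note describes: after CloseHandle() the
very same HANDLE value may be handed back by the next CreateEvent(), so a late
ResetEvent() on the stale handle either fails cleanly with ERROR_INVALID_HANDLE
or, worse, silently resets an unrelated event:

/* Sketch: demonstrates HANDLE value recycling for kernel objects.
   Whether the value is actually reused is up to the kernel; both
   outcomes are handled below. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h1, h2, stale;

    h1 = CreateEvent(NULL, TRUE, FALSE, NULL);   /* manual-reset event      */
    stale = h1;

    CloseHandle(h1);                             /* object is destroyed     */
    h2 = CreateEvent(NULL, TRUE, FALSE, NULL);   /* value may be recycled   */

    if (stale == h2)
        printf("handle value recycled: ResetEvent(stale) would act on h2\n");
    else if (!ResetEvent(stale))
        printf("stale handle rejected: GetLastError() = %lu\n",
               (unsigned long) GetLastError());

    CloseHandle(h2);
    return 0;
}

The first branch is the case neither version can detect by checking return
codes alone, which is why handle recycling matters here.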
> > > Also, regarding my previous comment to Ross about very high cost of
> > > using InterlockedExchangeAdd for MBR:
> > > I did some simple benchmarking. Running pthread_once 50,000,000 times
> > > on my pretty slow single-CPU machine takes about 2.1 seconds. Replacing
> > > InterlockedExchangeAdd with a simple read brings it down to 0.6
> > > seconds. This looks significant.
> >
> > Using the PTW32_INTERLOCKED_COMPARE_EXCHANGE macro as in your latest (in
> > CVS) version and building the library for inlined functions (nmake
> > VC-inlined) and x86 architecture causes customised versions of
> > InterlockedCompareExchange to be used, and this results in inlined asm.
> > Same for PTW32_INTERLOCKED_EXCHANGE.
> >
> > Also, on single-CPU x86, the library dynamically switches to using
> > 'cmpxchg' rather than 'lock cmpxchg' to avoid locking the bus. This
> > appears to match what the kernel32.dll versions do. On non-x86
> > architectures the kernel32.dll versions are called, with call overhead.
> >
> > PTW32_INTERLOCKED_EXCHANGE_ADD could be added, as could other
> > architectures. See ptw32_InterlockedCompareExchange.c
>
> I have rerun my benchmark with VC-inlined. The difference is now less
> significant (0.9 vs 0.6) but still noticeable. I guess cmpxchg even
> without locking is quite expensive. On multi-CPU systems the difference
> should be much higher due to the time it takes to lock the bus and to the
> contention it may cause. It sounded as if you did not care much to try to
> optimize it. I did not mean to suggest that we have to do it right now
> either. I just wanted to get your opinion on whether we want to deal with
> this in the future.

By all means include any optimisation you think is worthwhile. I was just
pointing out that the difference isn't necessarily 2.1 vs 0.6.
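To make the numbers concrete, here is a rough sketch (hypothetical names, not
the library's actual pthread_once) of the fast-path read the benchmark is
exercising: InterlockedExchangeAdd(&flag, 0) used as the memory-barrier read
("MBR") versus a plain read of the done flag:

/* Sketch only: hypothetical my_once, not pthreads-win32 source. */
#include <windows.h>

typedef struct {
    volatile LONG started;  /* has some thread claimed the init? */
    volatile LONG done;     /* has init_routine completed?       */
} my_once_t;

static void my_once_slow(my_once_t *once, void (*init_routine)(void))
{
    /* Grossly simplified slow path: the real library must also handle
       waiters and cancellation of the initialising thread. */
    if (InterlockedCompareExchange(&once->started, 1, 0) == 0) {
        init_routine();
        InterlockedExchange(&once->done, 1);
    } else {
        while (InterlockedExchangeAdd(&once->done, 0) == 0)
            Sleep(0);
    }
}

int my_once(my_once_t *once, void (*init_routine)(void))
{
    /* Variant A (the 2.1s / 0.9s figures): read the flag with a full
       barrier on every call. InterlockedExchangeAdd(&x, 0) returns the
       current value; it is safe on any architecture but costs an
       interlocked (cmpxchg/xadd-class) operation each time. */
    if (InterlockedExchangeAdd(&once->done, 0) != 0)
        return 0;

    /* Variant B (the 0.6s figure) would be a plain read instead:
           if (once->done) return 0;
       cheap, but whether it is sufficient depends on the ordering
       guarantees of the target architecture and of the slow path. */

    my_once_slow(once, init_routine);
    return 0;
}

After initialisation has completed, 50,000,000 calls in a tight loop spend
essentially all of their time in that one read, which is why swapping it
accounts for most of the difference measured above.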