From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4820 invoked by alias); 28 May 2005 14:29:36 -0000
Mailing-List: contact pthreads-win32-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: pthreads-win32-owner@sources.redhat.com
Received: (qmail 4810 invoked by uid 22791); 28 May 2005 14:29:30 -0000
Received: from mta8.srv.hcvlny.cv.net (HELO mta8.srv.hcvlny.cv.net) (167.206.4.203)
    by sourceware.org (qpsmtpd/0.30-dev) with ESMTP; Sat, 28 May 2005 14:29:30 +0000
Received: from vbook (ool-182dac10.dyn.optonline.net [24.45.172.16])
    by mta8.srv.hcvlny.cv.net
    (iPlanet Messaging Server 5.2 HotFix 1.25 (built Mar 3 2004))
    with ESMTP id <0IH700C2ZEV237@mta8.srv.hcvlny.cv.net>
    for pthreads-win32@sources.redhat.com; Sat, 28 May 2005 10:28:15 -0400 (EDT)
Date: Sat, 28 May 2005 14:29:00 -0000
From: Vladimir Kliatchko
Subject: RE: New pthread_once implementation
In-reply-to: <1117288359.787.143.camel@desk.home>
To: 'Ross Johnson'
Cc: 'Gottlob Frege', pthreads-win32@sources.redhat.com
Message-id: <0IH700C30EV237@mta8.srv.hcvlny.cv.net>
MIME-version: 1.0
Content-type: multipart/mixed; boundary="Boundary_(ID_c640MrZfwc+i4MUn7ifO7g)"
X-SW-Source: 2005/txt/msg00099.txt.bz2

This is a multi-part message in MIME format.

--Boundary_(ID_c640MrZfwc+i4MUn7ifO7g)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-length: 4787

What do you think of the attached implementation? I am still analyzing it,
but it passes the tests and appears to be free of that problem.

It does have one minor glitch, though: if two threads come in, the semaphore
is created. If both are cancelled and no new calls are made to finish the
job, the semaphore is never destroyed. I am not sure how big a deal this is.

Re. optimizations: Great, I will try to do something.
Thnx,
--vlad

> -----Original Message-----
> From: pthreads-win32-owner@sources.redhat.com
> [mailto:pthreads-win32-owner@sources.redhat.com] On Behalf Of Ross Johnson
> Sent: Saturday, May 28, 2005 9:55 AM
> To: Vladimir Kliatchko
> Cc: 'Gottlob Frege'; Pthreads-Win32 list
> Subject: RE: New pthread_once implementation
>
> On Sat, 2005-05-28 at 06:51 -0400, Vladimir Kliatchko wrote:
> >
> > > -----Original Message-----
> > > From: pthreads-win32-owner@sources.redhat.com
> > > [mailto:pthreads-win32-owner@sources.redhat.com] On Behalf Of Ross Johnson
> > > Sent: Friday, May 27, 2005 11:48 PM
> > > To: Vladimir Kliatchko
> > > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > > Subject: RE: New pthread_once implementation
> > >
> > > On Fri, 2005-05-27 at 21:30 -0400, Vladimir Kliatchko wrote:
> > > > Nice catch. Let me see if I can fix it.
> > > >
> > > > Note that the same problem exists in the currently released
> > > > event-based implementation (cvs version 1.16):
> > > >
> > > > thread1 comes in, starts initing
> > > > thread2 creates event, starts waiting
> > > > thread3 comes in, starts waiting
> > > > thread1 is cancelled, signals event
> > > > thread2 wakes up, proceeds to the point right before the ResetEvent
> > > > thread3 wakes up, closes event handle
> > > > thread2 resets closed handle
> > >
> > > Relies on HANDLE uniqueness and assumes that an error will result. This
> > > is why the 2.6.0 version (and earlier) checks the return code and
> > > restores Win32 LastError if necessary - for GetLastError transparency.
> >
> > Does Windows guarantee that the handles are not reused? What happens if a
> > thread closes a handle while another thread is blocked on it? Is any of
> > this in Microsoft documentation?
> > Consider the following scenario for the
> > event-based implementation:
>
> Well, apparently they're not unique when recycled, so there is a bug
> here to fix in both versions:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_handles1.asp
> [Under "Native Windows NT Objects"]
> "Unlike the handles that are maintained by the Win32 USER and GDI
> subsystem components, handles to native objects under Windows NT are not
> unique; that is, upon destruction of an object, the corresponding handle
> may be recycled and will look exactly like the handle to the destroyed
> object."
>
> But they are local to the process, rather than system wide, if that
> helps.
>
> > > > Also, regarding my previous comment to Ross about the very high cost
> > > > of using InterlockedExchangeAdd for MBR:
> > > > I did some simple benchmarking. Running pthread_once 50,000,000 times
> > > > on my pretty slow single-CPU machine takes about 2.1 seconds.
> > > > Replacing InterlockedExchangeAdd with a simple read brings it down to
> > > > 0.6 seconds. This looks significant.
> > >
> > > Using the PTW32_INTERLOCKED_COMPARE_EXCHANGE macro as in your latest
> > > (in CVS) version and building the library for inlined functions (nmake
> > > VC-inlined) and x86 architecture causes customised versions of
> > > InterlockedCompareExchange to be used, and this results in inlined asm.
> > > Same for PTW32_INTERLOCKED_EXCHANGE.
> > >
> > > Also, on single-CPU x86, the library dynamically switches to using
> > > 'cmpxchg' rather than 'lock cmpxchg' to avoid locking the bus. This
> > > appears to match what the kernel32.dll versions do. On non-x86
> > > architectures the kernel32.dll versions are called, with call overhead.
> > >
> > > PTW32_INTERLOCKED_EXCHANGE_ADD could be added, as could other
> > > architectures. See ptw32_InterlockedCompareExchange.c
> >
> > I have rerun my benchmark with VC-inlined.
> > The difference is now less
> > significant, 0.9 vs 0.6, but still noticeable. I guess cmpxchg even
> > without locking is quite expensive. On multi-CPU systems the difference
> > should be much higher due to the time it takes to lock the bus and to
> > the contention it may cause. It sounded as if you did not care much to
> > try to optimize it. I did not mean to suggest that we have to do it
> > right now either. I just wanted to get your opinion on whether we want
> > to deal with this in the future.
>
> By all means include any optimisation you think is worthwhile. I was
> just pointing out that the difference isn't necessarily 2.1 v 0.6.

--Boundary_(ID_c640MrZfwc+i4MUn7ifO7g)
Content-type: text/plain; name=vk_pthread_once4.c
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=vk_pthread_once4.c
Content-length: 3358

#define PTHREAD_ONCE_INIT {0, 0, 0, 0}

enum ptw32_once_state {
  PTW32_ONCE_INIT    = 0x0,
  PTW32_ONCE_STARTED = 0x1,
  PTW32_ONCE_DONE    = 0x2
};

struct pthread_once_t_
{
  int state;
  int reserved;
  int numSemaphoreUsers;
  HANDLE semaphore;
};

static void PTW32_CDECL
ptw32_once_on_init_cancel (void * arg)
{
  pthread_once_t * once_control = (pthread_once_t *) arg;

  (void) PTW32_INTERLOCKED_EXCHANGE((LPLONG)&once_control->state,
                                    (LONG)PTW32_ONCE_INIT);

  if (InterlockedExchangeAdd((LPLONG)&once_control->semaphore, 0L)) /* MBR fence */
    {
      ReleaseSemaphore(once_control->semaphore, 1, NULL);
    }
}

static int
ptw32_once (pthread_once_t * once_control, void (*init_routine) (void))
{
  int state;
  HANDLE sema;

  while ((state = PTW32_INTERLOCKED_COMPARE_EXCHANGE(
                    (PTW32_INTERLOCKED_LPLONG)&once_control->state,
                    (PTW32_INTERLOCKED_LONG)PTW32_ONCE_STARTED,
                    (PTW32_INTERLOCKED_LONG)PTW32_ONCE_INIT)) != PTW32_ONCE_DONE)
    {
      if (PTW32_ONCE_INIT == state)
        {
#ifdef _MSC_VER
#pragma inline_depth(0)
#endif
          pthread_cleanup_push(ptw32_once_on_init_cancel, (void *) once_control);
          (*init_routine)();
          pthread_cleanup_pop(0);
#ifdef _MSC_VER
#pragma inline_depth()
#endif
          (void) PTW32_INTERLOCKED_EXCHANGE((LPLONG)&once_control->state,
                                            (LONG)PTW32_ONCE_DONE);

          /*
           * We didn't create the semaphore.
           * It is only there if there is someone waiting.
           */
          if (InterlockedExchangeAdd((LPLONG)&once_control->semaphore, 0L)) /* MBR fence */
            {
              ReleaseSemaphore(once_control->semaphore,
                               once_control->numSemaphoreUsers, NULL);
            }
        }
      else
        {
          InterlockedIncrement((LPLONG)&once_control->numSemaphoreUsers);

          if (!InterlockedExchangeAdd((LPLONG)&once_control->semaphore, 0L)) /* MBR fence */
            {
              sema = CreateSemaphore(NULL, 0, INT_MAX, NULL);

              if (PTW32_INTERLOCKED_COMPARE_EXCHANGE(
                    (PTW32_INTERLOCKED_LPLONG)&once_control->semaphore,
                    (PTW32_INTERLOCKED_LONG)sema,
                    (PTW32_INTERLOCKED_LONG)0))
                {
                  CloseHandle(sema);
                }
            }

          /*
           * Check 'state' again in case the initting thread has finished
           * or cancelled and left before seeing that there was a semaphore.
           */
          if (InterlockedExchangeAdd((LPLONG)&once_control->state, 0L) == PTW32_ONCE_STARTED)
            {
              WaitForSingleObject(once_control->semaphore, INFINITE);
            }

          InterlockedDecrement((LPLONG)&once_control->numSemaphoreUsers);
        }
    }

  if (0 == InterlockedExchangeAdd((LPLONG)&once_control->numSemaphoreUsers, 0)) /* MBR */
    {
      if ((sema = (HANDLE) PTW32_INTERLOCKED_EXCHANGE(
                    (LPLONG)&once_control->semaphore, (LONG)0)))
        {
          CloseHandle(sema);
        }
    }

  return 0;
}

int
pthread_once (pthread_once_t * once_control, void (*init_routine) (void))
{
  int result;

  if (once_control == NULL || init_routine == NULL)
    {
      result = EINVAL;
    }
  else
    {
      if (InterlockedExchangeAdd((LPLONG)&once_control->state, 0L)
          != PTW32_ONCE_DONE) /* MBR */
        {
          result = ptw32_once(once_control, init_routine);
        }
      else
        {
          result = 0;
        }
    }

  return (result);
} /* pthread_once */

--Boundary_(ID_c640MrZfwc+i4MUn7ifO7g)--
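[For readers without a Windows build environment, the "MBR fence" idiom in the
attachment - InterlockedExchangeAdd(&x, 0), an atomic add of zero used as a
fully-fenced read - and the CAS-driven state machine can be sketched portably
with C11 atomics. This is only an illustrative analogue, not pthreads-win32
code; the names mbr_read, cas_state, and demo are made up here, and the
semaphore/cancellation machinery is omitted.]

  #include <stdatomic.h>

  /* Illustrative state values mirroring the attachment's enum. */
  enum { ONCE_INIT = 0, ONCE_STARTED = 1, ONCE_DONE = 2 };

  /* Portable analogue of InterlockedExchangeAdd(&x, 0): an atomic
   * read-modify-write that adds zero, i.e. a load with full-fence
   * (sequentially consistent) semantics. */
  static long mbr_read(_Atomic long *p)
  {
    return atomic_fetch_add(p, 0);
  }

  /* Portable analogue of PTW32_INTERLOCKED_COMPARE_EXCHANGE: store
   * 'desired' iff *p equals 'expected'; either way return the value
   * that was observed at *p. */
  static long cas_state(_Atomic long *p, long desired, long expected)
  {
    long seen = expected;
    atomic_compare_exchange_strong(p, &seen, desired);
    return seen;
  }

  /* Walk the state machine single-threadedly; returns 0 on success. */
  static int demo(void)
  {
    _Atomic long state = ONCE_INIT;

    /* First caller claims initialisation: CAS INIT -> STARTED wins. */
    if (cas_state(&state, ONCE_STARTED, ONCE_INIT) != ONCE_INIT)
      return 1;

    /* A late caller's CAS fails and observes STARTED, so it would wait. */
    if (cas_state(&state, ONCE_STARTED, ONCE_INIT) != ONCE_STARTED)
      return 2;

    /* The initialiser publishes completion with a fenced store. */
    atomic_exchange(&state, ONCE_DONE);

    /* Fast path: later callers see DONE via the fenced read and return. */
    return mbr_read(&state) == ONCE_DONE ? 0 : 3;
  }

The benchmark discussion above is about exactly the cost of mbr_read-style
fenced reads on the fast path versus a plain load.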