From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13694 invoked by alias); 13 Apr 2005 07:45:14 -0000 Mailing-List: contact pthreads-win32-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: pthreads-win32-owner@sources.redhat.com Received: (qmail 13593 invoked from network); 13 Apr 2005 07:44:58 -0000 Received: from unknown (HELO canyonero.dot.net.au) (202.147.68.14) by sourceware.org with SMTP; 13 Apr 2005 07:44:58 -0000 Received: from [203.129.48.56] (helo=ppp-48-56.grapevine.net.au) by canyonero.dot.net.au with esmtp (Exim 3.35 #1 (Debian)) id 1DLcYS-0003NB-00 for ; Wed, 13 Apr 2005 17:44:56 +1000 Subject: Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 From: Ross Johnson To: Pthreads-Win32 list In-Reply-To: References: Content-Type: text/plain Date: Wed, 13 Apr 2005 07:45:00 -0000 Message-Id: <1113378289.9335.83.camel@desk.home> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-SW-Source: 2005/txt/msg00069.txt.bz2 On Tue, 2005-04-12 at 12:10 +0200, Alexander Terekhov wrote: > [... pthread_once() ...] > > I'm not inclined to check the code at the moment, but I can tell > you that robust approach to priority problems is to use locks with > priority protocols on the them. Trying to optimize-out mutex > doesn't make much sense here since you need it on slow path (once > per thread per once_control instance at most) only. Hi Alexander, [For anyone new to this issue: http://sources.redhat.com/ml/pthreads-win32/2005/msg00034.html ] As Tim Theisen has just kindly reported ( http://sources.redhat.com/ml/pthreads-win32/2005/msg00068.html ), both of the once_routine cancellation tests have failed on his MP system (no problem on my UP system though), so you may yet prevail. It's hard to argue when I don't have a system that I can properly test this stuff on. So I'm now hoping that someone might take a look and point out the error and also, that the error is retrievable, because, even though your model is logically very attractive and robust, there are still aspects of it that concern me and compel me to look at alternatives. I'll list them now: 1 - OVERHEAD: I could live with this if I absolutely had to, but it does make sense to me to try to optimise the uncontended path - although, not if it can't be made to work reliably, obviously. The create/open named mutex seems like a disproportionate overhead for something that will probably complete uncontested in most situations. Here's a table showing uncontested speeds (from tests/benchtest1.c) on my system (Windows 2000, 2400MHz): benchtest1 ============================================================================= Lock plus unlock on an unlocked mutex. 10000000 iterations Using Total(msec) average(usec) ----------------------------------------------------------------------------- Dummy call x 2 78 0.008 InterlockedOp x 2 140 0.014 Simple Critical Section 140 0.014 Old PT Mutex using a Critical Section (WNT) 281 0.028 Old PT Mutex using a Win32 Mutex (W9x) 18265 1.827 ............................................................................. PTHREAD_MUTEX_DEFAULT (W9x,WNT) 297 0.030 PTHREAD_MUTEX_NORMAL (W9x,WNT) 296 0.030 PTHREAD_MUTEX_ERRORCHECK (W9x,WNT) 1312 0.131 PTHREAD_MUTEX_RECURSIVE (W9x,WNT) 1328 0.133 ============================================================================= The (W9x) in the description for Win32 Mutex timing just means that this was type of object used by older versions of the library when running on Win9x systems. As you can see, Win32 mutexes are 61 times slower than the POSIX default mutex type, and 130 times slower than two Interlocked operations or simple (enter+leave) critical section. 2 - SECURITY: This makes me especially reluctant, given that users may not be aware of the library's internals. I haven't looked in detail at, and have no prior experience with, security of Windows objects, but I think I can justify caution. I don't think it's possible to guarantee that a 'named' mutex is not vulnerable across all Windows systems that use the library. The name must be unique (easy enough to do), but it's not possible to absolutely prevent other processes from opening and locking the named mutex accidentally or maliciously. I imagine that, while it's unlikely (but not impossible) for another process to take the lock and stop the init_routine from running, it might be possible to take a place in the queue and ultimately prevent other legitimate waiters from ever continuing. According to the MS documentation, WinCE and Win95/98 ignore the security parameter (and so I assume no security) - and namespaces for named objects were introduced even later I think (Win 2000?). The MSDN CreateMutex web page also includes the following note that implies even more overhead and complexity: "If you are using a named mutex to limit your application to a single instance, a malicious user can create this mutex before you do and prevent your application from starting. To prevent this situation, create a randomly named mutex and store the name so that it can only be obtained by an authorized user. Alternatively, you can use a file for this purpose. To limit your application to one instance per user, create a locked file in the user's profile directory." Now, I could have investigated further and further, but I decided to try to find an alternative first. Something that avoids all of these 'whatifs'. I want to be able to give users some assurances when they use the library. 3 - TLS: To use this I believe I need to make pthread_once_t an opaque pointer or something similar (and PTHREAD_ONCE_INIT an auto-init magic token). This adds still more overhead to the 'use once' object. 4 - MBR (alternative to TLS): Not a concern - more a response: >From various URLs, (http://groups.yahoo.com/group/boost/message/15442) I see that you don't trust Windows Interlocked operations to provide proper MBR semantics. I recall someone at Microsoft writing that the unadorned Interlocked routines (those not specifically acquire or release) insert a full memory barrier on systems that support it. Anyway, those are the questions that occurred to me at the time, as I logged the following ChangeLog entry: 2005-03-13 Ross Johnson * pthread_once.c (pthread_once): Completely redesigned; a change was required to the ABI (pthread_once_t_), and resulting in a version compatibility index increment. NOTES: The design (based on pseudo code contributed by Gottlob Frege) avoids creating a kernel object if there is no contention. See URL for details:- http://sources.redhat.com/ml/pthreads-win32/2005/msg00029.html This uses late initialisation similar to the technique already used for pthreads-win32 mutexes and semaphores (from Alexander Terekhov). The subsequent cancelation cleanup additions (by rpj) could not be implemented without sacrificing some of the efficiency in Gottlob's design. In particular, although each once_control uses it's own event to block on, a global CS is required to manage it - since the event must be either re-usable or re-creatable under cancelation. This is not needed in the non-cancelable design because it is able to mark the event as closed (forever). When uncontested, a CS operation is equivalent to an Interlocked operation in speed. So, in the final design with cancelability, an uncontested once_control operation involves a minimum of five interlocked operations (including the LeaveCS operation). ALTERNATIVES: An alternative design from Alexander Terekhov proposed using a named mutex, as sketched below:- if (!once_control) { // May be in TLS named_mutex::guard guard(&once_control2); if (!once_control2) { once_control2 = true; } once_control = true; } A more detailed description of this can be found here:- http://groups.yahoo.com/group/boost/message/15442 [Although the definition of a suitable PTHREAD_ONCE_INIT precludes use of the TLS located flag, this is not critical.] There are three primary concerns though:- 1) The [named] mutex is 'created' even in the uncontended case. 2) A system wide unique name must be generated. 3) Win32 mutexes are VERY slow even in the uncontended case. An uncontested Win32 mutex lock operation can be 50 (or more) times slower than an uncontested EnterCS operation. Ultimately, the named mutex trick is making use of the global locks maintained by the kernel. Regards. Ross