* New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 @ 2005-04-12 7:46 Ross Johnson 2005-04-12 10:10 ` Alexander Terekhov 0 siblings, 1 reply; 6+ messages in thread From: Ross Johnson @ 2005-04-12 7:46 UTC (permalink / raw) To: Pthreads-Win32 list Announcing two new releases of pthreads-w32:- pthreads-w32-2-3-0-release pthreads-w32-1-7-0-release Packages are available in self-unpacking zip files (.exe) and gzipped tar files (.tar.gz) as usual. See http://sources.redhat.com/pthreads-win32/ or go directly to: ftp://sources.redhat.com/pub/pthreads-win32/ Red Hat have a low ftp concurrent user limit. Mirrors are at (available as they update): http://sources.redhat.com/mirrors.html These releases hopefully fix all known problems with pthread_once() in both versions 1 and 2 of pthreads-win32. In particular, the starvation problem that potentially arises after an init_routine cancellation has been resolved using momentary priority boosting. If it proves to be robust then there will be no need for a version 3 release as previously implied (at least, not to fix pthread_once()). The additional work of managing thread priorities inside of pthread_once has been kept out of the normal (cancellation-free) pathways so that the additional normal path overhead is almost nil (i.e. introduces no additional bus locking or cache coherence operations if cancellation- free). The functionally and behaviour of versions 1.7 and 2.3 should be logically identical. However, the version 2 pthread_once implementation (based on code posted by Gottlob Frege) is much more efficient. RELEASE 2.3.0 ------------- (2005-04-12) General ------- Release 1.7.0 is the backport of features and bug fixes new in this release. See earlier notes under Release 2.0.0/General. Bugs fixed ---------- * Fixed pthread_once potential for post once_routine cancellation hanging due to starvation. See comments in pthread_once.c. Momentary priority boosting is used to ensure that, after a once_routine is cancelled, the thread that will run the once_routine is not starved by higher priority waiting threads at critical times. Priority boosting occurs only AFTER a once_routine cancellation, and is applied only to that once_control. The once_routine is run at the thread's normal base priority. New tests --------- * once4.c: Aggressively tests pthread_once() under realtime conditions using threads with varying priorities. Windows' random priority boosting does not occur for threads with realtime priority levels. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 2005-04-12 7:46 New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 Ross Johnson @ 2005-04-12 10:10 ` Alexander Terekhov 2005-04-13 7:45 ` Ross Johnson 0 siblings, 1 reply; 6+ messages in thread From: Alexander Terekhov @ 2005-04-12 10:10 UTC (permalink / raw) To: Ross Johnson; +Cc: Pthreads-Win32 list [... pthread_once() ...] I'm not inclined to check the code at the moment, but I can tell you that robust approach to priority problems is to use locks with priority protocols on the them. Trying to optimize-out mutex doesn't make much sense here since you need it on slow path (once per thread per once_control instance at most) only. Variation of DCSI (either MBR or TLS) with named mutex is the way to go. regards, alexander. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 2005-04-12 10:10 ` Alexander Terekhov @ 2005-04-13 7:45 ` Ross Johnson 2005-04-13 8:21 ` Gottlob Frege 2005-04-14 16:19 ` Alexander Terekhov 0 siblings, 2 replies; 6+ messages in thread From: Ross Johnson @ 2005-04-13 7:45 UTC (permalink / raw) To: Pthreads-Win32 list On Tue, 2005-04-12 at 12:10 +0200, Alexander Terekhov wrote: > [... pthread_once() ...] > > I'm not inclined to check the code at the moment, but I can tell > you that robust approach to priority problems is to use locks with > priority protocols on the them. Trying to optimize-out mutex > doesn't make much sense here since you need it on slow path (once > per thread per once_control instance at most) only. Hi Alexander, [For anyone new to this issue: http://sources.redhat.com/ml/pthreads-win32/2005/msg00034.html ] As Tim Theisen has just kindly reported ( http://sources.redhat.com/ml/pthreads-win32/2005/msg00068.html ), both of the once_routine cancellation tests have failed on his MP system (no problem on my UP system though), so you may yet prevail. It's hard to argue when I don't have a system that I can properly test this stuff on. So I'm now hoping that someone might take a look and point out the error and also, that the error is retrievable, because, even though your model is logically very attractive and robust, there are still aspects of it that concern me and compel me to look at alternatives. I'll list them now: 1 - OVERHEAD: I could live with this if I absolutely had to, but it does make sense to me to try to optimise the uncontended path - although, not if it can't be made to work reliably, obviously. The create/open named mutex seems like a disproportionate overhead for something that will probably complete uncontested in most situations. Here's a table showing uncontested speeds (from tests/benchtest1.c) on my system (Windows 2000, 2400MHz): benchtest1 ============================================================================= Lock plus unlock on an unlocked mutex. 10000000 iterations Using Total(msec) average(usec) ----------------------------------------------------------------------------- Dummy call x 2 78 0.008 InterlockedOp x 2 140 0.014 Simple Critical Section 140 0.014 Old PT Mutex using a Critical Section (WNT) 281 0.028 Old PT Mutex using a Win32 Mutex (W9x) 18265 1.827 ............................................................................. PTHREAD_MUTEX_DEFAULT (W9x,WNT) 297 0.030 PTHREAD_MUTEX_NORMAL (W9x,WNT) 296 0.030 PTHREAD_MUTEX_ERRORCHECK (W9x,WNT) 1312 0.131 PTHREAD_MUTEX_RECURSIVE (W9x,WNT) 1328 0.133 ============================================================================= The (W9x) in the description for Win32 Mutex timing just means that this was type of object used by older versions of the library when running on Win9x systems. As you can see, Win32 mutexes are 61 times slower than the POSIX default mutex type, and 130 times slower than two Interlocked operations or simple (enter+leave) critical section. 2 - SECURITY: This makes me especially reluctant, given that users may not be aware of the library's internals. I haven't looked in detail at, and have no prior experience with, security of Windows objects, but I think I can justify caution. I don't think it's possible to guarantee that a 'named' mutex is not vulnerable across all Windows systems that use the library. The name must be unique (easy enough to do), but it's not possible to absolutely prevent other processes from opening and locking the named mutex accidentally or maliciously. I imagine that, while it's unlikely (but not impossible) for another process to take the lock and stop the init_routine from running, it might be possible to take a place in the queue and ultimately prevent other legitimate waiters from ever continuing. According to the MS documentation, WinCE and Win95/98 ignore the security parameter (and so I assume no security) - and namespaces for named objects were introduced even later I think (Win 2000?). The MSDN CreateMutex web page also includes the following note that implies even more overhead and complexity: "If you are using a named mutex to limit your application to a single instance, a malicious user can create this mutex before you do and prevent your application from starting. To prevent this situation, create a randomly named mutex and store the name so that it can only be obtained by an authorized user. Alternatively, you can use a file for this purpose. To limit your application to one instance per user, create a locked file in the user's profile directory." Now, I could have investigated further and further, but I decided to try to find an alternative first. Something that avoids all of these 'whatifs'. I want to be able to give users some assurances when they use the library. 3 - TLS: To use this I believe I need to make pthread_once_t an opaque pointer or something similar (and PTHREAD_ONCE_INIT an auto-init magic token). This adds still more overhead to the 'use once' object. 4 - MBR (alternative to TLS): Not a concern - more a response: From various URLs, (http://groups.yahoo.com/group/boost/message/15442) I see that you don't trust Windows Interlocked operations to provide proper MBR semantics. I recall someone at Microsoft writing that the unadorned Interlocked routines (those not specifically acquire or release) insert a full memory barrier on systems that support it. Anyway, those are the questions that occurred to me at the time, as I logged the following ChangeLog entry: 2005-03-13 Ross Johnson <rpj at callisto.canberra.edu.au> * pthread_once.c (pthread_once): Completely redesigned; a change was required to the ABI (pthread_once_t_), and resulting in a version compatibility index increment. NOTES: The design (based on pseudo code contributed by Gottlob Frege) avoids creating a kernel object if there is no contention. See URL for details:- http://sources.redhat.com/ml/pthreads-win32/2005/msg00029.html This uses late initialisation similar to the technique already used for pthreads-win32 mutexes and semaphores (from Alexander Terekhov). The subsequent cancelation cleanup additions (by rpj) could not be implemented without sacrificing some of the efficiency in Gottlob's design. In particular, although each once_control uses it's own event to block on, a global CS is required to manage it - since the event must be either re-usable or re-creatable under cancelation. This is not needed in the non-cancelable design because it is able to mark the event as closed (forever). When uncontested, a CS operation is equivalent to an Interlocked operation in speed. So, in the final design with cancelability, an uncontested once_control operation involves a minimum of five interlocked operations (including the LeaveCS operation). ALTERNATIVES: An alternative design from Alexander Terekhov proposed using a named mutex, as sketched below:- if (!once_control) { // May be in TLS named_mutex::guard guard(&once_control2); if (!once_control2) { <init> once_control2 = true; } once_control = true; } A more detailed description of this can be found here:- http://groups.yahoo.com/group/boost/message/15442 [Although the definition of a suitable PTHREAD_ONCE_INIT precludes use of the TLS located flag, this is not critical.] There are three primary concerns though:- 1) The [named] mutex is 'created' even in the uncontended case. 2) A system wide unique name must be generated. 3) Win32 mutexes are VERY slow even in the uncontended case. An uncontested Win32 mutex lock operation can be 50 (or more) times slower than an uncontested EnterCS operation. Ultimately, the named mutex trick is making use of the global locks maintained by the kernel. Regards. Ross ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 2005-04-13 7:45 ` Ross Johnson @ 2005-04-13 8:21 ` Gottlob Frege 2005-04-14 16:19 ` Alexander Terekhov 1 sibling, 0 replies; 6+ messages in thread From: Gottlob Frege @ 2005-04-13 8:21 UTC (permalink / raw) To: Ross Johnson; +Cc: Pthreads-Win32 list On 4/13/05, Ross Johnson <RossJohnson@homemail.com.au> wrote: > [...] > So I'm now hoping that someone might take a look and point out the error > and also, that the error is retrievable > [...] Yeah, I still have 'take another crack at call_once' on my list of things to do. If we end up needing to mess with thread priorities anyhow (although only in the cancel case), it might be easiest to just boost everyone's priority right away, which was my first version (Alex might remember seeing it on comp.programming.threads). Something like (from memory, and with interlocks, etc, left out): oldpriority = GetPriority(); SetPriority(MAX); // must do this before checking initted - otherwise could be too late! if (!started++) { do_init(); initted = true; } else { SetPriority(oldPriority); // note that oldPriority may coincidentally be MAX while( !initted) { Sleep(1); } } Very much like the old original version, but since we mess with the priorities, we don't have to worry about starvation (even the worse case where a waiter's original priority is MAX, the init still gets a share of time (assuming a reasonable scheduler)) From this version, adding cancellation / exception handling should be easy. But I never liked this version much: - not sure of the overhead of SetPriority - not sure of portability (stuff like this is inherently not very portable, and pthreads-win32 obviously doesn't care, but I care for my purposes) - just 'inelegant' (polling, etc...) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 2005-04-13 7:45 ` Ross Johnson 2005-04-13 8:21 ` Gottlob Frege @ 2005-04-14 16:19 ` Alexander Terekhov 2005-04-15 3:08 ` Ross Johnson 1 sibling, 1 reply; 6+ messages in thread From: Alexander Terekhov @ 2005-04-14 16:19 UTC (permalink / raw) To: Ross Johnson; +Cc: Pthreads-Win32 list [... named mutex and atrocities of MS impl ...] It's not that hard to implement it using a map with pointers (to "normal" mutex + refcount structure and addresses of once_control variables as keys). [... Windows Interlocked stuff ...] In the meantime, MS has documented that they are fully fenced (newer Acq/Rel stuff aside for a moment). But you'll need acquire semantics for check on fast path. Not a problem on IA32 (where all loads do have acquire semantics), but on IA64 (lets pretend that it's still alive ;-) ), you'll need a bit of assembly to make it fast. regards, alexander. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 2005-04-14 16:19 ` Alexander Terekhov @ 2005-04-15 3:08 ` Ross Johnson 0 siblings, 0 replies; 6+ messages in thread From: Ross Johnson @ 2005-04-15 3:08 UTC (permalink / raw) To: Pthreads-Win32 list UPDATE: The latest change to pthread_once.c (current CVS head) passes all the tests on an MP system, particularly the two tests that failed with release 2.3.0 (each was run hundreds of times). Thanks to Tim Theisen for running the tests. This means that people can have more confidence in 2.4.0 when it comes out (version 1.7.0 still needs to be confirmed but it uses a different model again - and the original change that caused v2 to fail doesn't apply to v1). On Thu, 2005-04-14 at 18:18 +0200, Alexander Terekhov wrote: > [... named mutex and atrocities of MS impl ...] > > It's not that hard to implement it using a map with pointers (to > "normal" mutex + refcount structure and addresses of once_control > variables as keys). I won't be rushing a version 3 release now unless more problems require it, but I will pursue this. It would be nice to leave the priority hacking to the kernel. As it's turned out, v1 and v2 can be fixed rather than abandoned, as far as this problem is concerned at least. Other as yet unknown bugs aside, the version differences are now just in efficiency and, in most applications, should not even be noticeable. Thanks. Ross ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-04-15 3:08 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-04-12 7:46 New pthreads-w32 releases available: versions 2.3.0 and 1.7.0 Ross Johnson 2005-04-12 10:10 ` Alexander Terekhov 2005-04-13 7:45 ` Ross Johnson 2005-04-13 8:21 ` Gottlob Frege 2005-04-14 16:19 ` Alexander Terekhov 2005-04-15 3:08 ` Ross Johnson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).