From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pthreads-win32-return-987-listarch-pthreads-win32=sources.redhat.com@sources.redhat.com>
Received: (qmail 13694 invoked by alias); 13 Apr 2005 07:45:14 -0000
Mailing-List: contact pthreads-win32-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:pthreads-win32-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/pthreads-win32/>
List-Post: <mailto:pthreads-win32@sources.redhat.com>
List-Help: <mailto:pthreads-win32-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: pthreads-win32-owner@sources.redhat.com
Received: (qmail 13593 invoked from network); 13 Apr 2005 07:44:58 -0000
Received: from unknown (HELO canyonero.dot.net.au) (202.147.68.14)
  by sourceware.org with SMTP; 13 Apr 2005 07:44:58 -0000
Received: from [203.129.48.56] (helo=ppp-48-56.grapevine.net.au)
	by canyonero.dot.net.au with esmtp (Exim 3.35 #1 (Debian))
	id 1DLcYS-0003NB-00
	for <pthreads-win32@sources.redhat.com>; Wed, 13 Apr 2005 17:44:56 +1000
Subject: Re: New pthreads-w32 releases available: versions 2.3.0 and 1.7.0
From: Ross Johnson <RossJohnson@homemail.com.au>
To: Pthreads-Win32 list <pthreads-win32@sources.redhat.com>
In-Reply-To: <OF71257A66.8271DD9E-ONC1256FE1.0036A008-C1256FE1.0037C3E4@de.ibm.com>
References:
	 <OF71257A66.8271DD9E-ONC1256FE1.0036A008-C1256FE1.0037C3E4@de.ibm.com>
Content-Type: text/plain
Date: Wed, 13 Apr 2005 07:45:00 -0000
Message-Id: <1113378289.9335.83.camel@desk.home>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2005/txt/msg00069.txt.bz2

On Tue, 2005-04-12 at 12:10 +0200, Alexander Terekhov wrote: 
> [... pthread_once() ...]
> 
> I'm not inclined to check the code at the moment, but I can tell
> you that robust approach to priority problems is to use locks with
> priority protocols on the them. Trying to optimize-out mutex
> doesn't make much sense here since you need it on slow path (once
> per thread per once_control instance at most) only.

Hi Alexander,

[For anyone new to this issue:
http://sources.redhat.com/ml/pthreads-win32/2005/msg00034.html
]

As Tim Theisen has just kindly reported
( http://sources.redhat.com/ml/pthreads-win32/2005/msg00068.html ), both
of the once_routine cancellation tests have failed on his MP system (no
problem on my UP system though), so you may yet prevail. It's hard to
argue when I don't have a system that I can properly test this stuff on.

So I'm now hoping that someone might take a look and point out the error
and also, that the error is retrievable, because, even though your model
is logically very attractive and robust, there are still aspects of it
that concern me and compel me to look at alternatives. I'll list them
now:

1 - OVERHEAD:
I could live with this if I absolutely had to, but it does make sense to
me to try to optimise the uncontended path - although, not if it can't
be made to work reliably, obviously.

The create/open named mutex seems like a disproportionate overhead for
something that will probably complete uncontested in most situations.
Here's a table showing uncontested speeds (from tests/benchtest1.c) on
my system (Windows 2000, 2400MHz):

benchtest1
=============================================================================

Lock plus unlock on an unlocked mutex.
10000000 iterations

Using                                              Total(msec)   average(usec)
-----------------------------------------------------------------------------
Dummy call x 2                                             78           0.008
InterlockedOp x 2                                         140           0.014
Simple Critical Section                                   140           0.014
Old PT Mutex using a Critical Section (WNT)               281           0.028
Old PT Mutex using a Win32 Mutex (W9x)                  18265           1.827
.............................................................................
PTHREAD_MUTEX_DEFAULT (W9x,WNT)                           297           0.030
PTHREAD_MUTEX_NORMAL (W9x,WNT)                            296           0.030
PTHREAD_MUTEX_ERRORCHECK (W9x,WNT)                       1312           0.131
PTHREAD_MUTEX_RECURSIVE (W9x,WNT)                        1328           0.133
=============================================================================

The (W9x) in the description for Win32 Mutex timing just means that this
was type of object used by older versions of the library when running on
Win9x systems. As you can see, Win32 mutexes are 61 times slower than
the POSIX default mutex type, and 130 times slower than two Interlocked
operations or simple (enter+leave) critical section.


2 - SECURITY:
This makes me especially reluctant, given that users may not be aware of
the library's internals. I haven't looked in detail at, and have no
prior experience with, security of Windows objects, but I think I can
justify caution.

I don't think it's possible to guarantee that a 'named' mutex is not
vulnerable across all Windows systems that use the library. The name
must be unique (easy enough to do), but it's not possible to absolutely
prevent other processes from opening and locking the named mutex
accidentally or maliciously. I imagine that, while it's unlikely (but
not impossible) for another process to take the lock and stop the
init_routine from running, it might be possible to take a place in the
queue and ultimately prevent other legitimate waiters from ever
continuing.

According to the MS documentation, WinCE and Win95/98 ignore the
security parameter (and so I assume no security) - and namespaces for
named objects were introduced even later I think (Win 2000?).

The MSDN CreateMutex web page also includes the following note that
implies even more overhead and complexity:

"If you are using a named mutex to limit your application to a single
instance, a malicious user can create this mutex before you do and
prevent your application from starting. To prevent this situation,
create a randomly named mutex and store the name so that it can only be
obtained by an authorized user. Alternatively, you can use a file for
this purpose. To limit your application to one instance per user, create
a locked file in the user's profile directory."

Now, I could have investigated further and further, but I decided to try
to find an alternative first. Something that avoids all of these
'whatifs'. I want to be able to give users some assurances when they use
the library.

3 - TLS:
To use this I believe I need to make pthread_once_t an opaque pointer or
something similar (and PTHREAD_ONCE_INIT an auto-init magic token). This
adds still more overhead to the 'use once' object.

4 - MBR (alternative to TLS):
Not a concern - more a response:
>From various URLs, (http://groups.yahoo.com/group/boost/message/15442) I
see that you don't trust Windows Interlocked operations to provide
proper MBR semantics. I recall someone at Microsoft writing that the
unadorned Interlocked routines (those not specifically acquire or
release) insert a full memory barrier on systems that support it.


Anyway, those are the questions that occurred to me at the time, as I
logged the following ChangeLog entry:


2005-03-13 Ross Johnson <rpj at callisto.canberra.edu.au> 

* pthread_once.c (pthread_once): Completely redesigned; a change was
required to the ABI (pthread_once_t_), and resulting in a version
compatibility index increment.

NOTES:
The design (based on pseudo code contributed by Gottlob Frege) avoids
creating a kernel object if there is no contention. See URL for details:-
http://sources.redhat.com/ml/pthreads-win32/2005/msg00029.html
This uses late initialisation similar to the technique already used for
pthreads-win32 mutexes and semaphores (from Alexander Terekhov).

The subsequent cancelation cleanup additions (by rpj) could not be implemented
without sacrificing some of the efficiency in Gottlob's design. In particular,
although each once_control uses it's own event to block on, a global CS is
required to manage it - since the event must be either re-usable or
re-creatable under cancelation. This is not needed in the non-cancelable
design because it is able to mark the event as closed (forever).

When uncontested, a CS operation is equivalent to an Interlocked operation
in speed. So, in the final design with cancelability, an uncontested
once_control operation involves a minimum of five interlocked operations
(including the LeaveCS operation).
	
ALTERNATIVES:
An alternative design from Alexander Terekhov proposed using a named mutex,
as sketched below:-

  if (!once_control) { // May be in TLS
    named_mutex::guard guard(&once_control2);
      if (!once_control2) {
         <init>
         once_control2 = true;
      }
    once_control = true;
  }
	
A more detailed description of this can be found here:-
http://groups.yahoo.com/group/boost/message/15442

[Although the definition of a suitable PTHREAD_ONCE_INIT precludes use of the
TLS located flag, this is not critical.]

There are three primary concerns though:-
1) The [named] mutex is 'created' even in the uncontended case.
2) A system wide unique name must be generated.
3) Win32 mutexes are VERY slow even in the uncontended 	case. An uncontested
Win32 mutex lock operation can be 50 (or more) times slower than an
uncontested EnterCS operation.

Ultimately, the named mutex trick is making use of the global locks maintained
by the kernel.


Regards.
Ross