From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pthreads-win32-return-981-listarch-pthreads-win32=sources.redhat.com@sources.redhat.com>
Received: (qmail 24454 invoked by alias); 5 Apr 2005 16:03:20 -0000
Mailing-List: contact pthreads-win32-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:pthreads-win32-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/pthreads-win32/>
List-Post: <mailto:pthreads-win32@sources.redhat.com>
List-Help: <mailto:pthreads-win32-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: pthreads-win32-owner@sources.redhat.com
Received: (qmail 24333 invoked from network); 5 Apr 2005 16:03:01 -0000
Received: from unknown (HELO quokka.dot.net.au) (202.147.68.16)
  by sourceware.org with SMTP; 5 Apr 2005 16:03:01 -0000
Received: from [202.147.67.24] (helo=ip-67-24.dot.net.au)
	by quokka.dot.net.au with esmtp (Exim 3.35 #1 (Debian))
	id 1DIqW4-0002ZD-00
	for <pthreads-win32@sources.redhat.com>; Wed, 06 Apr 2005 02:03:00 +1000
Subject: Re: pthreads-w32 2.2.0 test failures
From: Ross Johnson <rpj@callisto.canberra.edu.au>
To: Pthreads-Win32 list <pthreads-win32@sources.redhat.com>
In-Reply-To: <42523837.1060309@btinternet.com>
References: <1E2E66102E75104D8C740340EBCD9867144A37@tomoex.tomotherapy.com>
	 <42523837.1060309@btinternet.com>
Content-Type: text/plain
Date: Tue, 05 Apr 2005 16:03:00 -0000
Message-Id: <1112716985.15352.423.camel@desk.home>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2005/txt/msg00063.txt.bz2

On Tue, 2005-04-05 at 08:03 +0100, Steve Croall wrote: 
> FYI I'm running pthreads on a number of multi-CPU machines.  My twin :( 
>   And three 8-Ways in the office.  It's also been run on a 32-way and it 
> has been given a damn good thrashing.

Fantastic! Thanks for posting.

> I'm a bit concerned about the pthread_once() bug though.   Have you a 
> test application that shows this problem or are the test applications 
> enough to show this?

The bug was identified by code inspection. I have to thank Gottlob
Frege for pointing out that the starvation problem is still there, only
shifted.

I'm referring to version 2 of the library (not version 1), and I
actually have an experimental version 3 (which I believe fixes the bug)
in a CVS branch. Changing pthread_once(), if it's wrong, tends to
require ABI changes because PTHREAD_ONCE_INIT is compiled into every
application that uses it.

You need 3 conditions before the bug becomes a threat (you only need
the first 2 on a single processor machine). They are:
- a possibility that the once_routine can be cancelled; AND
- threads with different priorities accessing the same once_control; AND
- no other available CPUs that the lower priority threads can run on.

If you look at the code in version 2.2.0 and consider what happens if
the once_routine is cancelled, you'll see that newly arriving threads,
and any currently waiting threads, compete again to run the
once_routine. The winner must reset both a flag and a manual reset event
to cause the other threads to wait again. But if the winner is suspended
before completing this, then there's an opportunity for some higher
priority thread to begin busy looping and keep the winner (the
once_routine thread) from ever resuming.

This may not even be a problem at all if Windows promotes threads caught
in this situation. I've read that it does this by incrementing a
thread's priority by 1 each time it misses a turn. This may only happen
in some situations, though.

For the record:
Gottlob provided an efficient working version without once_routine
cancellability. I wanted to take the opportunity to conform to SUS v3
and add cancellability. That complicated things a little.
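
To make the cancellability requirement concrete, here is a minimal
sketch in portable POSIX C. It is not code from the library. The names
once_routine, worker, attempts, completed and run_cancelled_once_demo
are made up for illustration. It assumes a pthread_once() that follows
SUS v3, where a cancelled once_routine leaves once_control as if
pthread_once() had never been called, so a later caller runs the routine
again.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int attempts = 0;   /* times once_routine was entered */
static int completed = 0;  /* times once_routine ran to completion */

/* Hypothetical init routine: the first caller is cancelled part-way. */
static void once_routine(void)
{
    attempts++;
    if (attempts == 1) {
        pthread_cancel(pthread_self()); /* request our own cancellation */
        pthread_testcancel();           /* cancellation point: acts on it */
    }
    completed++;
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&once_control, once_routine);
    return NULL;
}

/* Runs two threads one after the other; returns 0 if both were
 * created and joined, -1 otherwise. */
int run_cancelled_once_demo(void)
{
    pthread_t t;
    void *result;
    int i;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            return -1;
        if (pthread_join(t, &result) != 0)
            return -1;
    }
    return 0;
}
```

On a conforming implementation the first thread is cancelled inside the
once_routine and the second thread gets to run it to completion. The
sketch runs the threads one at a time, so it shows the required
semantics only, not the starvation scenario.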

The experimental version 3 is similar to version 2 in order to retain
the fast uncontended track. Current options for fixing the bug are:
- change the current manual reset event, that threads wait on, into an
auto reset event, and have each waking thread set it to wake the next
waiting thread; OR
- add priority inheritance, to ensure the once_routine thread always
gets a turn.

Both of these options are only necessary in the post cancellation logic.
I'm not real keen on daisy chained event setting because of the
cumulative effects, while priority inheritance is a standard way to
solve priority inversion and starvation problems.
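
For anyone wondering what daisy chained event setting looks like, here
is a rough sketch. It is not the version 3 code. It is written in
portable POSIX C, with a mutex and condition variable standing in for a
Win32 auto reset event, and the names auto_event, auto_event_set,
auto_event_wait, waiter, WAITERS and run_daisy_chain are all
hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical POSIX stand-in for a Win32 auto reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
} auto_event;

static auto_event ev = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
static int woken = 0;  /* protected by ev.lock; demo bookkeeping only */

/* Set the event: each set lets exactly one waiter through. */
static void auto_event_set(auto_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Wait for the event, then reset it automatically. */
static void auto_event_wait(auto_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    e->signaled = 0;  /* auto reset: the next waiter blocks again */
    woken++;          /* count releases for the demo */
    pthread_mutex_unlock(&e->lock);
}

static void *waiter(void *arg)
{
    (void)arg;
    auto_event_wait(&ev);
    auto_event_set(&ev);  /* daisy chain: wake the next waiter */
    return NULL;
}

#define WAITERS 4

/* Returns the number of waiters released by one initial set,
 * or -1 if a thread could not be created. */
int run_daisy_chain(void)
{
    pthread_t t[WAITERS];
    int i;

    woken = 0;
    ev.signaled = 0;
    for (i = 0; i < WAITERS; i++)
        if (pthread_create(&t[i], NULL, waiter, NULL) != 0)
            return -1;
    auto_event_set(&ev);  /* a single set starts the chain */
    for (i = 0; i < WAITERS; i++)
        pthread_join(t[i], NULL);
    return woken;
}
```

Each waiter can only proceed after the one before it has been scheduled
and has set the event again, so the wake-up delay adds up along the
chain. That is the cumulative effect I mean.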

I hope to have version 3 out soon.

Ross