Subject: Re: pthreads-w32 2.2.0 test failures
From: Ross Johnson
To: Pthreads-Win32 list
Date: Tue, 05 Apr 2005 16:03:00 -0000
Message-Id: <1112716985.15352.423.camel@desk.home>
In-Reply-To: <42523837.1060309@btinternet.com>
References: <1E2E66102E75104D8C740340EBCD9867144A37@tomoex.tomotherapy.com> <42523837.1060309@btinternet.com>

On Tue, 2005-04-05 at 08:03 +0100, Steve Croall wrote:
> FYI I'm running pthreads on a number of multi-CPU machines. My twin :(
> And three 8-Ways in the office. It's also been run on a 32-way and it
> has been given a damn good thrashing.

Fantastic! Thanks for posting.

> I'm a bit concerned about the pthread_once() bug though. Have you a
> test application that shows this problem or are the test applications
> enough to show this?

The bug is identified from code inspection. I have to thank Gottlob Frege for pointing out that the starvation problem is still there, only shifted. I'm referring to version 2 of the library (not version 1) and I actually have an experimental version 3 (which fixes the bug I believe) in a CVS branch. Changing pthread_once(), if it's wrong, tends to require ABI changes because of PTHREAD_ONCE_INIT.
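
For readers less familiar with the terms used in this message, here is a minimal sketch of ordinary pthread_once() use. It is not code from the library: the names worker, run_once_demo, init_calls and NUM_THREADS are made up for illustration. It only shows the once_control / once_routine pairing, and why PTHREAD_ONCE_INIT matters for the ABI: the initializer is compiled into the application itself, so a change to the layout of pthread_once_t would affect every binary already built against the old one.

```c
#include <pthread.h>

#define NUM_THREADS 4   /* arbitrary thread count, for illustration only */

/* The static initializer below is expanded in the application's own
 * object code, which is why changing it tends to be an ABI change. */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;

/* Counts how many times the once_routine actually ran. */
static int init_calls = 0;

/* The once_routine: pthread_once() guarantees that it runs to
 * completion exactly once for a given once_control. */
static void once_routine(void)
{
    init_calls++;
}

/* Every thread races to initialize; only one runs once_routine,
 * and the others wait until it has finished. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&once_control, once_routine);
    return NULL;
}

/* Hypothetical driver: starts the threads, joins them, and returns
 * the number of times once_routine ran. */
int run_once_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return init_calls;
}
```

Calling run_once_demo() from main() should report a single initialization no matter how the threads are scheduled. The sketch does not exercise cancellation of the once_routine, which is the case the bug discussed here depends on.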

You need 3 conditions before the bug becomes a threat (only the first 2 on a single-processor machine). They are:
- a possibility that the once_routine can be cancelled; AND
- threads with different priorities accessing the same once_control; AND
- no other available CPUs that the lower priority threads can run on.

If you look at the code in version 2.2.0 and consider what happens if the once_routine is cancelled, you'll see that newly arriving threads and any currently waiting threads compete again to run the once_routine. The winner must reset both a flag and a manual reset event to cause other threads to wait again. But if the winner is suspended before completing this, there's an opportunity for some higher priority thread to begin busy looping and keep the winner (the once_routine thread) from ever resuming.

This may not even be a problem at all if Windows promotes threads caught in this situation. I've read that it does this by incrementing a thread's priority by 1 each time it misses a turn. This may only be in some situations though.

For the record: Gottlob provided an efficient working version without once_routine cancellability. I wanted to take the opportunity to conform to SUS v3 and add cancellability. That complicated things a little. The experimental version 3 is similar to version 2 in order to retain the fast uncontended track.

Current options for fixing the bug are:
- change the current manual reset event, that threads wait on, into an auto reset event, and have each waking thread set it to wake the next waiting thread; OR
- add priority inheritance, to ensure the once_routine thread always gets a turn.

Both of these options are only necessary in the post-cancellation logic. I'm not real keen on daisy-chained event setting because of the cumulative effects, while priority inheritance is a standard way to solve priority inversion and starvation problems.

I hope to have version 3 out soon.

Ross