From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16612 invoked by alias); 8 Mar 2005 06:11:16 -0000 Mailing-List: contact pthreads-win32-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: pthreads-win32-owner@sources.redhat.com Received: (qmail 16410 invoked from network); 8 Mar 2005 06:10:59 -0000 Received: from unknown (HELO canyonero.dot.net.au) (202.147.68.14) by sourceware.org with SMTP; 8 Mar 2005 06:10:59 -0000 Received: from [203.24.160.100] (helo=ip-203-24-160-100.dot.net.au) by canyonero.dot.net.au with esmtp (Exim 3.35 #1 (Debian)) id 1D8Xvl-0007NL-00 for ; Tue, 08 Mar 2005 17:10:57 +1100 Subject: Re: starvation in pthread_once? From: Ross Johnson To: Pthreads-Win32 list In-Reply-To: <97ffb31050307190021ec7667@mail.gmail.com> References: <97ffb3105030301323c1bae1@mail.gmail.com> <1109978309.8332.70.camel@desk.home> <1110248521.6900.44.camel@desk.home> <97ffb31050307190021ec7667@mail.gmail.com> Content-Type: text/plain Date: Tue, 08 Mar 2005 06:11:00 -0000 Message-Id: <1110262256.6900.118.camel@desk.home> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-SW-Source: 2005/txt/msg00030.txt.bz2 On Mon, 2005-03-07 at 22:00 -0500, Gottlob Frege wrote: > I was thinking of something like this: (haven't compiled it yet or anything...) > This is definitely the more interesting solution. A slight downside is the ABI change (pthread_once_t and PTHREAD_ONCE_INIT). So this is what I'd like to do:- - because of the ABI change, I will put out a new snapshot today containing the version I posted earlier (no ABI change and fixes the starvation problem) plus the other bug fix (handle leak fix). - after that, include the version below in CVS after testing it, ready for the next snapshot, which will have a new dll name (pthreadVC2.dll etc.) and compatibility version number. - generate another [new] snapshot in a few days if no complications or other issues arise in the meantime. One other thing:- since pthread_once() is being redesigned, I'm also looking at the latest SUS3 spec and trying to decide what the following paragraph means for threads that are already waiting:- "The pthread_once() function is not a cancelation point. However, if init_routine is a cancelation point and is canceled, the effect on once_control shall be as if pthread_once() was never called." Should one or more waiting threads get woken up and made to re-compete to run the init routine? That's the only sensible (maybe even obvious) thing to do AFAIKS. Ross > if (flag.initted) // quick and easy check > { > return; > } > > if (!AtomicExchange(&flag.entry, true)) // no one was in before us? > { > if (func) > { > func(); > } > > AtomicExchange(&initted, true); // needs release type memory barrier > > // we didn't create the event. > // it is only there if there is someone waiting > if (AtomicRead(&flag.event)) > { > SetEvent(flag.event); > } > } > else > { > // wait for init. > // while waiting, create an event to wait on > > AtomicIncrement(&flag.eventUsers); // mark that we have committed to > creating an event > > if (!AtomicRead(&flag.event)) > { > tmpEvent = CreateEvent(true, false); // manual reset (ie will never be reset) > if (AtomicCompareExchange(&flag.event, 0, tmpEvent) == 0) // > release memory barrier > { > // wasn't set, is now > } > else > { > // someone snuck in > // get rid of our attempt > CloseHandle(tmpEvent); > } > } > > // check initted again in case the initting thread has finished > // and left before seeing that there was an event to trigger. > // (Now that the event IS created, if init gets finished AFTER this, > // then the event handle is guaranteed to be seen and triggered). > // > // Note also, that our first 'quick and easy' init check did NOT use > atomic read; > // this atomic read here ensures that initted is properly seen by > the CPU cache. > // And once seen as 'true' it will always stay true, so no need to > worry about the cache anymore. > // (Thus the first 'quick and easy' check might 'falsely' fail, but > it gets corrected here. > > if (!AtomicRead(&flag.initted)) // needs acquire type memory barrier > { > WaitEvent(flag.event); // acquire type memory barrier > } > > // last one out shut off the lights: > if (AtomicDecrement(&flag.eventUsers) == 0) // we were last > { > HANDLE tmpEvent = AtomicExchange(&flag.event, (HANDLE)-1); > if (tmpEvent && tmpEvent != (HANDLE)-1); > { > CloseHandle(tmpEvent); > } > } > } > > > > On Tue, 08 Mar 2005 13:22:01 +1100, Ross Johnson > wrote: > > All, > > > > I've redesigned pthread_once() and will shortly drop it into CVS and a > > new snapshot. Please take a look over the code below and let me know of > > any problems or improvements you find. The rationale is explained in the > > comments. > > > > This new version passes a new test I've added to the test suite that > > exercises the routine much more intensively than previously (although it > > doesn't play around with thread priorities yet). > > > > Thanks. > > Ross > > > > /* > > * Use a single global cond+mutex to manage access to all once_control objects. > > * Unlike a global mutex on it's own, the global cond+mutex allows faster > > * once_controls to overtake slower ones. Spurious wakeups can occur, but > > * can be tolerated. > > * > > * To maintain a separate mutex for each once_control object requires either > > * cleaning up the mutex (difficult to synchronise reliably), or leaving it > > * around forever. Since we can't make assumptions about how an application might > > * employ once_t objects, the later is considered to be unacceptable. > > * > > * Since this is being introduced as a bug fix, the global cond+mtx also avoids > > * a change in the ABI, maintaining backwards compatibility. > > * > > * The mutex should be an ERRORCHECK type to be sure we will never, in the event > > * we're cancelled before we get the lock, unlock the mutex when it's held by > > * another thread (possible with NORMAL/DEFAULT mutexes because they don't check > > * ownership). > > */ > > > > if (!once_control->done) > > { > > if (InterlockedExchange((LPLONG) &once_control->started, (LONG) 0) == -1) > > { > > (*init_routine) (); > > > > /* > > * Holding the mutex during the broadcast prevents threads being left > > * behind waiting. > > */ > > pthread_cleanup_push(pthread_mutex_unlock, (void *) &ptw32_once_control.mtx); > > (void) pthread_mutex_lock(&ptw32_once_control.mtx); > > once_control->done = PTW32_TRUE; > > (void) pthread_cond_broadcast(&ptw32_once_control.cond); > > pthread_cleanup_pop(1); > > } > > else > > { > > pthread_cleanup_push(pthread_mutex_unlock, (void *) &ptw32_once_control.mtx); > > (void) pthread_mutex_lock(&ptw32_once_control.mtx); > > while (!once_control->done) > > { > > (void) pthread_cond_wait(&ptw32_once_control.cond, &ptw32_once_control.mtx); > > } > > pthread_cleanup_pop(1); > > } > > } > > > > On Sat, 2005-03-05 at 10:18 +1100, Ross Johnson wrote: > > > On Thu, 2005-03-03 at 04:32 -0500, Gottlob Frege wrote: > > > > I'm concerned about the Sleep(0) in pthread_once: > > > > > > > > > > Thanks. It looks like this routine needs to be redesigned. > > > > > > Regards. > > > Ross > > > > > > > if (!once_control->done) > > > > { > > > > if (InterlockedIncrement (&(once_control->started)) == 0) > > > > { > > > > /* > > > > * First thread to increment the started variable > > > > */ > > > > (*init_routine) (); > > > > once_control->done = PTW32_TRUE; > > > > > > > > } > > > > else > > > > { > > > > /* > > > > * Block until other thread finishes executing the onceRoutine > > > > */ > > > > while (!(once_control->done)) > > > > { > > > > /* > > > > * The following gives up CPU cycles without pausing > > > > * unnecessarily > > > > */ > > > > Sleep (0); > > > > } > > > > } > > > > } > > > > > > > > IIRC, Sleep(0) does not relinquish time slices to lower priority > > > > threads. (Sleep(n) for n != 0 does, but 0 does not). So, if a lower > > > > priority thread is first in, followed closely by a higher priority > > > > one, the higher priority thread will spin on Sleep(0) *forever* > > > > because the lower, first thread will never get a chance to set done. > > > > > > > > So even Sleep(10) should be good enough. In theory, there could be > > > > enough higher priority threads in the system that the first thread > > > > still doesn't get in (ever?!), but unlikely. And that would probably > > > > mean a general design flaw of the calling code, not pthread_once. > > > > > > > > ? > > > > > > > > >