From: Alex Kotliarov
To: "'pthreads-win32@sources.redhat.com'"
Subject: pthread_cond_broadcast(...) leads to a deadlock
Date: Thu, 18 Nov 2004 16:26:00 -0000
Message-ID: <715D092C2A9AD411B4DD006097C6EE8401482C8F@EXCHSVR>

Hi,

- My application, which uses one "producer" thread and N "consumer"
  threads, where N > 2, locks up, and it seems there is a problem in the
  implementation of the condition variable.
- The app locks up if I use pthread_cond_broadcast(...) to unblock the
  waiting "consumers".
- The app does not lock up if pthread_cond_signal(...) is used.

- Code that causes the deadlock:

  procedure: ptw32_cond_wait_cleanup(...)
  - The CV's external mutex gets locked immediately upon entering the
    procedure.
  - It should instead be locked just before exiting the procedure, after
    "semBlockLock" (a binary semaphore) has been posted.

- Let's say that N "consumer" threads are waiting on the CV and the
  "producer" thread broadcasts on that CV to wake up all consumers.

  given: the semBlockLock semaphore's count == 0
         (decremented in ptw32_cond_unblock(...))

  1. All "consumers" wake up and enter ptw32_cond_wait_cleanup(...).
  2. One "consumer" - ALPHA - acquires the CV's external mutex, executes
     the cleanup code, returns from pthread_cond_wait(), and releases the
     CV's external mutex.
  3. Another "consumer" acquires the CV's external mutex and cleans up,
     etc.
  4. The ALPHA "consumer" sees that the "producer"'s work queue is empty,
     decides to wait on the CV again, acquires the CV's mutex, and enters
     pthread_cond_wait(...).
  5. There are still "consumers" to be unblocked - nWaitersToUnblock != 0 -
     and they are not going anywhere, because the ALPHA "consumer" holds
     the CV's external mutex.
  6. The ALPHA consumer executes sem_wait( semBlockLock ); and we get a
     deadlock, because nWaitersToUnblock will never reach 0 and the
     semBlockLock semaphore will never get incremented.

- Solution: move these lines:

      if ((result = pthread_mutex_lock (cleanup_args->mutexPtr)) != 0)
        {
          *resultPtr = result;
          return;
        }

  to the end of the ptw32_cond_wait_cleanup procedure:

      static void PTW32_CDECL
      ptw32_cond_wait_cleanup (void *args)
      {
        .....
        .....
        .....
        if (1 == nSignalsWasLeft)
          {
            if (sem_post (&(cv->semBlockLock)) != 0)
              {
                *resultPtr = errno;
                return;
              }
          }

        /*
         * XSH: Upon successful return, the mutex has been locked and is owned
         * by the calling thread. This must be done before any cancelation
         * cleanup handlers are run.
         */
        if ((result = pthread_mutex_lock (cleanup_args->mutexPtr)) != 0)
          {
            *resultPtr = result;
            return;
          }
      }

- Is there any reason why pthread_mutex_lock (cleanup_args->mutexPtr) was
  moved to the top? Algorithm 8A has this line at the bottom of
  ptw32_cond_wait_cleanup().

Thanks,
Alexander Kotliarov.
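
The sequence in steps 1-6 can be walked through with a toy, single-threaded model. This is only a sketch under stated assumptions, not the pthreads-win32 code: the struct `cv_model`, the helpers `cleanup_bookkeeping` and `alpha_deadlocks`, and the simplification that semBlockLock is posted when the unblock counter reaches zero are all hypothetical stand-ins for the behaviour described above. It tracks only whether ALPHA holds the external mutex, which is the one fact the argument depends on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state, loosely named after the fields mentioned in the report. */
typedef struct {
    int  waiters_to_unblock; /* stands in for nWaitersToUnblock        */
    int  sem_block_lock;     /* stands in for semBlockLock's count     */
    bool alpha_holds_mutex;  /* ALPHA owns the CV's external mutex     */
} cv_model;

/*
 * One woken consumer attempts the cleanup bookkeeping.
 * lock_first == true : the external mutex is taken at the top, so the
 *                      consumer cannot start while ALPHA owns it.
 * lock_first == false: the bookkeeping runs first and the mutex is
 *                      taken afterwards (the ordering proposed above).
 * Returns true if the bookkeeping was carried out.
 */
static bool cleanup_bookkeeping(cv_model *m, bool lock_first)
{
    if (lock_first && m->alpha_holds_mutex)
        return false;                 /* blocked before any bookkeeping */

    m->waiters_to_unblock--;
    if (m->waiters_to_unblock == 0)
        m->sem_block_lock++;          /* last waiter posts semBlockLock */
    return true;
}

/*
 * Replays steps 1-6 for n_consumers woken by one broadcast.
 * Returns true if ALPHA would block forever on semBlockLock.
 */
static bool alpha_deadlocks(int n_consumers, bool lock_first)
{
    assert(n_consumers >= 2);

    /* Given: semBlockLock's count is 0 after the broadcast. */
    cv_model m = { n_consumers, 0, false };

    /* Steps 2-3: ALPHA and one other consumer clean up unhindered. */
    cleanup_bookkeeping(&m, lock_first);
    cleanup_bookkeeping(&m, lock_first);

    /* Step 4: ALPHA re-locks the external mutex and re-enters the wait. */
    m.alpha_holds_mutex = true;

    /* Step 5: the remaining consumers try to finish their cleanup. */
    for (int i = 2; i < n_consumers; i++)
        cleanup_bookkeeping(&m, lock_first);

    /* Step 6: ALPHA's sem_wait(semBlockLock) passes only if count > 0. */
    return m.sem_block_lock == 0;
}
```

In this model, `alpha_deadlocks(3, true)` reports a stall while `alpha_deadlocks(3, false)` does not, and with only two consumers neither ordering stalls, which is consistent with the N > 2 condition in the report. A real reproduction would need actual threads and the library itself; the model only checks that the reasoning is internally consistent.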