From: "triegel at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Thu, 24 Dec 2020 20:02:48 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
List-Id: Glibc-bugs mailing list

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #17 from Torvald Riegel ---
(In reply to Arun from comment #3)
>         MUTEX_LOCK(gil->mutex);
>         _Py_ANNOTATE_RWLOCK_RELEASED(&gil->locked, /*is_write=*/1);
>         _Py_atomic_store_relaxed(&gil->locked, 0);
>         COND_SIGNAL(gil->cond);
>         MUTEX_UNLOCK(gil->mutex);
>
> #ifdef FORCE_SWITCHING
>     if (_Py_atomic_load_relaxed(&ceval->gil_drop_request) && tstate != NULL)
>     {
>         MUTEX_LOCK(gil->switch_mutex);
>         /* Not switched yet => wait */
>         if (((PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) == tstate)
>         {
>             assert(is_tstate_valid(tstate));
>             RESET_GIL_DROP_REQUEST(tstate->interp);
>             /* NOTE: if COND_WAIT does not atomically start waiting when
>                releasing the mutex, another thread can run through, take
>                the GIL and drop it again, and reset the condition
>                before we even had a chance to wait for it. */
>             COND_WAIT(gil->switch_cond, gil->switch_mutex);
>         }
>         MUTEX_UNLOCK(gil->switch_mutex);
>     }
> #endif
> }

Can you please check whether you are using condvars correctly in your code,
in particular whether your code handles spurious wake-ups of COND_WAIT
correctly?  The bits of code you have posted do not have a loop that checks
the wait condition again; there is just an if statement and you unlock the
mutex right after the COND_WAIT.

Also, the two critical sections seem to use different mutexes and different
conditions.  It would be more helpful if you could show code examples for
pairs of related signals and waits.

The use of atomic access to the condition within the critical section
(gil->last_holder) can make sense, but it should not be required because
that's what the mutex / critical section takes care of in a typical use of
condvars.  Perhaps check that as well.

Ideally, a small reproducer would be best.  (I'm aware of the first
reproducer posted, but I'm currently looking at it and am not yet convinced
that it is correct; it sends out more signals than the number of wake-ups it
allows through the wait condition, AFAICT, which I find surprising.)
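
For reference, the conventional condvar usage that the comment above asks
about can be sketched roughly as follows.  This is only an illustration with
plain POSIX calls, not the CPython or glibc code: struct handoff, the
switched flag, and all function names are hypothetical.  It assumes a single
waiter and a single signaler that share one mutex, one condition variable,
and one predicate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; names are illustrative only. */
struct handoff {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            switched;  /* the wait condition, protected by mutex */
};

static struct handoff demo_handoff = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Waiter: re-check the predicate in a loop, so that a spurious wake-up
   just goes back to waiting instead of falling through. */
static void wait_for_switch(struct handoff *h)
{
    pthread_mutex_lock(&h->mutex);
    while (!h->switched)
        pthread_cond_wait(&h->cond, &h->mutex);
    assert(h->switched);   /* holds because the mutex is still locked */
    h->switched = false;   /* consume the event inside the critical section */
    pthread_mutex_unlock(&h->mutex);
}

/* Signaler: update the predicate and signal under the same mutex that the
   waiter uses.  A plain bool suffices; the mutex orders the accesses. */
static void announce_switch(struct handoff *h)
{
    pthread_mutex_lock(&h->mutex);
    h->switched = true;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->mutex);
}

static void *signaler_thread(void *arg)
{
    announce_switch(arg);
    return NULL;
}

/* Runs one related signal/wait pair; returns 0 on success. */
static int run_handoff_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, signaler_thread, &demo_handoff) != 0)
        return -1;
    wait_for_switch(&demo_handoff);
    if (pthread_join(t, NULL) != 0)
        return -1;
    return 0;
}
```

With an if statement in place of the while loop, a spurious wake-up would
let the waiter continue even though the predicate is still false.  Because
the flag is only read and written while the mutex is held, it does not
matter whether the signaler runs before or after the waiter starts waiting.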