From mboxrd@z Thu Jan 1 00:00:00 1970
From: "frankbarrus_sw at shaggy dot cc"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 06 Apr 2021 14:32:29 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
List-Id: Glibc-bugs mailing list

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #30 from Frank Barrus ---
(In reply to Frank Barrus from comment #29)
> I should add one more thing about my workaround fix:
>
> Although it's currently implemented as a post-fix wrapper on
> pthread_cond_signal() as well as after waiting, it doesn't need to be.
> That's just because it also tries to detect the bug condition *before* the
> lost wakeup occurs whenever possible.
> Practically speaking, that almost
> never occurs anyway since wrefs usually leaves the state ambiguous until the
> last waiter exits, and then most of the time, the last exiting G1 waiter is
> the one who ends up detecting it first, whether it's before or after the
> signal was lost.  Also, checking the condvar in this way in
> pthread_cond_signal() assumes you know for sure that the wrapper code is
> being called while the mutex is held, which can't be guaranteed in a
> general-purpose fix for pthread_cond_signal.
>
> So the post-fix check I'm doing could be done instead entirely from the
> wrappers for pthread_cond_wait/pthread_cond_timedwait().  That also means it
> could go in the common signal wait code, after the mutex is re-acquired.  As
> long as the common cases can bail out of the checks fast, there should be
> minimal overhead.  The only heavyweight cases will be when it actually
> detects that signals have already been lost, or the condvar is in a state
> that will lose the next signal.

I should also add that this (that it can be detected in just the waiters
after they wake up) is only true if the goal is only to detect and fix the
lost wakeup conditions that can result in getting stuck.

But if the goal is also to replace the correct number of lost wakeups, then
it's important to still add the detection at the end of
pthread_cond_signal() as well.  If enough additional signals are sent before
the last waiter exits to get out of this state and cause the
quiesce_and_switch, then the count of unconsumed signals in g_signals[G1]
will get cleared without performing that number of wakeups.  The only way to
avoid that is to detect this state and send that number of additional
signals to the G2 waiters once they become G1.
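
A minimal sketch of the wait-side placement described above: a wrapper that
runs a check once the wait has returned and the mutex is held again, with a
fast bail-out for the common case.  Everything named here is hypothetical
and not from the workaround itself: checked_cond_wait(),
checked_cond_timedwait(), post_wait_check(), and the two counters are
illustrative names, and condvar_needs_recovery() is a stub that always
returns false, because the real detection inspects glibc-internal condvar
state that is not reproduced here.  The recovery step uses
pthread_cond_broadcast() purely as a stand-in; it is not claimed to be the
recovery the workaround performs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Illustrative counters so the behaviour of the sketch can be observed. */
static unsigned long post_wait_checks;
static unsigned long recoveries;

/* Hypothetical detection hook.  The real check would look at condvar
 * internals; this stub always reports "nothing to do". */
static bool condvar_needs_recovery(pthread_cond_t *cond)
{
    (void)cond;
    return false;
}

/* Must be called with the condvar's mutex held. */
static void post_wait_check(pthread_cond_t *cond)
{
    post_wait_checks++;
    if (!condvar_needs_recovery(cond))
        return;                     /* common case: bail out fast */

    /* Heavyweight path.  Assumption: a broadcast stands in for the real
     * recovery; POSIX allows spurious wakeups, so extra wakeups are safe
     * for correctly written waiters. */
    recoveries++;
    pthread_cond_broadcast(cond);
}

int checked_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    assert(cond != NULL && mutex != NULL);
    int rc = pthread_cond_wait(cond, mutex);
    if (rc == 0)                    /* mutex has been re-acquired */
        post_wait_check(cond);
    return rc;
}

int checked_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex,
                           const struct timespec *abstime)
{
    assert(cond != NULL && mutex != NULL && abstime != NULL);
    int rc = pthread_cond_timedwait(cond, mutex, abstime);
    /* A timed-out waiter also returns with the mutex held, so this
     * sketch checks on that path as well (a design choice made here). */
    if (rc == 0 || rc == ETIMEDOUT)
        post_wait_check(cond);
    return rc;
}
```

Because the check only runs after the wait returns, it never has to guess
whether the caller holds the mutex, which is the concern raised above for a
check placed in pthread_cond_signal().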