From: "frankbarrus_sw at shaggy dot cc"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 06 Apr 2021 01:17:32 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Status: UNCONFIRMED

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #29 from Frank Barrus ---
I should add one more thing about my workaround fix:

Although it's currently implemented as a post-fix wrapper on
pthread_cond_signal() as well as after waiting, it doesn't need to be. That's
just because it also tries to detect the bug condition *before* the lost
wakeup occurs whenever possible.
Practically speaking, that early detection almost never happens anyway, since
wrefs usually leaves the state ambiguous until the last waiter exits. Most of
the time, the last exiting G1 waiter is the one that ends up detecting the
condition first, whether that is before or after the signal was lost.

Also, checking the condvar in this way in pthread_cond_signal() assumes you
know for sure that the wrapper code is being called while the mutex is held.
That can't be guaranteed in a general-purpose fix for pthread_cond_signal(),
since callers are allowed to signal without holding the mutex.

So the post-fix check I'm doing could be done instead entirely from the
wrappers for pthread_cond_wait()/pthread_cond_timedwait(). That also means it
could go in the common wait code, after the mutex is re-acquired.

As long as the common cases can bail out of the checks fast, there should be
minimal overhead. The only heavyweight cases will be when it actually detects
that signals have already been lost, or that the condvar is in a state that
will lose the next signal.
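
To make the wait-side placement concrete, here is a minimal sketch of wrappers that run the check only after the wait returns with the mutex held again. It is not the actual workaround code: the names checked_cond_wait(), checked_cond_timedwait(), post_wait_check(), cv_suspect and cv_recoveries are hypothetical, the real detection logic (which looks at glibc-internal condvar state) is replaced by a placeholder hook, and using pthread_cond_broadcast() as the recovery step is an assumption that relies on waiters re-checking their predicate in a loop.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical detection hook: returns true if the condvar looks like it has
 * lost a signal or will lose the next one.  The real check inspects
 * glibc-internal condvar state and is NOT reproduced here. */
typedef bool (*cv_suspect_fn)(pthread_cond_t *cv);

/* Placeholder hooks (assumptions): the default never fires; the second one
 * exists only to exercise the slow path in a demo or test. */
static bool suspect_never(pthread_cond_t *cv)  { (void)cv; return false; }
static bool suspect_always(pthread_cond_t *cv) { (void)cv; return true; }

static cv_suspect_fn cv_suspect = suspect_never;
static unsigned long cv_recoveries;   /* updated only with the mutex held */

/* Shared post-wait check.  Must be called with the user's mutex held. */
static void post_wait_check(pthread_cond_t *cv)
{
    if (!cv_suspect(cv))
        return;                       /* common case: bail out fast */

    /* Heavyweight case.  Assumed recovery: wake every waiter so that none
     * stays asleep; each one re-checks its predicate. */
    cv_recoveries++;
    pthread_cond_broadcast(cv);
}

int checked_cond_wait(pthread_cond_t *cv, pthread_mutex_t *m)
{
    assert(cv != NULL && m != NULL);
    int rc = pthread_cond_wait(cv, m);
    if (rc == 0)                      /* mutex has been re-acquired */
        post_wait_check(cv);
    return rc;
}

int checked_cond_timedwait(pthread_cond_t *cv, pthread_mutex_t *m,
                           const struct timespec *abstime)
{
    assert(cv != NULL && m != NULL && abstime != NULL);
    int rc = pthread_cond_timedwait(cv, m, abstime);
    if (rc == 0 || rc == ETIMEDOUT)   /* mutex is held again on timeout too */
        post_wait_check(cv);
    return rc;
}
```

Because the check runs only on the wait side, nothing has to be assumed about whether the signalling thread holds the mutex, and the fast path costs one extra call per wakeup.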