From: "michael.bacarella at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Sat, 30 Jan 2021 00:59:55 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #27 from Michael Bacarella ---
> Torvald Riegel 2021-01-16 00:23:42 UTC
> (In reply to Michael Bacarella from comment #8)
> > I ran into this issue recently as well. It took a *very* long time to
> > isolate. I'm glad to see a patch was posted already.
> >
> > I confirm on my systems that the deadlock was introduced in 2.27 and appears
> > to resolve when the one-line patch is applied to at least glibc-2.31.
>
> Do you happen to have a reproducer (that is different from Qin Li's)?
> Alternatively, is a call to pthread_cond_signal instead of the broadcast
> sufficient to prevent triggering errors in the cases that you observed?

Sadly no. The existence of the bug was inferred through "masterlock"
corruption in the OCaml runtime, and using the one-line patch to glibc above
stopped the application from deadlocking. I wasn't writing any C code
directly that I could experiment with.

If it's helpful, there was some discussion here on how it was tracked down:
https://discuss.ocaml.org/t/is-there-a-known-recent-linux-locking-bug-that-affects-the-ocaml-runtime/6542/3

A pthread_cond_signal appears to simply be missed by threads waiting on
pthread_cond_wait. If I can force another thread to try to acquire the runtime
lock again (briefly, by causing a thread that was blocked on I/O to complete,
here by hitting Enter on stdin, which dispatches a new call to
pthread_cond_wait), the deadlock clears.
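
For readers who have not seen this kind of lock, here is a minimal sketch of
the pattern described above: a runtime-wide lock built from a mutex, a
condition variable, and a busy flag, where release wakes one waiter with
pthread_cond_signal. It is an illustration only. The names masterlock,
masterlock_acquire, masterlock_release and masterlock_demo are made up for
this sketch and are not the OCaml runtime's actual code, and it is not a
reproducer for the bug. It only shows where a missed signal would leave every
waiter asleep even though the lock is free.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical master lock (assumption: names and layout are illustrative,
   not taken from the OCaml runtime). */
typedef struct {
    pthread_mutex_t mu;       /* protects 'busy' */
    pthread_cond_t  is_free;  /* signalled when the lock is released */
    int             busy;     /* 1 while some thread holds the master lock */
} masterlock;

static void masterlock_init(masterlock *m) {
    pthread_mutex_init(&m->mu, NULL);
    pthread_cond_init(&m->is_free, NULL);
    m->busy = 0;
}

static void masterlock_acquire(masterlock *m) {
    pthread_mutex_lock(&m->mu);
    while (m->busy)  /* loop: a wakeup may be spurious or raced by another thread */
        pthread_cond_wait(&m->is_free, &m->mu);
    m->busy = 1;
    pthread_mutex_unlock(&m->mu);
}

static void masterlock_release(masterlock *m) {
    pthread_mutex_lock(&m->mu);
    m->busy = 0;
    pthread_mutex_unlock(&m->mu);
    /* One waiter is expected to wake here. If this signal were missed,
       all waiters would stay blocked in pthread_cond_wait although busy == 0,
       until some later acquire/release cycle issues another signal. */
    pthread_cond_signal(&m->is_free);
}

struct worker_arg {
    masterlock *m;
    long       *counter;
    int         iters;
};

static void *worker(void *p) {
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        masterlock_acquire(a->m);
        (*a->counter)++;          /* only touched while holding the master lock */
        masterlock_release(a->m);
    }
    return NULL;
}

/* Runs 'nthreads' workers that each take the lock 'iters' times and
   returns the final counter value. */
long masterlock_demo(int nthreads, int iters) {
    enum { MAX_THREADS = 64 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    masterlock m;
    long counter = 0;
    pthread_t tids[MAX_THREADS];
    struct worker_arg arg = { &m, &counter, iters };

    masterlock_init(&m);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&m.is_free);
    pthread_mutex_destroy(&m.mu);
    return counter;
}
```

Calling masterlock_demo(4, 1000) from a small main and building with
"cc -pthread" should return once every worker has finished. In this sketch, a
fresh acquire/release cycle ends in a new pthread_cond_signal, which would be
consistent with the deadlock clearing once another thread is pushed through
the lock.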