From: "malteskarupke at fastmail dot fm"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Thu, 07 Jan 2021 13:54:32 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #21 from Malte Skarupke ---
Sorry, I continue to have very little energy to work on this. I did put a little more work into it yesterday, though, and I now have more doubts that the mitigation patch is a long-term solution. I still think it's better than doing nothing, but it might introduce an even rarer deadlock in place of the current one. It would be rarer because the current deadlock triggers only in an edge case of an edge case, while the new one would trigger only in an edge case of that edge case of an edge case.

My suspicions come from trying to make one of the alternative solutions mentioned in the patch email chain work: doing something similar to the mitigation patch, but signaling the correct group instead of broadcasting. I couldn't make that alternative change work, and I reduced the problem down to calling this right in front of the "We potentially stole a signal" check:

    __condvar_acquire_lock (cond, private);
    __condvar_release_lock (cond, private);

Just taking and releasing the internal lock can deadlock in some of the tests (most often in sysdeps/pthread/tst-cond16.c). The mitigation patch will call these functions. Most of the time that doesn't deadlock, and I haven't found out what exactly causes the deadlock (I'll instead put my limited energy back into a second version of my emailed patch), but the fact that calling the above unconditionally will deadlock should make us suspicious that calling it conditionally might also sometimes deadlock. The conditional case is impossible to reproduce in the glibc tests because it's even rarer than the original problem, so I think the mitigation patch is still a little better than doing nothing.

Otherwise I think the patch I emailed is still good (unlike my patch attached to this issue), and I aim to have a second version out, maybe on the weekend, that addresses the comments in the email conversation.