From: "triegel at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Thu, 07 Jan 2021 23:31:05 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Who: triegel at redhat dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-URL: http://sourceware.org/bugzilla/
List-Id: Glibc-bugs mailing list

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #23 from Torvald Riegel ---
(In reply to Malte Skarupke from comment #21)
> My suspicions come from trying to make one of the alternative solutions
> mentioned in the patch email chain work, which was to do something similar
> to the mitigation patch, but to signal for the correct group instead of
> broadcasting.
> I couldn't make that alternative change work, and I reduced
> the problem down to calling this right in front of the "We potentially stole
> a signal" check:
>
> __condvar_acquire_lock (cond, private);
> __condvar_release_lock (cond, private);
>
> Just taking and releasing the internal lock can deadlock in some of the
> tests. (most often in sysdeps/pthread/tst-cond16.c)

I believe that this just shows that stealing can happen. Cancellation and
timeouts, which appear to be working fine, also acquire this lock in
__condvar_cancel_waiting -- but do so before decreasing the number of available
signals.

Consider the case where a signal is stolen by thread A from a group 1 that is
then fully signaled according to the group size, and then another thread B
tries to quiesce and switch group 1. The internal lock is held by B, which
waits for a waiter W whose signal has been stolen by A. W can't make progress
because A has "its" signal. And A waits for B if one adds the empty critical
section, which is how I think this can deadlock.

> The mitigation patch
> will call these functions.

It does, but after giving back the stolen signal (or at least trying to do so,
incorrectly).
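
To make the circular wait in the A / B / W scenario above easier to see, here
is a toy model of it as a wait-for graph. This is only a sketch of the
reasoning and not glibc code: the names THREAD_A, THREAD_B, WAITER_W,
has_wait_cycle and scenario_deadlocks are made up for illustration, and the
model assumes that each party blocks on at most one other party.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three parties in the scenario.  */
enum { THREAD_A, THREAD_B, WAITER_W, NUM_THREADS };
#define NOBODY (-1)

/* waits_for[t] is the party that t is blocked on, or NOBODY.
   Follow the chain from START and report whether it leads back to START.  */
static bool
has_wait_cycle (const int waits_for[NUM_THREADS], int start)
{
  int cur = start;
  for (int steps = 0; steps < NUM_THREADS; steps++)
    {
      assert (cur >= 0 && cur < NUM_THREADS);
      cur = waits_for[cur];
      if (cur == NOBODY)
        return false;   /* The chain ends, so someone can make progress.  */
      if (cur == start)
        return true;    /* Back at the start: circular wait.  */
    }
  return false;
}

/* Model of the scenario: A has stolen W's signal, B holds the internal
   lock while waiting for group 1 to quiesce.  The only variable is whether
   A also tries to take the internal lock (the empty critical section).  */
static bool
scenario_deadlocks (bool a_takes_internal_lock)
{
  int waits_for[NUM_THREADS];
  waits_for[THREAD_B] = WAITER_W;   /* B waits for W to leave group 1.  */
  waits_for[WAITER_W] = THREAD_A;   /* W needs the signal that A holds.  */
  waits_for[THREAD_A] = a_takes_internal_lock ? THREAD_B : NOBODY;
  return has_wait_cycle (waits_for, THREAD_A);
}
```

In this model, scenario_deadlocks (true) reports a cycle A -> B -> W -> A,
while scenario_deadlocks (false) does not, because A is not blocked on
anything and is free to give the signal back.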