From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id AB6C3395443E; Sun, 18 Sep 2022 05:38:41 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AB6C3395443E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1663479521;
	bh=UrIU/qlMCp78xTQ+RhiE2x2Rf1X8tnFqCazedZ09BLs=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=co9MK+slBTiMyu1NcTQBCI3nRZLswyWEOEUxl+pBIRdRJ/I4ao65849n98GRvnRMa
	 E2wyZRMuabdQBGpoyfPR6N29Q5mGkJ0f0AfB5+hENbSKjZvq5nyBmbzUg+RFhBPEal
	 n1ZWTErJLfTIifzLg63xoe9BLIR6CLyGMsPMi5rA=
From: "malteskarupke at fastmail dot fm"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Sun, 18 Sep 2022 05:38:39 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: malteskarupke at fastmail dot fm
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #45 from Malte Skarupke ---
I looked into this bug again because people kept on running into it. I ran a
bigger TLA+ analysis and found an interleaving where the mitigation patch
doesn't help.
I wrote it up on my blog again here:
https://probablydance.com/2022/09/17/finding-the-second-bug-in-glibcs-condition-variable/

The short of it is that the pthread_cond_broadcast() in the mitigation patch
can early-out when no thread is sleeping at the same time, in which case it
doesn't wake anyone and doesn't change any state. Meaning the mitigation patch
might do nothing. But the leftover signal from the wait will stick around.
After that you can get a similar interleaving as before which will result in
an incorrect g_size, causing pthread_cond_signal() to wake in the wrong group.

It's less likely than before, because you need three rare things to happen,
but it can still happen:

1. Trigger the potential steal
2. The pthread_cond_broadcast() from the mitigation patch has to early-out
without doing anything
3. You need to consume that extra signal from the potential steal at a time
where it will cause quiesce_and_switch_g1() to increase g_size to at least 2

My patch from a year ago should fix this:
https://sourceware.org/pipermail/libc-alpha/2021-September/130840.html

-- 
You are receiving this mail because:
You are on the CC list for the bug.
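
For readers who want the early-out from step 2 in concrete form, here is a
minimal toy model in C. It is only a sketch: struct toy_cond, toy_broadcast()
and toy_early_out_keeps_signal() are hypothetical names made up for
illustration, and the two counters do not correspond to glibc's real
condition variable layout or to its actual broadcast code. The model shows
one thing only: a broadcast that returns early when nobody is sleeping cannot
clean up a signal that an earlier wait left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state, NOT glibc's struct layout. */
struct toy_cond {
    unsigned sleepers;          /* threads currently blocked in a wait */
    unsigned leftover_signals;  /* signals posted but not yet consumed */
};

/* Toy broadcast: returns how many sleepers it woke.
 * Assumption: like the early-out described above, it returns
 * immediately and touches no state when nobody is sleeping. */
static unsigned toy_broadcast(struct toy_cond *c)
{
    assert(c != NULL);
    if (c->sleepers == 0)
        return 0;               /* early-out: no wake-ups, no state change */

    unsigned woken = c->sleepers;
    c->sleepers = 0;            /* everyone who was asleep is now awake */
    return woken;
}

/* A wait has left one extra signal behind, and the mitigating
 * broadcast then runs while no thread is sleeping. Returns true if
 * the leftover signal is still there afterwards. */
static bool toy_early_out_keeps_signal(void)
{
    struct toy_cond c = { .sleepers = 0, .leftover_signals = 1 };
    unsigned woken = toy_broadcast(&c);
    return woken == 0 && c.leftover_signals == 1;
}
```

In this toy, toy_early_out_keeps_signal() returns true: the broadcast did
nothing, so the extra signal is still available to be consumed later, which is
the precondition for step 3.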