From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
  id 5BC053857000; Sun, 18 Sep 2022 20:06:59 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5BC053857000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
  s=default; t=1663531620;
  bh=ZkPEfdIl7EkDCc3oBBlCgpBFSkUiGQrAcbb17dJm48k=;
  h=From:To:Subject:Date:In-Reply-To:References:From;
  b=TmLds5NDyLLeibod6THjE/biRi7jDr2opPgyLzhzlyOfXV2VNOqPz7x5n45yBSRD3
   U0GOnviRXW9bzBY6G0P5tb+x/G7FcidHrJ5iNm03m/60p63Kz9R/chWwB5RYvt+k7j
   merzfJ18dhmltjay1L5mF3EtGHpgC4aeAYnj1b2c=
From: "carlos at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Sun, 18 Sep 2022 20:06:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #46 from Carlos O'Donell ---
(In reply to Malte Skarupke from comment #45)
> I looked into this bug again because people kept on running into it. I ran a
> bigger TLA+ analysis and found an interleaving where the mitigation patch
> doesn't help.
> I wrote it up on my blog again here:
>
> https://probablydance.com/2022/09/17/finding-the-second-bug-in-glibcs-condition-variable/

I'll have a look at this when I'm back from GNU Tools Cauldron.

It's disappointing that there is still an interleaving that doesn't work. I've
had a community member testing Frank Barrus' patch and that did seem to be
correct.

> The short of it is that the pthread_cond_broadcast() in the mitigation patch
> can early-out when no thread is sleeping at the same time, in which case it
> doesn't wake anyone and doesn't change any state. Meaning the mitigation
> patch might do nothing. But the leftover signal from the wait will stick
> around. After that you can get a similar interleaving as before which will
> result in an incorrect g_size, causing pthread_cond_signal() to wake in the
> wrong group.
>
> It's less likely than before, because you need three rare things to happen,
> but it can still happen:
> 1. Trigger the potential steal
> 2. The pthread_cond_broadcast() from the mitigation patch has to early-out
> without doing anything
> 3. You need to consume that extra signal from the potential steal at a time
> where it will cause quiesce_and_switch_g1() to increase g_size to at least 2
>
> My patch from a year ago should fix this:
> https://sourceware.org/pipermail/libc-alpha/2021-September/130840.html

Could you expand on this a bit? Do you mean to say your patch from last
September resolves all the issues you have seen, including the new one?

-- 
You are receiving this mail because:
You are on the CC list for the bug.