From: "malteskarupke at fastmail dot fm"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Sun, 15 Nov 2020 02:45:06 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #13 from Malte Skarupke ---
Created attachment 12956
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12956&action=edit
Patch fix by broadening the scope of g_refs, making stealing impossible

I've attached a patch file
(Fix-pthread_cond_wait-signal-bug-related-to-stealing.patch) that contains a fix for this bug. I haven't verified this fix with TLA+ yet, but I still intend to do that soon. I'm posting this early because I'm about to go on vacation for a week, so I didn't want to keep people waiting.
Plus, it's not like other patches are verified in TLA+. So for now this follows the usual standard: it passes all the tests, it does not trigger the bug in the reproducer that's attached to this ticket, and I have spent a lot of time reasoning through it and convincing myself that it's correct.

The fix works by making stealing of signals from later groups impossible. That allows me to get rid of the code that tries to handle the stealing case, and with it the bug in that code.

The fix got a bit bigger in scope. A simple fix would have just broadened the scope of g_refs (that fix I did verify in TLA+), but it leads to inelegant code where both wrefs and g_refs serve pretty much the same purpose. So I decided to get rid of the reference count in wrefs with this change. Unfortunately I couldn't completely do that, so the change got more complex.

This definitely needs some reviewing before people use it. But I think the direction is good, since it gets rid of a very complex edge case. Yes, it also introduces a new edge case, but that new edge case only happens for people who use pthread_cancel, and even then it should be simpler to reason through (to the point that I'm not going to explain it here, hoping that you'll understand it just from the comment in the code).

Let me know if there is anything else that you need. For example, is this even the right place to submit a patch? Or should I send an email to the mailing list?
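
As a rough illustration of why a broader reference count rules out stealing, here is a toy, single-threaded model in C. It is not glibc code and not the attached patch: `toy_group` and every `toy_*` function are hypothetical names, and the real condvar works with atomics and futexes rather than plain counters. The assumption it sketches is that a waiter holds a reference on its group for the whole wait, and that a group slot may only be reused for a later group once that count has dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not glibc internals. */
struct toy_group {
    unsigned refs;       /* waiters currently holding this slot open */
    unsigned signals;    /* signals posted to the current generation */
    unsigned generation; /* bumped every time the slot is reused */
};

/* A waiter takes its reference before waiting and remembers which
   generation of the slot it joined. */
static unsigned toy_wait_begin(struct toy_group *g)
{
    g->refs++;
    return g->generation;
}

/* A signaler posts one signal to the current generation. */
static void toy_signal(struct toy_group *g)
{
    g->signals++;
}

/* The slot may be handed to a later group only when nobody holds a
   reference. Returns false if reuse has to wait. */
static bool toy_try_reuse(struct toy_group *g)
{
    if (g->refs != 0)
        return false;
    g->generation++;
    g->signals = 0;
    return true;
}

/* The waiter consumes a signal and only then drops its reference.
   Returns true if the signal came from the generation it joined. */
static bool toy_wait_end(struct toy_group *g, unsigned my_generation)
{
    bool own = g->generation == my_generation && g->signals > 0;
    if (own)
        g->signals--;
    g->refs--;
    return own;
}

/* Walk through one wait/signal cycle; returns 0 if the model behaves
   as described. */
static int toy_demo(void)
{
    struct toy_group g = {0, 0, 0};
    unsigned gen = toy_wait_begin(&g);
    toy_signal(&g);
    assert(!toy_try_reuse(&g));    /* blocked: a waiter still holds a ref */
    assert(toy_wait_end(&g, gen)); /* waiter consumed its own signal */
    assert(toy_try_reuse(&g));     /* now the slot can be recycled */
    return 0;
}
```

Calling `toy_demo()` walks through the one case that matters here: while the waiter still holds its reference the slot cannot be recycled, so the only signal it can consume is one from the group it joined. The real code is concurrent, so this only shows the invariant, not how it is enforced.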