From: "frankbarrus_sw at shaggy dot cc"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 04 May 2021 22:51:43 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
List-Id: Glibc-bugs mailing list

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #39 from Frank Barrus ---
(In reply to Qin Li from comment #34)
> Hi Frank, I am the original reporter of this bug. Could you share a sneak
> peek version of your alternate fix that you mentioned below?
>
> > FYI, I'm currently testing a different pthreads fix for this issue that
> > does it without the suggested "broadcast" solution that some distros
> > appear to be adopting for now.
>
> The reason I am asking is that several months after applying the broadcast
> fix we started to observe a different hang caused by either
> pthread_cond_signal/pthread_cond_wait, or the constructs it relied on, e.g.
> futex. Originally I feared it was related to or caused by the broadcast fix,
> but later realized this might be another issue, as it has also been
> independently reported in this bug by Arun without applying the broadcast fix:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c3
>
> The signature of this different issue is this: 1 thread blocked "infinitely"
> in pthread_cond_signal when calling __condvar_quiesce_and_switch_g1:
>
> #0 futex_wait
> #1 futex_wait_simple
> #2 __condvar_quiesce_and_switch_g1
> #3 __pthread_cond_signal
>
> And all the other threads are blocked in pthread_cond_wait waiting for the
> signal.
>
> Another interesting observation is that the "infinitely" blocked thread in
> pthread_cond_signal can be unblocked if I send a SIGSEGV to the Linux .NET
> Core process that hit this issue, which has a segfault handler that will call
> another external executable to take a core dump of this process. I am not
> exactly sure how much of the special signal handling logic is important to
> get pthread_cond_signal unblocked. It is possible that such a signal would
> cause a spurious wakeup from futex_wait that actually unblocks
> __condvar_quiesce_and_switch_g1 and later __pthread_cond_signal, but this is
> pure speculation.
>
> It would be nice to know if the ^^^ hanging pthread_cond_signal signature
> has also been discovered by the community and whether there might be any
> investigation/fix available.

Hi Qin,

I just posted the current version of my proposed fix (see the previous
comment and attachment). This will address your second concern as well, since
it removes the need for __condvar_quiesce_and_switch_g1() to block.

Please try the patch and let me know how it goes in your testing.

Thanks!
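For reference, the wait/signal pattern discussed in this thread looks roughly like the sketch below. It is only an illustration of ordinary condvar usage, not a reproducer for this bug and not part of the patch; the names `struct handoff`, `waiter`, and `run_handoff_demo` are hypothetical. The waiter re-checks its predicate in a loop, which is what makes a spurious return from pthread_cond_wait harmless to the caller.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for illustration; not taken from glibc or the patch. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;  /* the predicate the waiter actually cares about */
};

static void *waiter(void *arg)
{
    struct handoff *h = arg;

    pthread_mutex_lock(&h->lock);
    /* Loop on the predicate: POSIX allows pthread_cond_wait to return spuriously. */
    while (!h->ready)
        pthread_cond_wait(&h->cond, &h->lock);
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Returns 1 once the waiter has seen the predicate and exited, 0 on setup failure. */
int run_handoff_demo(void)
{
    struct handoff h;
    pthread_t t;

    h.ready = false;
    if (pthread_mutex_init(&h.lock, NULL) != 0)
        return 0;
    if (pthread_cond_init(&h.cond, NULL) != 0) {
        pthread_mutex_destroy(&h.lock);
        return 0;
    }
    if (pthread_create(&t, NULL, waiter, &h) != 0) {
        pthread_cond_destroy(&h.cond);
        pthread_mutex_destroy(&h.lock);
        return 0;
    }

    /* Set the predicate under the mutex, then signal the waiter. */
    pthread_mutex_lock(&h.lock);
    h.ready = true;
    pthread_cond_signal(&h.cond);
    pthread_mutex_unlock(&h.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return 1;
}
```

With a single waiter and a single signal this should complete whether or not the signal arrives before the waiter blocks, because the predicate is checked first. The hang reported in this bug needs many concurrent waiters and signalers, so this sketch is not expected to trigger it.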
- Frank