From: "qin.li at thetradedesk dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Wed, 14 Apr 2021 16:57:07 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #34 from Qin Li ---
Hi Frank,

I am the original reporter of this bug. Could you share a sneak peek version of the alternate fix that you mentioned below?

> FYI, I'm currently testing a different pthreads fix for this issue that does it without the suggested "broadcast" solution that some distros appear to be adopting for now.

The reason I am asking is that several months after applying the broadcast fix we started to observe a different hang caused by either pthread_cond_signal/pthread_cond_wait or the constructs they rely on, e.g. futex.
Originally I feared it was related to or caused by the broadcast fix, but I later realized this might be another issue, as it has also been independently reported in this bug by Arun without applying the broadcast fix:
https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c3

The signature of this different issue is this: one thread blocked "infinitely" in pthread_cond_signal when calling __condvar_quiesce_and_switch_g1:

#0 futex_wait
#1 futex_wait_simple
#2 __condvar_quiesce_and_switch_g1
#3 __pthread_cond_signal

And all the other threads are blocked in pthread_cond_wait waiting for the signal.

Another interesting observation is that the "infinitely" blocked thread in pthread_cond_signal can be unblocked if I send a SIGSEGV to the Linux .NET Core process that hit this issue. That process has a segfault handler that calls another external executable to take a core dump of the process. I am not exactly sure how much of the special signal handling logic is important to getting pthread_cond_signal unblocked. It is possible that such a signal would cause a spurious wakeup from futex_wait that actually unblocks __condvar_quiesce_and_switch_g1 and later __pthread_cond_signal, but this is pure speculation.

It would be nice to know whether the hanging pthread_cond_signal signature above has also been discovered by the community and whether there might be any investigation/fix available.
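
To make the speculated mechanism concrete, here is a minimal sketch, assuming Linux and a raw futex syscall rather than glibc's internal futex_wait_simple. It only shows that a signal handler running on a thread blocked in FUTEX_WAIT makes the wait return early with EINTR even though nobody issued a wake; it does not reproduce the condvar hang or say anything about its cause. The names futex_word, waiter, on_sigusr1 and futex_wait_errno_after_signal are hypothetical, and SIGUSR1 stands in for the SIGSEGV used in the report.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex word: stays at 1 and nobody ever calls FUTEX_WAKE on it. */
static uint32_t futex_word = 1;
static atomic_int waiter_done = 0;
static int waiter_errno = 0;

/* Empty handler: its only purpose is to interrupt the blocked syscall. */
static void on_sigusr1(int sig) { (void)sig; }

static void *waiter(void *arg) {
    (void)arg;
    /* Block while futex_word == 1, with no timeout. */
    long r = syscall(SYS_futex, &futex_word, FUTEX_WAIT_PRIVATE, 1,
                     NULL, NULL, 0);
    waiter_errno = (r == -1) ? errno : 0;
    atomic_store(&waiter_done, 1);
    return NULL;
}

/* Returns the errno seen by the waiter after it was hit by a signal. */
int futex_wait_errno_after_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;      /* deliberately no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    pthread_t t;
    rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);

    /* Keep signalling until the waiter reports that its wait returned;
       this covers the case where a signal lands before the thread blocks. */
    while (!atomic_load(&waiter_done)) {
        pthread_kill(t, SIGUSR1);
        usleep(1000);
    }
    pthread_join(t, NULL);
    return waiter_errno;
}
```

Calling futex_wait_errno_after_signal() from a test program should report EINTR. A caller that re-reads its condition after such a return, rather than only after a real wake, would then notice any state change whose wake-up it had missed.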