From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48)
	id 6382A39540CC; Tue, 4 May 2021 22:48:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6382A39540CC
From: "frankbarrus_sw at shaggy dot cc"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up
 pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 04 May 2021 22:48:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Who: frankbarrus_sw at shaggy dot cc
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: carlos at redhat dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: attachments.created
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list
X-List-Received-Date: Tue, 04 May 2021 22:48:55 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #38 from Frank Barrus ---
Created attachment 13419
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13419&action=edit
proposed lost wakeup fix with g_signals relative to g1_start and no signal
stealing

Here is my proposed solution to the lost wakeup problem.
Further details are in the attachment, but the high level summary is that it
turns g_signals[] into an always-advancing value for each new G1 group
(using the low 31 bits of the current g1_start value as the relative base
for the signal count), thus avoiding aliasing issues from G1/G2 re-use and
avoiding A/B/A issues on the signal count.

This eliminates the signal stealing, as well as the need to block and wait
for G1 to fully quiesce when signaling. This provides a performance benefit
at the same time as fixing the lost wakeup issue, since pthread_cond_signal
no longer has to wait for remaining G1 waiters to wake up and run, or for
pre-empted waiter threads to resume running. (the latter was introducing
rather high latencies for signaling when waiters got pre-empted)

My own testing (in a multi-core setup that was showing the lost wakeup quite
frequently) has not shown any problems yet, but I'd welcome others to try
the patch and give their feedback/results. This hasn't been through TLA+
testing yet.

Note that this patch is against the current master version of
glibc/pthreads. If you need to patch an older version, make sure the
additional futex waits in pthread_cond_wait_common() all have their "0"
value changed to "signals".

This patch could still use some additional cleanup of the comments and some
minor optimizations, but I'd like to get feedback on it (both the specific
patch and the general path this solution is taking) first before polishing
it further.
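
To make the "relative base" idea in the summary above concrete, here is a
minimal sketch. It is not the code from attachment 13419 and does not use
glibc's data layout or internal atomics. Every name in it (SEQ_MASK,
seq_base, pending, open_group, post_signal, consume_snapshot) is
hypothetical, it uses C11 <stdatomic.h>, and it leaves out group closing,
futex waits and everything else the real condvar needs. It only shows why a
signal word that starts at the low 31 bits of g1_start makes a stale
compare-and-swap fail once the slot has been re-used by a later group.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MASK 0x7fffffffu  /* low 31 bits, as in the summary above */

/* Hypothetical helper: base for a group's signal word, taken from the
   low 31 bits of that group's start position.  */
static uint32_t seq_base(uint64_t g1_start)
{
    return (uint32_t)(g1_start & SEQ_MASK);
}

/* Signals pending in `word`, measured relative to the group's base.
   The subtraction is done modulo 2^31, so it survives wraparound.  */
static uint32_t pending(uint32_t word, uint64_t g1_start)
{
    return (word - seq_base(g1_start)) & SEQ_MASK;
}

/* A new G1 starts with its word equal to the base: zero pending.  */
static void open_group(_Atomic uint32_t *word, uint64_t g1_start)
{
    atomic_store_explicit(word, seq_base(g1_start), memory_order_release);
}

/* A signaler adds one signal to the word.  */
static void post_signal(_Atomic uint32_t *word)
{
    atomic_fetch_add_explicit(word, 1, memory_order_release);
}

/* A waiter that earlier read `snapshot` tries to take one signal.
   The CAS succeeds only if the word is still exactly `snapshot`.  */
static bool consume_snapshot(_Atomic uint32_t *word, uint32_t snapshot,
                             uint64_t g1_start)
{
    assert(pending(snapshot, g1_start) > 0);  /* caller saw a signal */
    return atomic_compare_exchange_strong_explicit(
        word, &snapshot, snapshot - 1,
        memory_order_acquire, memory_order_relaxed);
}
```

With a plain per-group count, a waiter that read "1 signal" in an old group
and was then pre-empted would see the same value 1 again after the slot was
re-used and signaled, so its compare-and-swap would succeed and take a
signal meant for the new group. In the sketch, a group opened at position
100 holds 101 after one signal, while the re-used slot opened at position
105 holds 106, so consume_snapshot() with the stale value 101 returns false.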