public inbox for glibc-bugs@sourceware.org
From: "frankbarrus_sw at shaggy dot cc" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Tue, 04 May 2021 22:58:55 +0000
Message-ID: <bug-25847-131-zAOIU6fpqd@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-25847-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #40 from Frank Barrus <frankbarrus_sw at shaggy dot cc> ---
Since posting my diff as a patch obscures the commit comment that describes this solution, here is another copy of it:

This fixes the lost wakeup (from a bug in signal stealing) by changing how g_signals[] is used in the condition variable's internal state. It also completely eliminates the concept and handling of signal stealing, as well as the need for signalers to block and wait for waiters to wake up every time there is a G1/G2 switch. This greatly reduces the average and maximum latency of pthread_cond_signal.

The g_signals[] field now contains a signal count that is relative to the current g1_start value. Since it is a 32-bit field and the LSB is still reserved (though no longer used), it holds a 31-bit value that corresponds to the low 31 bits of the sequence number in g1_start. (Since g1_start also has an LSB flag, bits 31:1 of g_signals correspond to bits 31:1 of g1_start, plus the current signal count.)

By making the signal count relative to g1_start, there is no longer any ambiguity or A/B/A issue, so every check before blocking, including the futex call itself, is guaranteed not to block if a G1/G2 switch occurs, even if the signal count stays the same. This allows a waiter to block safely in G2 until the switch to G1 occurs, and then to transition from G1 to a new G1 or G2 while always being able to detect the state change.
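To make the relative encoding concrete, here is a small Python model of the arithmetic. It is an illustrative sketch, not glibc's code: the helper names (signals_base, encode_signals, signal_count, futex_would_block) are hypothetical, and it assumes that each signal adds 2 to g_signals so bit 0 stays reserved, with the low 32 bits of g1_start (LSB cleared) serving as the base.

```python
MASK32 = 0xFFFFFFFF  # g_signals is a 32-bit field


def signals_base(g1_start: int) -> int:
    """Bits 31:1 of g1_start: low 32 bits with the LSB flag cleared."""
    return g1_start & MASK32 & ~1


def encode_signals(g1_start: int, count: int) -> int:
    """g_signals value for `count` pending signals relative to g1_start.

    Assumption: each signal adds 2 so that bit 0 stays reserved; the sum
    wraps modulo 2**32 like the real 32-bit field would.
    """
    return (signals_base(g1_start) + 2 * count) & MASK32


def signal_count(g_signals: int, g1_start: int) -> int:
    """Recover the relative count; modular subtraction absorbs wrap-around."""
    return ((g_signals - signals_base(g1_start)) & MASK32) >> 1


def futex_would_block(current: int, expected: int) -> bool:
    """FUTEX_WAIT only sleeps if the futex word still equals `expected`."""
    return current == expected


if __name__ == "__main__":
    old_start = 0x1FFFFFFFA  # low 32 bits are 0xFFFFFFFA, close to wrapping
    gs = encode_signals(old_start, 5)
    print(gs)                           # 4 (wrapped past 2**32)
    print(signal_count(gs, old_start))  # 5
    # Hypothetical G1/G2 switch advancing g1_start by 6; count reset to 0.
    new_start = old_start + 6
    stale_expected = encode_signals(old_start, 0)
    print(futex_would_block(encode_signals(new_start, 0), stale_expected))  # False
```

Five signals wrap the stored word around to 4, yet the relative count still decodes to 5. After the simulated switch, a stale waiter's expected value (count 0 under the old base) differs from the new "zeroed" value, so its wait would not sleep even though both counts are zero.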
This removes the race condition and A/B/A problems that otherwise occurred if a late (pre-empted) waiter resumed just as the futex call attempted to block on g_signals, since there was otherwise no last opportunity to re-check things such as whether the current G1 group was already closed.

With these issues fixed, the signal stealing code can be eliminated, since there is no concept of signal stealing anymore. The code that blocks until all waiters exit the g_refs region can also be removed, since any waiters still in that region are guaranteed to wake up safely and exit. If any remain at that point, they are all sent one final futex wakeup so that none of them stay blocked, but the signaler no longer needs to block and wait for them to wake up and leave the g_refs region.

The signal count is then effectively "zeroed", but since it is now relative to g1_start, this is done by advancing it to a new value that any pending blocking waiters can observe. Late waiters can always tell the difference and can therefore exit cleanly if they are in a stale G1 or G2. They can never steal a signal from the current G1 if they are not in it, because the value that must match in the cmpxchg contains the low 31 bits of g1_start; that is checked first, and the cmpxchg then fails if a G1/G2 change has occurred.

Note: the 31-bit sequence number used in g_signals is designed to handle wrap-around when checking the signal count. However, suppose a full 31-bit wrap-around (about 2 billion signals) occurs while a late waiter has still not resumed, the low bits then happen to match the current g1_start, and the pre-emption happened after the normal "closed group" checks (which are 64-bit) but before the futex syscall and signal-consuming code. In that case an A/B/A issue could still arise and lead to an incorrect decision about whether to block. This particular scenario seems unlikely in practice.
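The aliasing condition in the note can be checked numerically. The Python sketch below is illustrative only: it assumes g1_start advances by 2 per sequence step (bit 0 being the flag), so a full 31-bit wrap-around moves it by exactly 2**32, and the helper names are hypothetical. It shows a stale g1_start whose 31-bit view collides with the current one while the 64-bit values still differ, which is why the 64-bit closed-group checks stay reliable.

```python
MASK32 = 0xFFFFFFFF


def signals_base(g1_start: int) -> int:
    """Bits 31:1 of g1_start, i.e. the part of it that g_signals reflects."""
    return g1_start & MASK32 & ~1


def aliases(stale_g1_start: int, current_g1_start: int) -> bool:
    """True if a stale waiter's 31-bit view matches the current group even
    though the full 64-bit g1_start values differ (the A/B/A case)."""
    same_31_bits = signals_base(stale_g1_start) == signals_base(current_g1_start)
    same_64_bits = stale_g1_start == current_g1_start
    return same_31_bits and not same_64_bits


if __name__ == "__main__":
    stale = 0x10
    # Assumption: 2 per sequence step, so 2**31 steps move g1_start by 2**32.
    current = stale + 2 * 2**31
    print(aliases(stale, current))    # True: the 31-bit view collides
    print(aliases(stale, stale + 2))  # False: one step apart is distinguishable
```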
Note that once awake from the futex, the waiter would notice the closed group before consuming the signal (since that's still a 64-bit check that would not be aliased in the wrap-around in g_signals), so the biggest impact would be blocking on the futex until the next full wakeup from a G1/G2 switch.
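The ordering described above, a 64-bit closed-group check before consuming and a cmpxchg whose expected value embeds the low bits of g1_start, can be modeled as follows. This is a hypothetical Python sketch, not glibc's code: AtomicU32 is a lock-based stand-in for a 32-bit atomic word, g1_start is passed as a plain integer, the final futex wakeup is omitted, and the names try_consume_signal and close_group are invented for illustration.

```python
import threading

MASK32 = 0xFFFFFFFF


class AtomicU32:
    """Lock-based stand-in for a 32-bit atomic word (illustration only)."""

    def __init__(self, value: int = 0):
        self._value = value & MASK32
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def store(self, value: int) -> None:
        with self._lock:
            self._value = value & MASK32

    def compare_exchange(self, expected: int, desired: int) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = desired & MASK32
                return True
            return False


def base(g1_start: int) -> int:
    return g1_start & MASK32 & ~1


def try_consume_signal(g_signals: AtomicU32, current_g1_start: int, my_g1_start: int) -> bool:
    """Hypothetical waiter path: True if a signal was consumed, False if the
    group is closed, no signal is pending, or the cmpxchg lost a race."""
    # 64-bit check first: 31-bit aliasing cannot fool this comparison.
    if current_g1_start != my_g1_start:
        return False
    seen = g_signals.load()
    if ((seen - base(my_g1_start)) & MASK32) >> 1 == 0:
        return False  # no pending signal relative to my group
    # Fails if a signaler advanced the base after `seen` was loaded.
    return g_signals.compare_exchange(seen, seen - 2)


def close_group(g_signals: AtomicU32, new_g1_start: int) -> None:
    """Hypothetical signaler step: "zero" the count by advancing the base."""
    g_signals.store(base(new_g1_start))
```

In this sketch a waiter that loaded g_signals just before a switch sees its cmpxchg fail once the base has moved, so it cannot take a signal belonging to the new G1; the caller is left to re-check its group.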