public inbox for glibc-bugs@sourceware.org
From: "frankbarrus_sw at shaggy dot cc" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/25847] pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing
Date: Thu, 15 Apr 2021 14:34:14 +0000	[thread overview]
Message-ID: <bug-25847-131-tkw5GBFe7l@http.sourceware.org/bugzilla/> (raw)
In-Reply-To: <bug-25847-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

--- Comment #36 from Frank Barrus <frankbarrus_sw at shaggy dot cc> ---
(In reply to Frank Barrus from comment #35)
> (In reply to Qin Li from comment #34)
> > Hi Frank, I am the original reporter of this bug. Could you share a sneak
> > peek at the alternate fix that you mentioned below?
> > 
> > > FYI, I'm currently testing a different pthreads fix for this issue that does it without the suggested "broadcast" solution that some distros appear to be adopting for now.
> > 
> > The reason I am asking is that several months after applying the broadcast
> > fix we started to observe a different hang, caused either by
> > pthread_cond_signal/pthread_cond_wait or by the constructs they rely on,
> > e.g. the futex. Originally I feared it was related to or caused by the
> > broadcast fix, but later realized this might be a separate issue, as it has
> > also been independently reported in this bug by Arun without the broadcast
> > fix applied:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=25847#c3
> > 
> > The signature of this different issue is this: one thread blocked
> > "infinitely" in pthread_cond_signal while calling
> > __condvar_quiesce_and_switch_g1:
> > 
> > #0  futex_wait
> > #1  futex_wait_simple
> > #2  __condvar_quiesce_and_switch_g1
> > #3  __pthread_cond_signal
> > 
> > And all the other threads are blocked in pthread_cond_wait waiting for the
> > signal.
> > 
> > Another interesting observation is that the "infinitely" blocked thread in
> > pthread_cond_signal can be unblocked if I send a SIGSEGV to the Linux .NET
> > Core process that hit this issue, which has a segfault handler that calls
> > another external executable to take a core dump of the process. I am not
> > exactly sure how much of the special signal-handling logic matters for
> > getting pthread_cond_signal unblocked. It is possible that such a signal
> > causes a spurious wakeup from futex_wait that unblocks
> > __condvar_quiesce_and_switch_g1 and then __pthread_cond_signal, but this is
> > pure speculation.
> > 
> > It would be nice to know whether the hanging pthread_cond_signal signature
> > above has also been seen by the community and whether any investigation or
> > fix is available.
> 
> Hi Qin,
> 
> I believe you may be right about the second bug.  I have also seen such a
> signature ever since the fix went in that ORs in the low bit so the signaler
> doesn't spin while waiting for g_refs to reach 0:
>           r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
> 
> Whenever I've encountered threads "stuck" in that futex_wait, they have
> always become unblocked as soon as I resumed from gdb (which makes sense if
> they're just missing a futex wake, since the signal handling unblocks them
> from the futex), so I was never quite sure whether they were truly stuck
> there (which they seemed to be) or whether I was just catching them in a
> transient state.  However, since this was always at a time when other parts
> of our system had also become stuck, it seemed quite likely that the threads
> really were stuck in the futex_wait.
> 
> I have not seen this happen anymore ever since applying my new fix and
> testing with it, but that's purely anecdotal evidence of fixing the
> secondary issues, since it was quite rare to begin with.
> 
> If there's a race in the wait/wake logic itself for the g_refs counting, I
> haven't found it yet, unless there's some very obscure bug in the underlying
> atomics themselves allowing the atomic OR and the atomic subtract to cross
> paths somehow, thus missing the wakeup flag, i.e. these two atomics:
> 
>   r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
>   (in the quiesce_and_switch) 
> and:
>   if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3)
>   (in __condvar_dec_grefs, used from cond_wait)
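> 
> To make the pairing concrete, here is a condensed sketch of the two sides of
> that g_refs handshake (abbreviated from the glibc sources; the spinning,
> memory-order comments, and the actual group switching are left out):
> 
>   /* Signaler side, in __condvar_quiesce_and_switch_g1: set the
>      wake-request flag (LSB) and block until all waiter references
>      (kept in the upper bits) have drained from G1.  */
>   r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1;
>   while ((r >> 1) > 0)
>     {
>       futex_wait_simple (cond->__data.__g_refs + g1, r, private);
>       r = atomic_load_relaxed (cond->__data.__g_refs + g1);
>     }
> 
>   /* Waiter side, in __condvar_dec_grefs: -2 drops one reference.
>      Seeing 3 means this was the last reference (2) plus the
>      wake-request flag (1), so clear the flag and wake the signaler.  */
>   if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3)
>     {
>       atomic_fetch_and_relaxed (cond->__data.__g_refs + g, ~(unsigned int) 1);
>       futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
>     }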
> 
> I think the more likely cause is related to still having an extra waiter
> that didn't get woken yet and didn't see the group being closed.   Note that
> when the group is closed, the LSB of g_signals is set, but there's no futex
> wakeup:
> 
>   atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
> 
> This should be fine if there are no other bugs, since every waiter should
> have been woken already.  But since we're already dealing with a mix of both
> current waiters and older ones that were pre-empted first and which are now
> resuming, and possibly not yet aware that they are in an older group, there
> might be some race there.  In particular, if a signal did actually get
> stolen and not properly detected and replaced, there could be an extra
> waiter stuck in a futex wait on g_signals still.  Without an explicit wakeup
> when closing the group, it won't see the "closed" bit until it wakes from
> the futex, which won't happen in that case if there are no more signals sent
> to that group. However, an interrupt from normal signal handling (SIG*
> signals) will break it out of the futex, which is why a gdb attach and
> resume would get it unstuck in that case.  And then it will see the closed
> group on g_signals and proceed as if nothing was ever wrong.
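> 
> In condensed form, the waiter side of g_signals looks roughly like this
> (again heavily abbreviated from the glibc sources; the real code also spins
> and re-checks the group), which shows why a waiter that missed its signal
> can only get unstuck by an interruption:
> 
>   /* Waiter side: block until a signal is available (value >= 2) or the
>      group has been closed (LSB set).  */
>   unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g);
>   while (signals == 0)
>     {
>       /* If the group was closed without a futex_wake and no further signals
>          are sent to this group, nothing changes this word again, so this
>          wait only returns on an unrelated interruption (a SIG* delivery,
>          a gdb attach/resume, ...).  */
>       futex_wait (cond->__data.__g_signals + g, 0, private);
>       signals = atomic_load_acquire (cond->__data.__g_signals + g);
>     }
>   /* Here either a signal is available (signals >= 2) and gets consumed,
>      or the closed bit is set and the waiter moves on without one.  */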
> 
> Until we figure out the exact cause, I cannot guarantee that my new fix also
> addresses this issue.  Although as I said, I have not yet seen it occur with
> the fix.  Also since I eliminated the need for handling stolen signals and
> signal replacement, it might remove the cause of this second bug as a side
> effect.  (if indeed there truly is a second bug)
> 
> Do you happen to still have the condvar state when you've hit this bug?  Or
> can it be reproduced often enough to capture this state?   Could you
> instrument your calls to pthread_cond_signal to capture the condvar state
> before you send the signal also?  (and perhaps also in your waiters?).  I
> ended up adding circular logging of the condvar state (and a timestamp and
> TID) before and after every pthread_cond_wait and pthread_cond_signal to
> diagnose the lost wakeup so I could see how the events interleaved,
> including the often 2ms to 3ms pre-emption time in the older waiters that
> were coming back and leading to the bug.  I also called getrusage() and
> captured the voluntary and involuntary context-switch counts and added those
> to the condvar event logs to make sure I was always seeing a pre-emptive
> involuntary context switch when the bug occurred.  You might not need to go
> that far, but it would help to at least find out the current condvar state
> for all the threads involved when you see the futex_wait get stuck.
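> 
> For what it's worth, the instrumentation I used amounted to something like
> the following (a minimal sketch: the ring buffer, the cv_trace name, and the
> log fields are just illustrative, not anything from glibc):
> 
>   #define _GNU_SOURCE
>   #include <pthread.h>
>   #include <string.h>
>   #include <time.h>
>   #include <unistd.h>
>   #include <sys/resource.h>
>   #include <sys/syscall.h>
> 
>   /* One entry per cond_wait/cond_signal call: a raw snapshot of the
>      condvar words plus timestamp, TID, and context-switch counts.  */
>   struct cv_event {
>     struct timespec ts;
>     pid_t tid;
>     const char *where;              /* "wait-enter", "signal-exit", ... */
>     unsigned char raw[sizeof (pthread_cond_t)];
>     long nvcsw, nivcsw;             /* voluntary/involuntary switches */
>   };
> 
>   #define CV_LOG_SIZE 4096
>   static struct cv_event cv_log[CV_LOG_SIZE];
>   static _Atomic unsigned int cv_log_next;
> 
>   static void cv_trace (pthread_cond_t *cv, const char *where)
>   {
>     struct cv_event *e = &cv_log[cv_log_next++ % CV_LOG_SIZE];
>     clock_gettime (CLOCK_MONOTONIC, &e->ts);
>     e->tid = (pid_t) syscall (SYS_gettid);
>     e->where = where;
>     memcpy (e->raw, cv, sizeof *cv);  /* racy snapshot, but good enough */
>     struct rusage ru;
>     getrusage (RUSAGE_THREAD, &ru);
>     e->nvcsw = ru.ru_nvcsw;
>     e->nivcsw = ru.ru_nivcsw;
>   }
> 
> Dumping that ring from gdb or a core then shows how the events interleaved
> across threads, including the pre-emption gaps mentioned above.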
> 
> I'm working on trying to get my patch available soon for further review and
> testing.
> 
> If you have a self-contained test case that you can release that can even
> occasionally show this new "second bug" (if it is), let me know.  Thanks!

I should clarify what I'm looking for here, in particular the condvar state in
this case, as well as the state of all the other waiters and where they are
stuck (or spinning):

Assuming this second bug exists, it seems the signaler is stuck in the
futex_wait for g_refs to become 0, and an interrupt to that futex_wait from
signal handling "fixes" the problem, which implies the underlying condvar state
is currently correct and there is just a missing/lost futex_wake.  However, is
it a wakeup of this particular futex_wait that fixes it, or is it an extra
wakeup of a waiter blocked on g_signals that is needed?  We need to see whether
the condvar state shows that the g_refs waiter already has what it needs to
continue and just missed a wakeup, or whether it is legitimately still waiting
for g_refs to reach 0 because the refcount is still 1 or more, i.e. there is
another waiter still blocked on g_signals that has not been woken to see that
the group is now closed (the more likely cause).  Seeing the condvar state
would greatly help in figuring out which case it is.
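
As a rough guide to reading that state, here is the kind of decoding I have in
mind (a sketch only: it assumes the struct __pthread_cond_s layout and the
count-in-the-upper-bits / flag-in-the-LSB encoding of the glibc versions
discussed here, and cv_dump is just an illustrative helper, not glibc code):

  #include <pthread.h>
  #include <stdio.h>

  /* Print the per-group condvar fields that matter for this bug.  */
  static void cv_dump (const pthread_cond_t *cv)
  {
    const struct __pthread_cond_s *d = &cv->__data;
    for (int g = 0; g < 2; g++)
      printf ("G%d: refs=%u wake-req=%u signals=%u closed=%u size=%u\n", g,
              d->__g_refs[g] >> 1, d->__g_refs[g] & 1,
              d->__g_signals[g] >> 1, d->__g_signals[g] & 1,
              d->__g_size[g]);
    /* If the signaler is stuck in quiesce_and_switch on G1:
       - refs == 0 for G1: its wait condition is already satisfied, so it
         merely missed/lost a futex wake on __g_refs[G1];
       - refs >= 1 for G1 with the closed bit set: some waiter still holds a
         reference and is presumably parked on __g_signals[G1] without ever
         having seen the close (the more likely case described above).  */
  }

The same fields can of course be read directly from gdb or a core dump; the
helper just spells out which bits mean what.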

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Thread overview: 93+ messages
2020-04-18  8:04 [Bug nptl/25847] New: " qin.li at thetradedesk dot com
2020-04-18  8:07 ` [Bug nptl/25847] " qin.li at thetradedesk dot com
2020-04-20 17:52 ` qin.li at thetradedesk dot com
2020-05-05 11:50 ` arun11299 at gmail dot com
2020-05-05 11:52 ` arun11299 at gmail dot com
2020-05-14  8:52 ` dingxiangfei2009 at protonmail dot ch
2020-05-19 21:15 ` carlos at redhat dot com
2020-10-08 19:07 ` michael.bacarella at gmail dot com
2020-10-10  8:11 ` flo at geekplace dot eu
2020-10-12 20:53 ` anssi.hannula at iki dot fi
2020-10-13 16:50 ` guillaume at morinfr dot org
2020-10-16 20:23 ` kirichenkoga at gmail dot com
2020-10-18 22:55 ` michael.bacarella at gmail dot com
2020-10-20 20:12 ` kirichenkoga at gmail dot com
2020-10-20 20:17 ` kirichenkoga at gmail dot com
2020-11-01 13:58 ` malteskarupke at fastmail dot fm
2020-11-01 15:40 ` michael.bacarella at gmail dot com
2020-11-15  2:45 ` malteskarupke at fastmail dot fm
2020-11-23 10:37 ` ydroneaud at opteya dot com
2020-11-23 16:36 ` mattst88 at gmail dot com
2020-11-23 16:46 ` adhemerval.zanella at linaro dot org
2020-12-07 15:41 ` balint at balintreczey dot hu
2020-12-09  3:22 ` malteskarupke at fastmail dot fm
2020-12-24 20:02 ` triegel at redhat dot com
2020-12-25 16:19 ` triegel at redhat dot com
2021-01-07  1:09 ` manojgupta at google dot com
2021-01-07  7:31 ` balint at balintreczey dot hu
2021-01-07 13:54 ` malteskarupke at fastmail dot fm
2021-01-07 20:43 ` triegel at redhat dot com
2021-01-07 23:31 ` triegel at redhat dot com
2021-01-08  3:45 ` malteskarupke at fastmail dot fm
2021-01-16  0:21 ` triegel at redhat dot com
2021-01-16  0:23 ` triegel at redhat dot com
2021-01-30  0:59 ` michael.bacarella at gmail dot com
2021-02-07 17:38 ` slav.isv at gmail dot com
2021-03-09 15:18 ` bugdal at aerifal dot cx
2021-04-06  0:37 ` frankbarrus_sw at shaggy dot cc
2021-04-06  1:17 ` frankbarrus_sw at shaggy dot cc
2021-04-06 14:32 ` frankbarrus_sw at shaggy dot cc
2021-04-06 16:49 ` frankbarrus_sw at shaggy dot cc
2021-04-11 12:12 ` carlos at redhat dot com
2021-04-11 12:13 ` carlos at redhat dot com
2021-04-13 12:21 ` frankbarrus_sw at shaggy dot cc
2021-04-14 16:57 ` qin.li at thetradedesk dot com
2021-04-15 14:13 ` frankbarrus_sw at shaggy dot cc
2021-04-15 14:34 ` frankbarrus_sw at shaggy dot cc [this message]
2021-04-30 17:41 ` venkiram_be at yahoo dot co.in
2021-05-04 22:48 ` frankbarrus_sw at shaggy dot cc
2021-05-04 22:51 ` frankbarrus_sw at shaggy dot cc
2021-05-04 22:58 ` frankbarrus_sw at shaggy dot cc
2021-05-13 13:25 ` tuliom at ascii dot art.br
2021-06-14 13:31 ` willireamangel at gmail dot com
2021-07-09  1:41 ` uwydoc at gmail dot com
2021-07-15  0:55 ` benh at kernel dot crashing.org
2021-08-16  9:41 ` evgeny+sourceware at loop54 dot com
2021-09-13 16:50 ` dushistov at mail dot ru
2021-09-22 19:03 ` evgeny+sourceware at loop54 dot com
2021-09-22 19:07 ` balint at balintreczey dot hu
2021-09-24  0:18 ` tuliom at ascii dot art.br
2021-09-24  0:58 ` michael.bacarella at gmail dot com
2021-09-29 11:50 ` fweimer at redhat dot com
2021-10-21 15:42 ` fweimer at redhat dot com
2021-10-30 22:17 ` sam at gentoo dot org
2021-11-25 14:49 ` arekm at maven dot pl
2022-09-18  5:38 ` malteskarupke at fastmail dot fm
2022-09-18 20:06 ` carlos at redhat dot com
2022-09-19  3:38 ` malteskarupke at fastmail dot fm
2022-09-24  0:03 ` bugzilla at dimebar dot com
2022-09-24 10:15 ` ismail at i10z dot com
2022-09-26 14:28 ` ehagberg at janestreet dot com
2022-09-26 14:32 ` ehagberg at janestreet dot com
2022-10-06 21:58 ` malteskarupke at fastmail dot fm
2022-10-07 12:01 ` crrodriguez at opensuse dot org
2022-10-15 19:57 ` malteskarupke at fastmail dot fm
2022-11-07 18:23 ` sourceware-bugzilla at djr61 dot uk
2023-01-28 14:57 ` malteskarupke at fastmail dot fm
2023-05-01 12:52 ` carlos at redhat dot com
2023-05-02 12:57 ` carlos at redhat dot com
2023-05-03  3:04 ` malteskarupke at fastmail dot fm
2023-05-04  4:57 ` malteskarupke at fastmail dot fm
2023-05-04 12:24 ` carlos at redhat dot com
2023-05-05 23:44 ` carlos at redhat dot com
2023-05-10 21:29 ` frankbarrus_sw at shaggy dot cc
2023-05-10 21:39 ` frankbarrus_sw at shaggy dot cc
2023-05-11  0:22 ` frankbarrus_sw at shaggy dot cc
2023-05-11 12:01 ` carlos at redhat dot com
2023-05-11 12:05 ` carlos at redhat dot com
2023-05-13  4:10 ` malteskarupke at fastmail dot fm
2023-08-24 20:24 ` jwakely.gcc at gmail dot com
2023-09-26 12:33 ` malteskarupke at fastmail dot fm
2023-09-26 12:38 ` fweimer at redhat dot com
2024-01-05  7:31 ` malteskarupke at fastmail dot fm
2024-02-17  9:44 ` github at kalvdans dot no-ip.org
