[Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when _

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
@ 2022-02-08 16:31 poulhies at adacore dot com
  2022-02-09  0:27 ` [Bug libstdc++/104442] " rodgertq at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: poulhies at adacore dot com @ 2022-02-08 16:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

            Bug ID: 104442
           Summary: atomic<T>::wait incorrectly loops in case of spurious
                    notification when __waiter is shared
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: poulhies at adacore dot com
  Target Milestone: ---

Created attachment 52377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52377&action=edit
patch fixing the issue

We are observing a deadlock in 100334.cc on vxworks.

This is caused by :

        template<typename _Tp, typename _ValFn>
          void
          _M_do_wait_v(_Tp __old, _ValFn __vfn)
          {
            __platform_wait_t __val;
            if (__base_type::_M_do_spin_v(__old, __vfn, __val))
               return;

            do
              {
                __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
              }
            while (__detail::__atomic_compare(__old, __vfn()));
          }

When several thread are sharing the waiter (as in 100334.cc), the notify_one()
will wake all threads blocked in the _M_do_wait() above. The thread whose data
changed exits the loop correctly, but the others are looping back in
_M_do_wait() with the same arguments. As the waiter's value has changed since
the previous iteration but not __val, the method directly returns (as if it had
detected a notification) and the loop continues.

On GNU/Linux, the test is PASS because the main thread is still scheduled and
will do a .store(1) on all atoms, unblocking all the busy-waiting thread (but
the thread doing a busywait can still be observed with gdb).

On vxworks, the main thread is never scheduled again (I think there's no
preemption at the same prio level) and the busy-wait starves the system.

The attached patch is a possible fix. It moves the spin() call inside the loop,
updating the __val at every iteration. A better fix is probably possible but
may require some refactoring (a bit more than I'm comfortable with).

I've checked the patch for regression on gcc-master for x86_64. It also fixes
the test on gcc-11 for aarch64-vxworks7.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
@ 2022-02-09  0:27 ` rodgertq at gcc dot gnu.org
  2022-02-09  8:51 ` poulhies at adacore dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rodgertq at gcc dot gnu.org @ 2022-02-09  0:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

Thomas Rodgers <rodgertq at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-02-09
                 CC|                            |rodgertq at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
I took a look at whether or not it made sense to do a different refactoring
with this version of the implementation and I don't believe that it does (the
implementation detail here is likely to change substantially when we commit to
an ABI stable version) and indeed, the predicate version does exactly what this
patch proposes.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
  2022-02-09  0:27 ` [Bug libstdc++/104442] " rodgertq at gcc dot gnu.org
@ 2022-02-09  8:51 ` poulhies at adacore dot com
  2022-02-09 10:16 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: poulhies at adacore dot com @ 2022-02-09  8:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #2 from Marc Poulhiès <poulhies at adacore dot com> ---
Ok thanks for looking into this.
I'll submit the patch on the list then !

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
  2022-02-09  0:27 ` [Bug libstdc++/104442] " rodgertq at gcc dot gnu.org
  2022-02-09  8:51 ` poulhies at adacore dot com
@ 2022-02-09 10:16 ` redi at gcc dot gnu.org
  2022-02-09 15:29 ` rodgertq at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: redi at gcc dot gnu.org @ 2022-02-09 10:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Tom submitted
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590050.html already,
which is slightly different.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (2 preceding siblings ...)
  2022-02-09 10:16 ` redi at gcc dot gnu.org
@ 2022-02-09 15:29 ` rodgertq at gcc dot gnu.org
  2022-02-09 20:31 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rodgertq at gcc dot gnu.org @ 2022-02-09 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #4 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #3)
> Tom submitted
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590050.html already,
> which is slightly different.

The primary difference was changing the memory order of the load in _M_do_spin.
I may revert that part of the change and submit it separately, as I've noticed
one other case where I'd like to change it from RELAXED to SEQ_CST.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (3 preceding siblings ...)
  2022-02-09 15:29 ` rodgertq at gcc dot gnu.org
@ 2022-02-09 20:31 ` cvs-commit at gcc dot gnu.org
  2022-02-09 20:32 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-09 20:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Thomas Rodgers <rodgertq@gcc.gnu.org>:

https://gcc.gnu.org/g:4cf3c339815cdfa636b25a512f91b63d7c313fd6

commit r12-7151-g4cf3c339815cdfa636b25a512f91b63d7c313fd6
Author: Thomas Rodgers <rodgert@appliantology.com>
Date:   Wed Feb 9 12:29:19 2022 -0800

    libstdc++: Fix deadlock in atomic wait [PR104442]

    This issue was observed as a deadlock in
    29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
    "laundered" (e.g. type T* does not suffice as a waitable address for the
    platform's native waiting primitive), the address waited is that of the
    _M_ver member of __waiter_pool_base, so several threads may wait on the
    same address for unrelated atomic<T> objects. As noted in the PR, the
    implementation correctly exits the wait for the thread whose data
    changed, but not for any other threads waiting on the same address.

    As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting
    but the other waiters were not reloading the value of _M_ver before
    re-entering the wait.

    Moving the spin call inside the loop accomplishes this, and is
    consistent with the predicate accepting version of __waiter::_M_do_wait.

    libstdc++-v3/ChangeLog:

            PR libstdc++/104442
            * include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
             loop inside do loop so that threads failing the wait, reload
             _M_ver.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (4 preceding siblings ...)
  2022-02-09 20:31 ` cvs-commit at gcc dot gnu.org
@ 2022-02-09 20:32 ` cvs-commit at gcc dot gnu.org
  2022-02-10  8:52 ` poulhies at adacore dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-09 20:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Thomas Rodgers
<rodgertq@gcc.gnu.org>:

https://gcc.gnu.org/g:5669a60e2fcc90afaa16fe20391cdcf18ab398d7

commit r11-9547-g5669a60e2fcc90afaa16fe20391cdcf18ab398d7
Author: Thomas Rodgers <rodgert@appliantology.com>
Date:   Wed Feb 9 12:29:19 2022 -0800

    libstdc++: Fix deadlock in atomic wait [PR104442]

    This issue was observed as a deadlock in
    29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
    "laundered" (e.g. type T* does not suffice as a waitable address for the
    platform's native waiting primitive), the address waited is that of the
    _M_ver member of __waiter_pool_base, so several threads may wait on the
    same address for unrelated atomic<T> objects. As noted in the PR, the
    implementation correctly exits the wait for the thread whose data
    changed, but not for any other threads waiting on the same address.

    As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting
    but the other waiters were not reloading the value of _M_ver before
    re-entering the wait.

    Moving the spin call inside the loop accomplishes this, and is
    consistent with the predicate accepting version of __waiter::_M_do_wait.

    libstdc++-v3/ChangeLog:

            PR libstdc++/104442
            * include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
             loop inside do loop so that threads failing the wait, reload
             _M_ver.

    (cherry picked from commit 4cf3c339815cdfa636b25a512f91b63d7c313fd6)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (5 preceding siblings ...)
  2022-02-09 20:32 ` cvs-commit at gcc dot gnu.org
@ 2022-02-10  8:52 ` poulhies at adacore dot com
  2022-02-10 17:57 ` rodgertq at gcc dot gnu.org
  2022-02-11  8:16 ` poulhies at adacore dot com
  8 siblings, 0 replies; 10+ messages in thread
From: poulhies at adacore dot com @ 2022-02-10  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #7 from Marc Poulhiès <poulhies at adacore dot com> ---
Thanks !

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (6 preceding siblings ...)
  2022-02-10  8:52 ` poulhies at adacore dot com
@ 2022-02-10 17:57 ` rodgertq at gcc dot gnu.org
  2022-02-11  8:16 ` poulhies at adacore dot com
  8 siblings, 0 replies; 10+ messages in thread
From: rodgertq at gcc dot gnu.org @ 2022-02-10 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

Thomas Rodgers <rodgertq at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Marc Poulhiès from comment #7)
> Thanks !

Since the patch committed is effectively the same as your proposed patch, I am
going to assume this resolves the issue on vxworks, but I have no way to
independently confirm that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug libstdc++/104442] atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared
  2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
                   ` (7 preceding siblings ...)
  2022-02-10 17:57 ` rodgertq at gcc dot gnu.org
@ 2022-02-11  8:16 ` poulhies at adacore dot com
  8 siblings, 0 replies; 10+ messages in thread
From: poulhies at adacore dot com @ 2022-02-11  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

--- Comment #9 from Marc Poulhiès <poulhies at adacore dot com> ---
FWIW, I confirm that the vxworks issue is fixed in latest gcc-11 branch !

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-11  8:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08 16:31 [Bug libstdc++/104442] New: atomic<T>::wait incorrectly loops in case of spurious notification when __waiter is shared poulhies at adacore dot com
2022-02-09  0:27 ` [Bug libstdc++/104442] " rodgertq at gcc dot gnu.org
2022-02-09  8:51 ` poulhies at adacore dot com
2022-02-09 10:16 ` redi at gcc dot gnu.org
2022-02-09 15:29 ` rodgertq at gcc dot gnu.org
2022-02-09 20:31 ` cvs-commit at gcc dot gnu.org
2022-02-09 20:32 ` cvs-commit at gcc dot gnu.org
2022-02-10  8:52 ` poulhies at adacore dot com
2022-02-10 17:57 ` rodgertq at gcc dot gnu.org
2022-02-11  8:16 ` poulhies at adacore dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).