public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever
@ 2022-03-15  9:16 angelsl at in04 dot sg
  2022-03-15  9:17 ` [Bug libstdc++/104928] " angelsl at in04 dot sg
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: angelsl at in04 dot sg @ 2022-03-15  9:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

            Bug ID: 104928
           Summary: std::counting_semaphore on Linux can sleep forever
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: angelsl at in04 dot sg
  Target Milestone: ---

The __atomic_semaphore implementation of std::counting_semaphore can sometimes
sleep forever, especially when there is high contention.

Here is a possible sequence of events that can lead to the issue:

1. _M_counter is 1.
2. thread A enters (semaphore_base.h) _M_acquire -> (atomic_wait.h)
__atomic_wait_address_bare -> _S_do_spin.
3. In _S_do_spin, thread A does __atomic_load on _M_counter and reads 1.
4. A different thread manages to decrement the semaphore. _M_counter is now 0.
5. Thread A enters __atomic_spin, and all calls to __pred (which just does
_S_do_try_acquire) fail, since _M_counter is now 0. Thread A returns from
__atomic_spin and _S_do_spin with false.
6. Another thread signals the semaphore. _M_counter is back to 1.
7. Thread A returns to __atomic_wait_address_bare and proceeds to call
__platform_wait using the value of _M_counter read in step 3 (= 1).
8. In __platform_wait, thread A calls futex on _M_counter with WAIT_PRIVATE and
value 1. Since _M_counter is still 1, the kernel puts thread A to sleep.
9. Unless the semaphore later reaches 0 again, thread A will never be awoken
since _M_release does not call futex WAKE unless the counter is 0.

This is sort of an ABA problem. I think the most proper solution would be to
make it so that futex is called with value 0 (since that is what we are waiting
for _M_counter to change from). But it will need some restructuring to the
current abstractions over futex that are being used here.

Here is an almost 100%-reliable minimal reproducible example for the issue:

---
// g++ -O2 -std=c++20 -o hang hang.cpp
#include <iostream>
#include <semaphore>
#include <thread>
#include <vector>

int main(int argc, char *argv[]) {
  std::counting_semaphore s{0};
  std::vector<std::thread> threads;
  for (size_t i = 0; i < 10; ++i) {
    threads.emplace_back([&s]() {
      for (size_t i = 0; i < 1000000; ++i) {
        s.acquire();
      }
    });
    threads.emplace_back([&s]() {
      for (size_t i = 0; i < 1000000; ++i) {
        s.release();
      }
    });
  }
  for (auto &t : threads) t.join();
  return 0;
}
---

If you run it, you will observe one or two acquirer threads sleeping
indefinitely at the end. If you observe them in gdb, you will see that they are
waiting on futex, and also that the semaphore counter is actually positive.

Other notes:
- This probably affects the implementation of atomic::wait as well, but I have
not tried that.
- FWIW, libc++ also has this exact issue, but they sidestep it by passing a
timeout to futex(). I don't know whether specifying a timeout was done
intentionally to avoid this issue. I also don't personally think that that is a
great solution to this. (If you run the MRE with clang++ -stdlib=libc++, you
will observe the issue manifesting as some acquirer threads sleeping for a few
seconds until the futex timeout expires.)

System information:

Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit
--enable-cet=auto --enable-checking=release --enable-clocale=gnu
--enable-default-pie --enable-default-ssp --enable-gnu-indirect-function
--enable-gnu-unique-object --enable-linker-build-id --enable-lto
--enable-multilib --enable-plugin --enable-shared --enable-threads=posix
--disable-libssp --disable-libstdcxx-pch --disable-werror
--with-build-config=bootstrap-lto --enable-link-serialization=1
gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)

I have attached the preprocessed version of the above MRE.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
@ 2022-03-15  9:17 ` angelsl at in04 dot sg
  2023-12-10  6:59 ` nate at thatsmathematics dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: angelsl at in04 dot sg @ 2022-03-15  9:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #1 from Tee Hao Wei <angelsl at in04 dot sg> ---
Created attachment 52628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52628&action=edit
Preprocessed, gzipped reproducer

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
  2022-03-15  9:17 ` [Bug libstdc++/104928] " angelsl at in04 dot sg
@ 2023-12-10  6:59 ` nate at thatsmathematics dot com
  2023-12-10 12:46 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nate at thatsmathematics dot com @ 2023-12-10  6:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #2 from Nate Eldredge <nate at thatsmathematics dot com> ---
This bug is still present.  Tested and reproduced with g++ 13.1.0 (Ubuntu
package), and by inspection of the source code, it's still in the trunk as
well.

Encountered on StackOverflow:
https://stackoverflow.com/questions/77626624/race-condition-in-morriss-algorithm

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
  2022-03-15  9:17 ` [Bug libstdc++/104928] " angelsl at in04 dot sg
  2023-12-10  6:59 ` nate at thatsmathematics dot com
@ 2023-12-10 12:46 ` redi at gcc dot gnu.org
  2023-12-10 20:33 ` nate at thatsmathematics dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2023-12-10 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-12-10
                 CC|                            |rodgertq at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I can't reproduce the deadlock with the (currently uncommitted) changes in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636805.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
                   ` (2 preceding siblings ...)
  2023-12-10 12:46 ` redi at gcc dot gnu.org
@ 2023-12-10 20:33 ` nate at thatsmathematics dot com
  2023-12-10 20:35 ` nate at thatsmathematics dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nate at thatsmathematics dot com @ 2023-12-10 20:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #4 from Nate Eldredge <nate at thatsmathematics dot com> ---
@Jonathan: I think that patch set is on the right track, but it has some other
serious bugs.  For one, __atomic_wait_address calls __detail::__wait_impl with
__args._M_old uninitialized (g++ -O3 -Wall catches it).  There is another
uninitialized warning about __wait_addr that I haven't yet confirmed.  

Lastly, in __wait_impl, there is a test `if (__args &
__wait_flags::__spin_only)`, but __spin_only has two bits set, one of which is
__do_spin.  So in effect, __do_spin (which is set by default on Linux) is taken
to imply __spin_only, with the result that it *only* ever spins, without ever
sleeping.  Thus every semaphore (and maybe other waits too) becomes a spinlock,
which is Not Good.

Should I take this up on gcc-patches, or elsewhere?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
                   ` (3 preceding siblings ...)
  2023-12-10 20:33 ` nate at thatsmathematics dot com
@ 2023-12-10 20:35 ` nate at thatsmathematics dot com
  2023-12-10 20:51 ` redi at gcc dot gnu.org
  2023-12-11  8:18 ` nate at thatsmathematics dot com
  6 siblings, 0 replies; 8+ messages in thread
From: nate at thatsmathematics dot com @ 2023-12-10 20:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #5 from Nate Eldredge <nate at thatsmathematics dot com> ---
Oh wait, disregard that last, I realized that I only applied one of the two
patches.  Let me try again.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
                   ` (4 preceding siblings ...)
  2023-12-10 20:35 ` nate at thatsmathematics dot com
@ 2023-12-10 20:51 ` redi at gcc dot gnu.org
  2023-12-11  8:18 ` nate at thatsmathematics dot com
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2023-12-10 20:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
If you can reply on gcc-patches (CCing the libstdc++ list) that would be great,
thanks for looking at it!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever
  2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
                   ` (5 preceding siblings ...)
  2023-12-10 20:51 ` redi at gcc dot gnu.org
@ 2023-12-11  8:18 ` nate at thatsmathematics dot com
  6 siblings, 0 replies; 8+ messages in thread
From: nate at thatsmathematics dot com @ 2023-12-11  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

--- Comment #7 from Nate Eldredge <nate at thatsmathematics dot com> ---
@Jonathan: Done,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640119.html (sorry, may
not be linked to original threads).

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-12-11  8:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-15  9:16 [Bug libstdc++/104928] New: std::counting_semaphore on Linux can sleep forever angelsl at in04 dot sg
2022-03-15  9:17 ` [Bug libstdc++/104928] " angelsl at in04 dot sg
2023-12-10  6:59 ` nate at thatsmathematics dot com
2023-12-10 12:46 ` redi at gcc dot gnu.org
2023-12-10 20:33 ` nate at thatsmathematics dot com
2023-12-10 20:35 ` nate at thatsmathematics dot com
2023-12-10 20:51 ` redi at gcc dot gnu.org
2023-12-11  8:18 ` nate at thatsmathematics dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).