public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization
@ 2023-09-25  9:54 redbeard0531 at gmail dot com
  2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-09-25  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

            Bug ID: 111588
           Summary: Provide opt-out of shared_ptr single-threaded
                    optimization
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Right now there is a fast-path for single-threaded programs to avoid the
overhead of atomics in shared_ptr, but there is no equivalent for a program the
knows it is multi-threaded to remove the check and branch. If __GTHREADS is not
defined then no atomic code is emitted.

There are two issues with this: 1) for programs that know they are effectively
always multithreaded they pay for a runtime branch and .text segment bloat for
an optimization that never applies. This may have knock-on effects of making
functions that use shared_ptr less likely to be inlined by pushing them
slightly over the complexity threshold. 2) It invalidates singlethreaded
microbenchmarks of code that uses shared_ptr because the performance of the
code may be very different from when run in the real multithreaded program.

I understand the value of making a fast mode for single-threaded code, and I
can even except having the runtime branch by default, rather than as an opt-in,
when it is unknown if the program will be run with multiple threads. But an
opt-out would be nice to have. If it had to be a gcc-build time option rather
than a #define, that would be acceptable for us since we always use our own
build of gcc, but it seems like a worse option for other users.

FWIW, neither llvm libc++
(https://github.com/llvm/llvm-project/blob/0bfaed8c612705cfa8c5382d26d8089a0a26386b/libcxx/include/__memory/shared_ptr.h#L103-L110)
nor MS-STL
(https://github.com/microsoft/STL/blob/main/stl/inc/memory#L1171-L1173) ever
use runtime branching to detect multithreading.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
  2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
@ 2023-09-25 15:03 ` pinskia at gcc dot gnu.org
  2023-09-25 20:11 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-25 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>for programs that know they are effectively always multithreaded they pay for a runtime branch and .text segment bloat for an optimization that never applies.

The bloat is not much and the overhead for a branch compared to atomics is
still not going to have a bent.


I suspect you are looking into the wrong place for optimizations really.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
  2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
  2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
@ 2023-09-25 20:11 ` redi at gcc dot gnu.org
  2023-09-27 12:40 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: redi at gcc dot gnu.org @ 2023-09-25 20:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This needs numbers, not opinions.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
  2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
  2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
  2023-09-25 20:11 ` redi at gcc dot gnu.org
@ 2023-09-27 12:40 ` redi at gcc dot gnu.org
  2023-10-27 10:01 ` redbeard0531 at gmail dot com
  2023-10-27 10:12 ` redbeard0531 at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: redi at gcc dot gnu.org @ 2023-09-27 12:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
If we do want to do it, I think we'd just need something like this (and docs):

--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -48,6 +48,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 #ifndef __GTHREADS
     return true;
+#elif defined(_GLIBCXX_ASSUME_NEVER_SINGLE_THREADED)
+    return false;
 #elif __has_include(<sys/single_threaded.h>)
     return ::__libc_single_threaded;
 #else

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
  2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
                   ` (2 preceding siblings ...)
  2023-09-27 12:40 ` redi at gcc dot gnu.org
@ 2023-10-27 10:01 ` redbeard0531 at gmail dot com
  2023-10-27 10:12 ` redbeard0531 at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-10-27 10:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #4 from Mathias Stearn <redbeard0531 at gmail dot com> ---
So even though I already knew in the back of my mind about how this can affect
benchmark results, I *still* got burned by it! I was benchmarking a simple
hazard pointer implementation against shared ptrs by having them crab-walk a
list of 1000 pointers. This was an admittedly simple and unrealistic benchmark
that only ever accessed the objects from a single thread and never had any
contention. It was a few hours after getting the initial results that I
remembered this issue and went back to add a bg thread.

> This needs numbers, not opinions

While my biggest concern was misleading benchmarks (which I now feel a bit
validated by my own mistake :), here are the numbers I see. I added
boost::shared_ptr on the assumption (unvalidated) that the primary difference
would be that it unconditionally uses atomics.

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_scanLinks_HazPtr                6420 ns         6420 ns       108948
BM_scanLinks_BoostSharedPtr       11223 ns        11223 ns        62380
BM_scanLinks_StdSharedPtr          3217 ns         3217 ns       217621
BM_scanLinks_AtomicSharedPtr      28920 ns        28920 ns        24200

And with a bg thread doing nothing but sleeping:

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_scanLinks_HazPtr                6887 ns         6887 ns       101445
BM_scanLinks_BoostSharedPtr       11224 ns        11224 ns        62373
BM_scanLinks_StdSharedPtr         14221 ns        14221 ns        49221
BM_scanLinks_AtomicSharedPtr      49574 ns        49573 ns        14126

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
  2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
                   ` (3 preceding siblings ...)
  2023-10-27 10:01 ` redbeard0531 at gmail dot com
@ 2023-10-27 10:12 ` redbeard0531 at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-10-27 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #5 from Mathias Stearn <redbeard0531 at gmail dot com> ---
Mea culpa. The difference between boost and std was due to the code to
fast-path shared_ptrs that aren't actually shared:
https://github.com/gcc-mirror/gcc/blob/be34a8b538c0f04b11a428bd1a9340eb19dec13f/libstdc%2B%2B-v3/include/bits/shared_ptr_base.h#L324-L362.
I still think that optimization is a good idea, even if it looks bad in this
specific microbenchmark. When that is disabled, they have the same perf, even
with the check for single-threaded.

That said, I'd still love an opt out. For now, I'll just propose that we add a
do-nothing bg thread in our benchmark main() to avoid misleading results.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-27 10:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-25  9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
2023-09-25 20:11 ` redi at gcc dot gnu.org
2023-09-27 12:40 ` redi at gcc dot gnu.org
2023-10-27 10:01 ` redbeard0531 at gmail dot com
2023-10-27 10:12 ` redbeard0531 at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).