public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization
@ 2023-09-25 9:54 redbeard0531 at gmail dot com
2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-09-25 9:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
Bug ID: 111588
Summary: Provide opt-out of shared_ptr single-threaded
optimization
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redbeard0531 at gmail dot com
Target Milestone: ---
Right now there is a fast-path for single-threaded programs to avoid the
overhead of atomics in shared_ptr, but there is no equivalent for a program the
knows it is multi-threaded to remove the check and branch. If __GTHREADS is not
defined then no atomic code is emitted.
There are two issues with this: 1) for programs that know they are effectively
always multithreaded they pay for a runtime branch and .text segment bloat for
an optimization that never applies. This may have knock-on effects of making
functions that use shared_ptr less likely to be inlined by pushing them
slightly over the complexity threshold. 2) It invalidates singlethreaded
microbenchmarks of code that uses shared_ptr because the performance of the
code may be very different from when run in the real multithreaded program.
I understand the value of making a fast mode for single-threaded code, and I
can even except having the runtime branch by default, rather than as an opt-in,
when it is unknown if the program will be run with multiple threads. But an
opt-out would be nice to have. If it had to be a gcc-build time option rather
than a #define, that would be acceptable for us since we always use our own
build of gcc, but it seems like a worse option for other users.
FWIW, neither llvm libc++
(https://github.com/llvm/llvm-project/blob/0bfaed8c612705cfa8c5382d26d8089a0a26386b/libcxx/include/__memory/shared_ptr.h#L103-L110)
nor MS-STL
(https://github.com/microsoft/STL/blob/main/stl/inc/memory#L1171-L1173) ever
use runtime branching to detect multithreading.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
@ 2023-09-25 15:03 ` pinskia at gcc dot gnu.org
2023-09-25 20:11 ` redi at gcc dot gnu.org
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-25 15:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>for programs that know they are effectively always multithreaded they pay for a runtime branch and .text segment bloat for an optimization that never applies.
The bloat is not much and the overhead for a branch compared to atomics is
still not going to have a bent.
I suspect you are looking into the wrong place for optimizations really.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
@ 2023-09-25 20:11 ` redi at gcc dot gnu.org
2023-09-27 12:40 ` redi at gcc dot gnu.org
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: redi at gcc dot gnu.org @ 2023-09-25 20:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This needs numbers, not opinions.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
2023-09-25 20:11 ` redi at gcc dot gnu.org
@ 2023-09-27 12:40 ` redi at gcc dot gnu.org
2023-10-27 10:01 ` redbeard0531 at gmail dot com
2023-10-27 10:12 ` redbeard0531 at gmail dot com
4 siblings, 0 replies; 6+ messages in thread
From: redi at gcc dot gnu.org @ 2023-09-27 12:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
If we do want to do it, I think we'd just need something like this (and docs):
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -48,6 +48,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
#ifndef __GTHREADS
return true;
+#elif defined(_GLIBCXX_ASSUME_NEVER_SINGLE_THREADED)
+ return false;
#elif __has_include(<sys/single_threaded.h>)
return ::__libc_single_threaded;
#else
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
` (2 preceding siblings ...)
2023-09-27 12:40 ` redi at gcc dot gnu.org
@ 2023-10-27 10:01 ` redbeard0531 at gmail dot com
2023-10-27 10:12 ` redbeard0531 at gmail dot com
4 siblings, 0 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-10-27 10:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
--- Comment #4 from Mathias Stearn <redbeard0531 at gmail dot com> ---
So even though I already knew in the back of my mind about how this can affect
benchmark results, I *still* got burned by it! I was benchmarking a simple
hazard pointer implementation against shared ptrs by having them crab-walk a
list of 1000 pointers. This was an admittedly simple and unrealistic benchmark
that only ever accessed the objects from a single thread and never had any
contention. It was a few hours after getting the initial results that I
remembered this issue and went back to add a bg thread.
> This needs numbers, not opinions
While my biggest concern was misleading benchmarks (which I now feel a bit
validated by my own mistake :), here are the numbers I see. I added
boost::shared_ptr on the assumption (unvalidated) that the primary difference
would be that it unconditionally uses atomics.
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_scanLinks_HazPtr 6420 ns 6420 ns 108948
BM_scanLinks_BoostSharedPtr 11223 ns 11223 ns 62380
BM_scanLinks_StdSharedPtr 3217 ns 3217 ns 217621
BM_scanLinks_AtomicSharedPtr 28920 ns 28920 ns 24200
And with a bg thread doing nothing but sleeping:
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_scanLinks_HazPtr 6887 ns 6887 ns 101445
BM_scanLinks_BoostSharedPtr 11224 ns 11224 ns 62373
BM_scanLinks_StdSharedPtr 14221 ns 14221 ns 49221
BM_scanLinks_AtomicSharedPtr 49574 ns 49573 ns 14126
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
` (3 preceding siblings ...)
2023-10-27 10:01 ` redbeard0531 at gmail dot com
@ 2023-10-27 10:12 ` redbeard0531 at gmail dot com
4 siblings, 0 replies; 6+ messages in thread
From: redbeard0531 at gmail dot com @ 2023-10-27 10:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588
--- Comment #5 from Mathias Stearn <redbeard0531 at gmail dot com> ---
Mea culpa. The difference between boost and std was due to the code to
fast-path shared_ptrs that aren't actually shared:
https://github.com/gcc-mirror/gcc/blob/be34a8b538c0f04b11a428bd1a9340eb19dec13f/libstdc%2B%2B-v3/include/bits/shared_ptr_base.h#L324-L362.
I still think that optimization is a good idea, even if it looks bad in this
specific microbenchmark. When that is disabled, they have the same perf, even
with the check for single-threaded.
That said, I'd still love an opt out. For now, I'll just propose that we add a
do-nothing bg thread in our benchmark main() to avoid misleading results.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-10-27 10:12 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-25 9:54 [Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization redbeard0531 at gmail dot com
2023-09-25 15:03 ` [Bug libstdc++/111588] " pinskia at gcc dot gnu.org
2023-09-25 20:11 ` redi at gcc dot gnu.org
2023-09-27 12:40 ` redi at gcc dot gnu.org
2023-10-27 10:01 ` redbeard0531 at gmail dot com
2023-10-27 10:12 ` redbeard0531 at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).