From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: Maged Michael <maged.michael@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Wed, 4 Aug 2021 16:32:36 +0100 [thread overview]
Message-ID: <CAH6eHdT=WCfPt+CaAJtEN_+tbEn52X=Q7jR7_aQmktv6zs-zVg@mail.gmail.com> (raw)
In-Reply-To: <CAH6eHdSH9uOpOLrH37qztYe=eRpdX=iWSYkzAaC4EOZio9ekwA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]
On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>
> On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
> >
> > This is the right patch. The previous one is missing noexcept. Sorry.
> >
> >
> > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com> wrote:
> >>
> >> Please find attached an updated patch after incorporating Jonathan's suggestions.
> >>
> >> Changes from the last patch include:
> >> - Add a TSAN macro to bits/c++config.
> >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment.
> >> - Move the code in the optimized path to a separate function _M_release_double_width_cas.
>
> Thanks for the updated patch. At a quick glance it looks great. I'll
> apply it locally and test it tomorrow.

It occurs to me now that _M_release_double_width_cas is the wrong
name, because we're not doing a DWCAS, just a double-width load. What
do you think about renaming it to _M_release_combined instead, since
it does a combined load of the two counts?

I'm no longer confident in the alignof suggestion I gave you.
+ constexpr bool __double_word
+ = sizeof(long long) == 2 * sizeof(_Atomic_word);
+ // The ref-count members follow the vptr, so are aligned to
+ // alignof(void*).
+ constexpr bool __aligned = alignof(long long) <= alignof(void*);

For IA32 (i.e. 32-bit x86) this constant will be true, because
alignof(long long) and alignof(void*) are both 4, even though
sizeof(long long) is 8. So in theory the _M_use_count and
_M_weak_count members could be in different cache lines, and the
atomic load will not be atomic. I think we want to use __alignof(long
long) here to get the "natural" alignment, not the smaller 4B
alignment allowed by SysV ABI. That means that we won't do this
optimization on IA32, but I think that's acceptable.

Alternatively, we could keep using alignof, but with an additional
run-time check something like (uintptr_t)&_M_use_count / 64 ==
(uintptr_t)&_M_weak_count / 64 to check they're on the same cache
line. I think for now we should just use __alignof and live without
the optimization on IA32.

Secondly:
+ void
+ __attribute__ ((noinline))
This needs to be __noinline__ because noinline is not a reserved word,
so users can do:
#define noinline 1
#include <memory>

Was it performance profiling, or code-size measurements (or something
else?) that showed this should be non-inline?
I'd like to add a comment explaining why we want it to be noinline.
In that function ...
+ _M_release_last_use() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _M_dispose();
+ if (_Mutex_base<_Lp>::_S_need_barriers)
+ {
+ __atomic_thread_fence (__ATOMIC_ACQ_REL);
+ }

I think we can remove this fence. The _S_need_barriers constant is
only true for the _S_mutex policy, and that just calls
_M_release_orig(), so never uses this function. I'll remove it and add
a comment noting that the lack of barrier is intentional.
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+ -1) == 1)
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_destroy();
+ }
+ }

Alternatively, we could keep the fence in _M_release_last_use() and
refactor the other _M_release functions, so that we have explicit
specializations for each of:
_Sp_counted_base<_S_single>::_M_release() (already present)
_Sp_counted_base<_S_mutex>::_M_release()
_Sp_counted_base<_S_atomic>::_M_release()
The second and third would be new, as currently they both use the
definition in the primary template. The _S_mutex one would just
decrement _M_use_count then call _M_release_last_use() (which still
has the barrier needed for the _S_mutex policy). The _S_atomic one
would have your new optimization. See the attached patch showing what
I mean. I find this version a bit simpler to understand, as we just
have _M_release and _M_release_last_use, without
_M_release_double_width_cas and _M_release_orig. What do you think of
this version? Does it lose any important properties of your version
which I've failed to notice?
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 4410 bytes --]
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 32b8957f814..07465f7ecd5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -143,6 +143,15 @@
# define _GLIBCXX_NODISCARD
#endif
+// Macro for TSAN.
+#if __SANITIZE_THREAD__
+# define _GLIBCXX_TSAN 1
+#elif defined __has_feature
+# if __has_feature(thread_sanitizer)
+# define _GLIBCXX_TSAN 1
+# endif
+#endif
+
#if __cplusplus
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 5be935d174d..b2397c8fddb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -143,10 +143,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
virtual void*
_M_get_deleter(const std::type_info&) noexcept = 0;
+ // Increment the use count (used when the count is greater than zero).
void
_M_add_ref_copy()
{ __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
+ // Increment the use count if it is non-zero, throw otherwise.
void
_M_add_ref_lock()
{
@@ -154,15 +156,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__throw_bad_weak_ptr();
}
+ // Increment the use count if it is non-zero.
bool
_M_add_ref_lock_nothrow() noexcept;
+ // Decrement the use count.
void
- _M_release() noexcept
- {
- // Be race-detector-friendly. For more info see bits/c++config.
- _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
- if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ _M_release() noexcept;
+
+ // Called by _M_release() when the use count reaches zero.
+ __attribute__((__noinline__))
+ void
+ _M_release_last_use() noexcept
{
_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
_M_dispose();
@@ -184,12 +189,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_destroy();
}
}
- }
+ // Increment the weak count.
void
_M_weak_add_ref() noexcept
{ __gnu_cxx::__atomic_add_dispatch(&_M_weak_count, 1); }
+ // Decrement the weak count.
void
_M_weak_release() noexcept
{
@@ -286,6 +292,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
}
+ template<>
+ inline void
+ _Sp_counted_base<_S_mutex>::_M_release() noexcept
+ {
+ // Be race-detector-friendly. For more info see bits/c++config.
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ {
+ _M_release_last_use();
+ }
+ }
+
+ template<>
+ inline void
+ _Sp_counted_base<_S_atomic>::_M_release() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+#if ! _GLIBCXX_TSAN
+ constexpr bool __lock_free
+ = __atomic_always_lock_free(sizeof(long long), 0)
+ && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
+ constexpr bool __double_word
+ = sizeof(long long) == 2 * sizeof(_Atomic_word);
+ // The ref-count members follow the vptr, so are aligned to
+ // alignof(void*).
+ constexpr bool __aligned = __alignof(long long) <= alignof(void*);
+ if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
+ {
+ constexpr long long __unique_ref
+ = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
+ auto __both_counts = reinterpret_cast<long long*>(&_M_use_count);
+
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) == __unique_ref)
+ {
+ // Both counts are 1, so there are no weak references and
+ // we are releasing the last strong reference. No other
+ // threads can observe the effects of this _M_release()
+ // call (e.g. calling use_count()) without a data race.
+ *(long long*)(&_M_use_count) = 0;
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_dispose();
+ _M_destroy();
+ return;
+ }
+ }
+#endif
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+ {
+ _M_release_last_use();
+ }
+ }
+
template<>
inline void
_Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept