From: Maged Michael <maged.michael@gmail.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Mon, 2 Aug 2021 09:29:21 -0400 [thread overview]
Message-ID: <CABBFi0_QP2Psm2RbSmr38WnjX9MJgehywTfhjXRtJ3rXFsN4-w@mail.gmail.com> (raw)
In-Reply-To: <CABBFi0-+oRNUDM3TxDetxMzhmfzrxBHJQxBvM3TWsogbM8hMZw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12713 bytes --]
This is the right patch. The previous one is missing noexcept. Sorry.
On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com>
wrote:
> Please find attached an updated patch after incorporating Jonathan's
> suggestions.
>
> Changes from the last patch include:
> - Add a TSAN macro to bits/c++config.
> - Use separate constexpr bool-s for the conditions for lock-freedom,
> double-width and alignment.
> - Move the code in the optimized path to a separate function
> _M_release_double_width_cas.
>
> Thanks,
> Maged
>
>
> On Fri, Jul 16, 2021 at 11:55 AM Maged Michael <maged.michael@gmail.com>
> wrote:
>
>> Thank you, Jonathan, for the detailed comments! I'll update the patch
>> accordingly.
>>
>> On Fri, Jul 16, 2021 at 9:55 AM Jonathan Wakely <jwakely.gcc@gmail.com>
>> wrote:
>>
>>> On Thu, 17 Dec 2020 at 20:51, Maged Michael wrote:
>>> >
>>> > Please find a proposed patch for _Sp_counted_base::_M_release to skip
>>> the
>>> > two atomic instructions that decrement each of the use count and the
>>> weak
>>> > count when both are 1. I proposed the general idea in an earlier
>>> thread (
>>> > https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html)
>>> and got
>>> > useful feedback on a draft patch and responses to related questions
>>> about
>>> > multi-granular atomicity and alignment. This patch is based on that
>>> > feedback.
>>> >
>>> >
>>> > I added a check for thread sanitizer to use the current algorithm in
>>> that
>>> > case because TSAN does not support multi-granular atomicity. I'd like
>>> to
>>> > add a check of __has_feature(thread_sanitizer) for building using
>>> LLVM. I
>>> > found examples of __has_feature in libstdc++
>>>
>>> There are no uses of __has_feature in libstdc++. We do use
>>> __has_builtin (which GCC also supports) and Clang's __is_identifier
>>> (which GCC doesn't support) to work around some weird semantics of
>>> __has_builtin in older versions of Clang.
>>>
>>>
>>> > but it doesn't seem to be
>>> > recognized in shared_ptr_base.h. Any guidance on how to check
>>> > __has_feature(thread_sanitizer) in this patch?
>>>
>>> I think we want to do something like this in include/bits/c++config
>>>
>>> #if __SANITIZE_THREAD__
>>> # define _GLIBCXX_TSAN 1
>>> #elif defined __has_feature
>>> # if __has_feature(thread_sanitizer)
>>> # define _GLIBCXX_TSAN 1
>>> # endif
>>> #endif
>>>
>>> Then in bits/shared_ptr_base.h
>>>
>>> #if _GLIBCXX_TSAN
>>> _M_release_orig();
>>> return;
>>> #endif
>>>
>>>
>>>
>>> > GCC generates code for _M_release that is larger and more complex than
>>> that
>>> > generated by LLVM. I'd like to file a bug report about that. Jonathan,
>>>
>>> Is this the same issue as https://gcc.gnu.org/PR101406 ?
>>>
>>> Partly yes. Even when using __atomic_add_dispatch I noticed that clang
>> generated less code than gcc. I see in the response to the issue that the
>> new glibc is expected to optimize better. So maybe this will eliminate the
>> issue.
>>
>>
>>> > would you please create a bugzilla account for me (
>>> > https://gcc.gnu.org/bugzilla/) using my gmail address. Thank you.
>>>
>>> Done (sorry, I didn't notice the request in this mail until coming
>>> back to it to review the patch properly).
>>>
>>> Thank you!
>>
>>
>>>
>>>
>>> >
>>> >
>>> > Information about the patch:
>>> >
>>> > - Benefits of the patch: Save the cost of the last atomic decrements of
>>> > each of the use count and the weak count in _Sp_counted_base. Atomic
>>> > instructions are significantly slower than regular loads and stores
>>> across
>>> > major architectures.
>>> >
>>> > - How current code works: _M_release() atomically decrements the use
>>> count,
>>> > checks if it was 1, if so calls _M_dispose(), atomically decrements the
>>> > weak count, checks if it was 1, and if so calls _M_destroy().
>>> >
>>> > - How the proposed patch works: _M_release() loads both use count and
>>> weak
>>> > count together atomically (when properly aligned), checks if the value
>>> is
>>> > equal to the value of both counts equal to 1 (e.g., 0x100000001), and
>>> if so
>>> > calls _M_dispose() and _M_destroy(). Otherwise, it follows the original
>>> > algorithm.
>>> >
>>> > - Why it works: When the current thread executing _M_release() finds
>>> each
>>> > of the counts is equal to 1, then (when _lock_policy is _S_atomic) no
>>> other
>>> > threads could possibly hold use or weak references to this control
>>> block.
>>> > That is, no other threads could possibly access the counts or the
>>> protected
>>> > object.
>>> >
>>> > - The proposed patch is intended to interact correctly with current
>>> code
>>> > (under certain conditions: _Lock_policy is _S_atomic, proper
>>> alignment, and
>>> > native lock-free support for atomic operations). That is, multiple
>>> threads
>>> > using different versions of the code with and without the patch
>>> operating
>>> > on the same objects should always interact correctly. The intent for
>>> the
>>> > patch is to be ABI compatible with the current implementation.
>>> >
>>> > - The proposed patch involves a performance trade-off between saving
>>> the
>>> > costs of two atomic instructions when the counts are both 1 vs adding
>>> the
>>> > cost of loading the combined counts and comparison with two ones (e.g.,
>>> > 0x100000001).
>>> >
>>> > - The patch has been in use (built using LLVM) in a large environment
>>> for
>>> > many months. The performance gains outweigh the losses (roughly 10 to
>>> 1)
>>> > across a large variety of workloads.
>>> >
>>> >
>>> > I'd appreciate feedback on the patch and any suggestions for checking
>>> > __has_feature(thread_sanitizer).
>>>
>>> N.B. gmail completely mangles patches unless you send them as
>>> attachments.
>>>
>>>
>>> > diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h
>>> > b/libstdc++-v3/include/bits/shared_ptr_base.h
>>> >
>>> > index 368b2d7379a..a8fc944af5f 100644
>>> >
>>> > --- a/libstdc++-v3/include/bits/shared_ptr_base.h
>>> >
>>> > +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
>>> >
>>> > @@ -153,20 +153,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >
>>> > if (!_M_add_ref_lock_nothrow())
>>> >
>>> > __throw_bad_weak_ptr();
>>> >
>>> > }
>>> >
>>> >
>>> > bool
>>> >
>>> > _M_add_ref_lock_nothrow() noexcept;
>>> >
>>> >
>>> > void
>>> >
>>> > _M_release() noexcept
>>> >
>>> > {
>>> >
>>> > +#if __SANITIZE_THREAD__
>>> >
>>> > + _M_release_orig();
>>> >
>>> > + return;
>>> >
>>> > +#endif
>>> >
>>> > + if (!__atomic_always_lock_free(sizeof(long long), 0) ||
>>>
>>> The line break should come before the logical operator, not after.
>>> This makes it easier to see which operator it is, because it's at a
>>> predictable position, not off on the right edge somewhere.
>>>
>>> i.e.
>>>
>>> if (!__atomic_always_lock_free(sizeof(long long), 0)
>>> || !__atomic_always_lock_free(sizeof(_Atomic_word), 0)
>>> || sizeof(long long) < (2 * sizeof(_Atomic_word))
>>> || sizeof(long long) > (sizeof(void*)))
>>>
>>> But I think I'd like to see this condition expressed differently
>>> anyway, see below.
>>>
>>> >
>>> > + !__atomic_always_lock_free(sizeof(_Atomic_word), 0) ||
>>> >
>>> > + sizeof(long long) < (2 * sizeof(_Atomic_word)) ||
>>>
>>> Shouldn't this be != instead of < ?
>>>
>>> On a big endian target where sizeof(long long) > sizeof(_Atomic_word)
>>> loading two _Atomic_word objects will fill the high bits of the long
>>> long, and so the (1LL + (1LL << (8 * 4))) calculation won't match what
>>> got loaded into the long long.
>>>
>>> >
>>> > + sizeof(long long) > (sizeof(void*)))
>>>
>>> This is checking the alignment, right? I think we can do so more
>>> reliably, and should comment it.
>>>
>>> I think I'd like to see this condition defined as a number of
>>> constexpr booleans, with comments. Maybe:
>>>
>>> constexpr bool __lock_free
>>> = __atomic_always_lock_free(sizeof(long long), 0)
>>> && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
>>> constexpr bool __double_word
>>> = sizeof(long long) == 2 * sizeof(_Atomic_word);
>>> // The ref-count members follow the vptr, so are aligned to
>>> alignof(void*).
>>> constexpr bool __aligned = alignof(long long) <= alignof(void*);
>>>
>>> if _GLIBCXX17_CONSTEXPR
>>> (__lock_free && __double_word && __aligned)
>>> {
>>> _M_release_double_width_cas();
>>> return;
>>> }
>>> else
>>> {
>>> // ... original body of _M_release();
>>> }
>>>
>>> > + {
>>> >
>>> > + _M_release_orig();
>>> >
>>> > + return;
>>> >
>>> > + }
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
>>> >
>>> > + if (__atomic_load_n((long long*)(&_M_use_count),
>>> __ATOMIC_ACQUIRE)
>>> >
>>> > + == (1LL + (1LL << (8 * sizeof(_Atomic_word)))))
>>>
>>> This should use __CHAR_BIT__ instead of 8.
>>>
>>>
>>> >
>>> > + {
>>> >
>>> > + // Both counts are 1, so there are no weak references and
>>> >
>>> > + // we are releasing the last strong reference. No other
>>> >
>>> > + // threads can observe the effects of this _M_release()
>>> >
>>> > + // call (e.g. calling use_count()) without a data race.
>>> >
>>> > + *(long long*)(&_M_use_count) = 0;
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
>>> >
>>> > + _M_dispose();
>>> >
>>> > + _M_destroy();
>>> >
>>> > + }
>>> >
>>> > + else
>>> >
>>> > + {
>>> >
>>> > + if ((__gnu_cxx::__exchange_and_add(&_M_use_count, -1) ==
>>> 1))
>>> >
>>> > + {
>>> >
>>> > + _M_release_last_use();
>>> >
>>> > + }
>>> >
>>> > + }
>>> >
>>> > + }
>>> >
>>> > +
>>> >
>>> > + void
>>> >
>>> > + __attribute__ ((noinline))
>>> >
>>> > + _M_release_last_use() noexcept
>>> >
>>> > + {
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>>> >
>>> > + _M_dispose();
>>> >
>>> > + if (_Mutex_base<_Lp>::_S_need_barriers)
>>> >
>>> > + {
>>> >
>>> > + __atomic_thread_fence (__ATOMIC_ACQ_REL);
>>> >
>>> > + }
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
>>> >
>>> > + if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
>>> >
>>> > + -1) == 1)
>>> >
>>> > + {
>>> >
>>> > + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
>>> >
>>> > + _M_destroy();
>>> >
>>> > + }
>>> >
>>> > + }
>>> >
>>> > +
>>> >
>>> > + void
>>> >
>>> > + _M_release_orig() noexcept
>>> >
>>> > + {
>>> >
>>> > // Be race-detector-friendly. For more info see
>>> bits/c++config.
>>> >
>>> > _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
>>> >
>>> > if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1)
>>> == 1)
>>> >
>>> > {
>>> >
>>> > _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>>> >
>>> > _M_dispose();
>>> >
>>> > // There must be a memory barrier between dispose() and
>>> > destroy()
>>> >
>>> > // to ensure that the effects of dispose() are observed in
>>> the
>>> >
>>> > // thread that runs destroy().
>>> >
>>> > // See
>>> http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
>>> >
>>> > @@ -279,20 +337,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >
>>> > _Sp_counted_base<_S_single>::_M_release() noexcept
>>> >
>>> > {
>>> >
>>> > if (--_M_use_count == 0)
>>> >
>>> > {
>>> >
>>> > _M_dispose();
>>> >
>>> > if (--_M_weak_count == 0)
>>> >
>>> > _M_destroy();
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > + template<>
>>> >
>>> > + inline void
>>> >
>>> > + _Sp_counted_base<_S_mutex>::_M_release() noexcept
>>> >
>>> > + {
>>> >
>>> > + _M_release_orig();
>>> >
>>> > + }
>>> >
>>> > +
>>> >
>>> > template<>
>>> >
>>> > inline void
>>> >
>>> > _Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
>>> >
>>> > { ++_M_weak_count; }
>>> >
>>> >
>>> > template<>
>>> >
>>> > inline void
>>> >
>>> > _Sp_counted_base<_S_single>::_M_weak_release() noexcept
>>> >
>>> > {
>>> >
>>> > if (--_M_weak_count == 0)
>>>
>>
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 5107 bytes --]
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 69ace386dd7..543fac268b4 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -126,20 +126,29 @@
# define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
#endif
// Macro to warn about unused results.
#if __cplusplus >= 201703L
# define _GLIBCXX_NODISCARD [[__nodiscard__]]
#else
# define _GLIBCXX_NODISCARD
#endif
+// Macro for TSAN.
+#if __SANITIZE_THREAD__
+# define _GLIBCXX_TSAN 1
+#elif defined __has_feature
+# if __has_feature(thread_sanitizer)
+# define _GLIBCXX_TSAN 1
+# endif
+#endif
+
#if __cplusplus
// Macro for constexpr, to support in mixed 03/0x mode.
#ifndef _GLIBCXX_CONSTEXPR
# if __cplusplus >= 201103L
# define _GLIBCXX_CONSTEXPR constexpr
# define _GLIBCXX_USE_CONSTEXPR constexpr
# else
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 5be935d174d..834ca13bdc6 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -153,20 +153,93 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (!_M_add_ref_lock_nothrow())
__throw_bad_weak_ptr();
}
bool
_M_add_ref_lock_nothrow() noexcept;
void
_M_release() noexcept
{
+#if _GLIBCXX_TSAN
+ _M_release_orig();
+ return;
+#endif
+ constexpr bool __lock_free
+ = __atomic_always_lock_free(sizeof(long long), 0)
+ && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
+ constexpr bool __double_word
+ = sizeof(long long) == 2 * sizeof(_Atomic_word);
+ // The ref-count members follow the vptr, so are aligned to
+ // alignof(void*).
+ constexpr bool __aligned = alignof(long long) <= alignof(void*);
+ if _GLIBCXX17_CONSTEXPR
+ (__lock_free && __double_word && __aligned)
+ {
+ _M_release_double_width_cas();
+ return;
+ }
+ else
+ {
+ _M_release_orig();
+ }
+ }
+
+ void
+ _M_release_double_width_cas() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__atomic_load_n((long long*)(&_M_use_count), __ATOMIC_ACQUIRE)
+ == (1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)))))
+ {
+ // Both counts are 1, so there are no weak references and
+ // we are releasing the last strong reference. No other
+ // threads can observe the effects of this _M_release()
+ // call (e.g. calling use_count()) without a data race.
+ *(long long*)(&_M_use_count) = 0;
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_dispose();
+ _M_destroy();
+ }
+ else
+ {
+ if ((__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1))
+ {
+ _M_release_last_use();
+ }
+ }
+ }
+
+ void
+ __attribute__ ((noinline))
+ _M_release_last_use() noexcept
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+ _M_dispose();
+ if (_Mutex_base<_Lp>::_S_need_barriers)
+ {
+ __atomic_thread_fence (__ATOMIC_ACQ_REL);
+ }
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+ if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+ -1) == 1)
+ {
+ _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+ _M_destroy();
+ }
+ }
+
+ void
+ _M_release_orig() noexcept
+ {
// Be race-detector-friendly. For more info see bits/c++config.
_GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
{
_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
_M_dispose();
// There must be a memory barrier between dispose() and destroy()
// to ensure that the effects of dispose() are observed in the
// thread that runs destroy().
// See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
@@ -279,20 +352,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Sp_counted_base<_S_single>::_M_release() noexcept
{
if (--_M_use_count == 0)
{
_M_dispose();
if (--_M_weak_count == 0)
_M_destroy();
}
}
+ template<>
+ inline void
+ _Sp_counted_base<_S_mutex>::_M_release() noexcept
+ {
+ _M_release_orig();
+ }
+
template<>
inline void
_Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
{ ++_M_weak_count; }
template<>
inline void
_Sp_counted_base<_S_single>::_M_weak_release() noexcept
{
if (--_M_weak_count == 0)
next prev parent reply other threads:[~2021-08-02 13:29 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-17 20:49 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely
2021-07-16 15:55 ` Maged Michael
2021-08-02 13:23 ` Maged Michael
2021-08-02 13:29 ` Maged Michael [this message]
2021-08-03 20:59 ` Jonathan Wakely
2021-08-04 15:32 ` Jonathan Wakely
2021-08-04 15:47 ` Maged Michael
2021-08-04 15:52 ` Jonathan Wakely
2021-08-04 17:19 ` Maged Michael
2021-08-04 17:34 ` Maged Michael
2021-08-04 19:32 ` Jonathan Wakely
2021-08-04 19:48 ` Maged Michael
2021-12-08 11:49 ` Jonathan Wakely
2021-12-08 19:15 ` Rainer Orth
2021-12-08 19:21 ` Jonathan Wakely
2021-12-08 19:27 ` Jonathan Wakely
2021-12-08 23:47 ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28 ` Rainer Orth
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABBFi0_QP2Psm2RbSmr38WnjX9MJgehywTfhjXRtJ3rXFsN4-w@mail.gmail.com \
--to=maged.michael@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jwakely.gcc@gmail.com \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).