public inbox for libstdc++@gcc.gnu.org
* [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
@ 2020-12-17 20:49 Maged Michael
  2021-01-05 16:19 ` Maged Michael
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Maged Michael @ 2020-12-17 20:49 UTC (permalink / raw)
  To: libstdc++; +Cc: Maged Michael, gcc-patches

Please find a proposed patch for _Sp_counted_base::_M_release that skips the
two atomic instructions decrementing the use count and the weak count when
both are 1. I proposed the general idea in an earlier thread (
https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html) and got
useful feedback on a draft patch, as well as answers to related questions
about multi-granular atomicity and alignment. This patch is based on that
feedback.


I added a check so that the current algorithm is used under thread
sanitizer, because TSAN does not support multi-granular atomicity. I'd also
like to add a check of __has_feature(thread_sanitizer) for builds using
LLVM. I found examples of __has_feature elsewhere in libstdc++, but it
doesn't seem to be recognized in shared_ptr_base.h. Any guidance on how to
check __has_feature(thread_sanitizer) in this patch?


GCC generates code for _M_release that is larger and more complex than what
LLVM generates. I'd like to file a bug report about that. Jonathan, would
you please create a bugzilla account for me (
https://gcc.gnu.org/bugzilla/) using my gmail address? Thank you.


Information about the patch:

- Benefits of the patch: Save the cost of the last atomic decrements of
each of the use count and the weak count in _Sp_counted_base. Atomic
instructions are significantly slower than regular loads and stores across
major architectures.

- How the current code works: _M_release() atomically decrements the use
count and checks whether it was 1; if so, it calls _M_dispose(), atomically
decrements the weak count, and checks whether that was 1; if so, it calls
_M_destroy().

- How the proposed patch works: _M_release() loads the use count and the
weak count together atomically (when properly aligned), checks whether the
combined value corresponds to both counts being equal to 1 (e.g.,
0x100000001 for a 32-bit _Atomic_word), and if so calls _M_dispose() and
_M_destroy(). Otherwise, it follows the original algorithm.

- Why it works: When the current thread executing _M_release() finds each
of the counts is equal to 1, then (when _lock_policy is _S_atomic) no other
threads could possibly hold use or weak references to this control block.
That is, no other threads could possibly access the counts or the protected
object.

- The proposed patch is intended to interact correctly with the current
code (under certain conditions: _Lock_policy is _S_atomic, proper
alignment, and native lock-free support for the atomic operations). That
is, multiple threads running versions of the code with and without the
patch, operating on the same objects, should always interact correctly. The
patch is intended to be ABI compatible with the current implementation.

- The proposed patch involves a performance trade-off: it saves the cost of
two atomic instructions when both counts are 1, at the cost of an extra
atomic load of the combined counts and a comparison against the value
representing two counts of 1 (e.g., 0x100000001).

- The patch has been in use (built using LLVM) in a large environment for
many months. The performance gains outweigh the losses (roughly 10 to 1)
across a large variety of workloads.


I'd appreciate feedback on the patch and any suggestions for checking
__has_feature(thread_sanitizer).


Maged



diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 368b2d7379a..a8fc944af5f 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -153,20 +153,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        if (!_M_add_ref_lock_nothrow())
          __throw_bad_weak_ptr();
       }

       bool
       _M_add_ref_lock_nothrow() noexcept;

       void
       _M_release() noexcept
       {
+#if __SANITIZE_THREAD__
+        _M_release_orig();
+        return;
+#endif
+        if (!__atomic_always_lock_free(sizeof(long long), 0) ||
+            !__atomic_always_lock_free(sizeof(_Atomic_word), 0) ||
+            sizeof(long long) < (2 * sizeof(_Atomic_word)) ||
+            sizeof(long long) > (sizeof(void*)))
+          {
+            _M_release_orig();
+            return;
+          }
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+        if (__atomic_load_n((long long*)(&_M_use_count), __ATOMIC_ACQUIRE)
+            == (1LL + (1LL << (8 * sizeof(_Atomic_word)))))
+          {
+            // Both counts are 1, so there are no weak references and
+            // we are releasing the last strong reference. No other
+            // threads can observe the effects of this _M_release()
+            // call (e.g. calling use_count()) without a data race.
+            *(long long*)(&_M_use_count) = 0;
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+            _M_dispose();
+            _M_destroy();
+          }
+        else
+          {
+            if ((__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1))
+              {
+                _M_release_last_use();
+              }
+          }
+      }
+
+      void
+      __attribute__ ((noinline))
+      _M_release_last_use() noexcept
+      {
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+        _M_dispose();
+        if (_Mutex_base<_Lp>::_S_need_barriers)
+          {
+            __atomic_thread_fence (__ATOMIC_ACQ_REL);
+          }
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+                                                   -1) == 1)
+          {
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+            _M_destroy();
+          }
+      }
+
+      void
+      _M_release_orig() noexcept
+      {
         // Be race-detector-friendly.  For more info see bits/c++config.
         _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
          {
            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
            _M_dispose();
            // There must be a memory barrier between dispose() and destroy()
            // to ensure that the effects of dispose() are observed in the
            // thread that runs destroy().
            // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
@@ -279,20 +337,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _Sp_counted_base<_S_single>::_M_release() noexcept
     {
       if (--_M_use_count == 0)
         {
           _M_dispose();
           if (--_M_weak_count == 0)
             _M_destroy();
         }
     }

+  template<>
+    inline void
+    _Sp_counted_base<_S_mutex>::_M_release() noexcept
+    {
+      _M_release_orig();
+    }
+
   template<>
     inline void
     _Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
     { ++_M_weak_count; }

   template<>
     inline void
     _Sp_counted_base<_S_single>::_M_weak_release() noexcept
     {
       if (--_M_weak_count == 0)


Thread overview: 21+ messages
2020-12-17 20:49 [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely
2021-07-16 15:55   ` Maged Michael
2021-08-02 13:23     ` Maged Michael
2021-08-02 13:29       ` Maged Michael
2021-08-03 20:59         ` Jonathan Wakely
2021-08-04 15:32           ` Jonathan Wakely
2021-08-04 15:47             ` Maged Michael
2021-08-04 15:52               ` Jonathan Wakely
2021-08-04 17:19             ` Maged Michael
2021-08-04 17:34               ` Maged Michael
2021-08-04 19:32               ` Jonathan Wakely
2021-08-04 19:48                 ` Maged Michael
2021-12-08 11:49                   ` Jonathan Wakely
2021-12-08 19:15                     ` Rainer Orth
2021-12-08 19:21                       ` Jonathan Wakely
2021-12-08 19:27                         ` Jonathan Wakely
2021-12-08 23:47                           ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28                             ` Rainer Orth
