public inbox for libstdc++@gcc.gnu.org
From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: Maged Michael <maged.michael@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Wed, 4 Aug 2021 16:32:36 +0100	[thread overview]
Message-ID: <CAH6eHdT=WCfPt+CaAJtEN_+tbEn52X=Q7jR7_aQmktv6zs-zVg@mail.gmail.com> (raw)
In-Reply-To: <CAH6eHdSH9uOpOLrH37qztYe=eRpdX=iWSYkzAaC4EOZio9ekwA@mail.gmail.com>

On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>
> On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
> >
> > This is the right patch. The previous one is missing noexcept. Sorry.
> >
> >
> > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com> wrote:
> >>
> >> Please find attached an updated patch after incorporating Jonathan's suggestions.
> >>
> >> Changes from the last patch include:
> >> - Add a TSAN macro to bits/c++config.
> >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment.
> >> - Move the code in the optimized path to a separate function _M_release_double_width_cas.
>
> Thanks for the updated patch. At a quick glance it looks great. I'll
> apply it locally and test it tomorrow.


It occurs to me now that _M_release_double_width_cas is the wrong
name, because we're not doing a DWCAS, just a double-width load. What
do you think about renaming it to _M_release_combined instead, since
it does a combined load of the two counts?
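
For concreteness, here is the arithmetic behind the combined value
(my own illustration, assuming a 4-byte _Atomic_word; this is the
__unique_ref constant in the patch):

  // _M_use_count and _M_weak_count are adjacent, so an 8-byte load
  // reads both at once. When both counts are 1 the two halves are
  // equal, so the expected value is the same on little-endian and
  // big-endian targets:
  //   1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)))
  // = 1   + (1 << 32)
  // = 0x0000000100000001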

I'm no longer confident in the alignof suggestion I gave you.

+        constexpr bool __double_word
+          = sizeof(long long) == 2 * sizeof(_Atomic_word);
+        // The ref-count members follow the vptr, so are aligned to
+        // alignof(void*).
+        constexpr bool __aligned = alignof(long long) <= alignof(void*);

For IA32 (i.e. 32-bit x86) this constant will be true, because
alignof(long long) and alignof(void*) are both 4, even though
sizeof(long long) is 8. So in theory the _M_use_count and
_M_weak_count members could be in different cache lines, and the
8-byte load would not be atomic. I think we want to use
__alignof(long long) here to get the "natural" alignment, not the
smaller 4-byte alignment allowed by the SysV ABI. That means that we
won't do this optimization on IA32, but I think that's acceptable.
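
To illustrate the difference (a standalone sketch, not part of the
patch; the values in the comments assume GCC targeting IA32 with the
SysV psABI):

  #include <cstdio>

  int main()
  {
    // alignof gives the ABI (minimum) alignment: the psABI lets
    // long long members be 4-byte aligned inside structs on IA32.
    std::printf("alignof(long long)   = %zu\n", alignof(long long));   // 4
    // __alignof gives the preferred "natural" alignment.
    std::printf("__alignof(long long) = %zu\n", __alignof(long long)); // 8
    std::printf("alignof(void*)       = %zu\n", alignof(void*));       // 4
  }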

Alternatively, we could keep using alignof, but add a run-time check,
something like (uintptr_t)&_M_use_count / 64 ==
(uintptr_t)&_M_weak_count / 64, to verify that the two counts are on
the same cache line. I think for now we should just use __alignof and
live without the optimization on IA32.
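
A rough sketch of what that run-time check could look like (my sketch
only, assuming a 64-byte cache line and reusing the __lock_free,
__double_word and __aligned constants from the patch; it uses the
built-in __UINTPTR_TYPE__ so no extra include is needed, and it would
have to be an ordinary run-time condition rather than if-constexpr):

  const bool __same_line
    = (reinterpret_cast<__UINTPTR_TYPE__>(&_M_use_count) / 64)
      == (reinterpret_cast<__UINTPTR_TYPE__>(&_M_weak_count) / 64);
  if (__lock_free && __double_word && __aligned && __same_line)
    {
      // ... combined double-width load path as in the patch ...
    }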

Secondly:

+      void
+      __attribute__ ((noinline))

This needs to be __noinline__ because noinline is not a reserved word,
so users can do:
#define noinline 1
#include <memory>
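
With that macro in effect the attribute would expand to
__attribute__ ((1)) and fail to compile; the reserved spelling is
immune to user macros, i.e.:

  __attribute__((__noinline__))   // __noinline__ cannot be #defined by users
  void
  _M_release_last_use() noexcept;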

Was it performance profiling, or code-size measurements (or something
else?) that showed this should be non-inline?
I'd like to add a comment explaining why we want it to be noinline.

In that function ...

+      _M_release_last_use() noexcept
+      {
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+        _M_dispose();
+        if (_Mutex_base<_Lp>::_S_need_barriers)
+          {
+            __atomic_thread_fence (__ATOMIC_ACQ_REL);
+          }

I think we can remove this fence. The _S_need_barriers constant is
only true for the _S_mutex policy, and that policy just calls
_M_release_orig(), so it never uses this function. I'll remove the
fence and add a comment noting that the lack of a barrier is
intentional.

+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+                                                   -1) == 1)
+          {
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+            _M_destroy();
+          }
+      }

Alternatively, we could keep the fence in _M_release_last_use() and
refactor the other _M_release functions, so that we have explicit
specializations for each of:
_Sp_counted_base<_S_single>::_M_release() (already present)
_Sp_counted_base<_S_mutex>::_M_release()
_Sp_counted_base<_S_atomic>::_M_release()

The second and third would be new, as currently they both use the
definition in the primary template. The _S_mutex one would just
decrement _M_use_count then call _M_release_last_use() (which still
has the barrier needed for the _S_mutex policy). The _S_atomic one
would have your new optimization. See the attached patch showing what
I mean. I find this version a bit simpler to understand, as we just
have _M_release and _M_release_last_use, without
_M_release_double_width_cas and _M_release_orig. What do you think of
this version? Does it lose any important properties of your version
which I've failed to notice?

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 4410 bytes --]

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 32b8957f814..07465f7ecd5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -143,6 +143,15 @@
 # define _GLIBCXX_NODISCARD
 #endif
 
+// Macro for TSAN.
+#if __SANITIZE_THREAD__
+#  define _GLIBCXX_TSAN 1
+#elif defined __has_feature
+# if __has_feature(thread_sanitizer)
+#  define _GLIBCXX_TSAN 1
+# endif
+#endif
+
 
 
 #if __cplusplus
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 5be935d174d..b2397c8fddb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -143,10 +143,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       virtual void*
       _M_get_deleter(const std::type_info&) noexcept = 0;
 
+      // Increment the use count (used when the count is greater than zero).
       void
       _M_add_ref_copy()
       { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
 
+      // Increment the use count if it is non-zero, throw otherwise.
       void
       _M_add_ref_lock()
       {
@@ -154,15 +156,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  __throw_bad_weak_ptr();
       }
 
+      // Increment the use count if it is non-zero.
       bool
       _M_add_ref_lock_nothrow() noexcept;
 
+      // Decrement the use count.
       void
-      _M_release() noexcept
-      {
-        // Be race-detector-friendly.  For more info see bits/c++config.
-        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
-	if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+      _M_release() noexcept;
+
+      // Called by _M_release() when the use count reaches zero.
+      __attribute__((__noinline__))
+      void
+      _M_release_last_use() noexcept
       {
 	_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
 	_M_dispose();
@@ -184,12 +189,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    _M_destroy();
 	  }
       }
-      }
 
+      // Increment the weak count.
       void
       _M_weak_add_ref() noexcept
       { __gnu_cxx::__atomic_add_dispatch(&_M_weak_count, 1); }
 
+      // Decrement the weak count.
       void
       _M_weak_release() noexcept
       {
@@ -286,6 +292,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
         }
     }
 
+  template<>
+    inline void
+    _Sp_counted_base<_S_mutex>::_M_release() noexcept
+    {
+      // Be race-detector-friendly.  For more info see bits/c++config.
+      _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+	{
+	  _M_release_last_use();
+	}
+    }
+
+  template<>
+    inline void
+    _Sp_counted_base<_S_atomic>::_M_release() noexcept
+    {
+      _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+#if ! _GLIBCXX_TSAN
+      constexpr bool __lock_free
+	= __atomic_always_lock_free(sizeof(long long), 0)
+	&& __atomic_always_lock_free(sizeof(_Atomic_word), 0);
+      constexpr bool __double_word
+	= sizeof(long long) == 2 * sizeof(_Atomic_word);
+      // The ref-count members follow the vptr, so are aligned to
+      // alignof(void*).
+      constexpr bool __aligned = __alignof(long long) <= alignof(void*);
+      if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
+	{
+	  constexpr long long __unique_ref
+	    = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
+	  auto __both_counts = reinterpret_cast<long long*>(&_M_use_count);
+
+	  _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+	  if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) == __unique_ref)
+	    {
+	      // Both counts are 1, so there are no weak references and
+	      // we are releasing the last strong reference. No other
+	      // threads can observe the effects of this _M_release()
+	      // call (e.g. calling use_count()) without a data race.
+	      *(long long*)(&_M_use_count) = 0;
+	      _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+	      _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+	      _M_dispose();
+	      _M_destroy();
+	      __builtin_puts("double double bye bye");
+	      return;
+	    }
+	}
+#endif
+      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
+	{
+	  _M_release_last_use();
+	}
+    }
+
   template<>
     inline void
     _Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept

Thread overview: 21+ messages
2020-12-17 20:49 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely
2021-07-16 15:55   ` Maged Michael
2021-08-02 13:23     ` Maged Michael
2021-08-02 13:29       ` Maged Michael
2021-08-03 20:59         ` Jonathan Wakely
2021-08-04 15:32           ` Jonathan Wakely [this message]
2021-08-04 15:47             ` Maged Michael
2021-08-04 15:52               ` Jonathan Wakely
2021-08-04 17:19             ` Maged Michael
2021-08-04 17:34               ` Maged Michael
2021-08-04 19:32               ` Jonathan Wakely
2021-08-04 19:48                 ` Maged Michael
2021-12-08 11:49                   ` Jonathan Wakely
2021-12-08 19:15                     ` Rainer Orth
2021-12-08 19:21                       ` Jonathan Wakely
2021-12-08 19:27                         ` Jonathan Wakely
2021-12-08 23:47                           ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28                             ` Rainer Orth
