Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: Maged Michael <maged.michael@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Fri, 16 Jul 2021 14:54:56 +0100	[thread overview]
Message-ID: <CAH6eHdQzMFVVrnaKiL5v_bF-HtC+H1reLaimN6x=9LxRxvERCg@mail.gmail.com> (raw)
In-Reply-To: <CABBFi095Sn2-=DiZyLEfAP2=XCf9jA6F6uATBBe0LAK16EhogA@mail.gmail.com>

On Thu, 17 Dec 2020 at 20:51, Maged Michael wrote:
>
> Please find a proposed patch for _Sp_counted_base::_M_release to skip the
> two atomic instructions that decrement each of the use count and the weak
> count when both are 1. I proposed the general idea in an earlier thread (
> https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html) and got
> useful feedback on a draft patch and responses to related questions about
> multi-granular atomicity and alignment. This patch is based on that
> feedback.
>
>
> I added a check for thread sanitizer to use the current algorithm in that
> case because TSAN does not support multi-granular atomicity. I'd like to
> add a check of __has_feature(thread_sanitizer) for building using LLVM. I
> found examples of __has_feature in libstdc++

There are no uses of __has_feature in libstdc++. We do use
__has_builtin (which GCC also supports) and Clang's __is_identifier
(which GCC doesn't support) to work around some weird semantics of
__has_builtin in older versions of Clang.


> but it doesn't seem to be
> recognized in shared_ptr_base.h. Any guidance on how to check
> __has_feature(thread_sanitizer) in this patch?

I think we want to do something like this in include/bits/c++config

#if __SANITIZE_THREAD__
#  define _GLIBCXX_TSAN 1
#elif defined __has_feature
# if __has_feature(thread_sanitizer)
#  define _GLIBCXX_TSAN 1
# endif
#endif

Then in bits/shared_ptr_base.h

#if _GLIBCXX_TSAN
        _M_release_orig();
        return;
#endif



> GCC generates code for _M_release that is larger and more complex than that
> generated by LLVM. I'd like to file a bug report about that. Jonathan,

Is this the same issue as https://gcc.gnu.org/PR101406 ?

> would you please create a bugzilla account for me (
> https://gcc.gnu.org/bugzilla/) using my gmail address. Thank you.

Done (sorry, I didn't notice the request in this mail until coming
back to it to review the patch properly).



>
>
> Information about the patch:
>
> - Benefits of the patch: Save the cost of the last atomic decrements of
> each of the use count and the weak count in _Sp_counted_base. Atomic
> instructions are significantly slower than regular loads and stores across
> major architectures.
>
> - How current code works: _M_release() atomically decrements the use count,
> checks if it was 1, if so calls _M_dispose(), atomically decrements the
> weak count, checks if it was 1, and if so calls _M_destroy().
>
> - How the proposed patch works: _M_release() loads both use count and weak
> count together atomically (when properly aligned), checks if the value is
> equal to the value of both counts equal to 1 (e.g., 0x100000001), and if so
> calls _M_dispose() and _M_destroy(). Otherwise, it follows the original
> algorithm.
>
> - Why it works: When the current thread executing _M_release() finds each
> of the counts is equal to 1, then (when _lock_policy is _S_atomic) no other
> threads could possibly hold use or weak references to this control block.
> That is, no other threads could possibly access the counts or the protected
> object.
>
> - The proposed patch is intended to interact correctly with current code
> (under certain conditions: _Lock_policy is _S_atomic, proper alignment, and
> native lock-free support for atomic operations). That is, multiple threads
> using different versions of the code with and without the patch operating
> on the same objects should always interact correctly. The intent for the
> patch is to be ABI compatible with the current implementation.
>
> - The proposed patch involves a performance trade-off between saving the
> costs of two atomic instructions when the counts are both 1 vs adding the
> cost of loading the combined counts and comparison with two ones (e.g.,
> 0x100000001).
>
> - The patch has been in use (built using LLVM) in a large environment for
> many months. The performance gains outweigh the losses (roughly 10 to 1)
> across a large variety of workloads.
>
>
> I'd appreciate feedback on the patch and any suggestions for checking
> __has_feature(thread_sanitizer).

N.B. gmail completely mangles patches unless you send them as attachments.


> diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h
> b/libstdc++-v3/include/bits/shared_ptr_base.h
>
> index 368b2d7379a..a8fc944af5f 100644
>
> --- a/libstdc++-v3/include/bits/shared_ptr_base.h
>
> +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
>
> @@ -153,20 +153,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>         if (!_M_add_ref_lock_nothrow())
>
>           __throw_bad_weak_ptr();
>
>        }
>
>
>        bool
>
>        _M_add_ref_lock_nothrow() noexcept;
>
>
>        void
>
>        _M_release() noexcept
>
>        {
>
> +#if __SANITIZE_THREAD__
>
> +        _M_release_orig();
>
> +        return;
>
> +#endif
>
> +        if (!__atomic_always_lock_free(sizeof(long long), 0) ||

The line break should come before the logical operator, not after.
This makes it easier to see which operator it is, because it's at a
predictable position, not off on the right edge somewhere.

i.e.

        if (!__atomic_always_lock_free(sizeof(long long), 0)
           || !__atomic_always_lock_free(sizeof(_Atomic_word), 0)
           || sizeof(long long) < (2 * sizeof(_Atomic_word))
           || sizeof(long long) > (sizeof(void*)))

But I think I'd like to see this condition expressed differently
anyway, see below.

>
> +            !__atomic_always_lock_free(sizeof(_Atomic_word), 0) ||
>
> +            sizeof(long long) < (2 * sizeof(_Atomic_word)) ||

Shouldn't this be != instead of < ?

On a big endian target where sizeof(long long) > sizeof(_Atomic_word)
loading two _Atomic_word objects will fill the high bits of the long
long, and so the (1LL + (1LL << (8 * 4))) calculation won't match what
got loaded into the long long.

>
> +            sizeof(long long) > (sizeof(void*)))

This is checking the alignment, right? I think we can do so more
reliably, and should comment it.

I think I'd like to see this condition defined as a number of
constexpr booleans, with comments. Maybe:

constexpr bool __lock_free
  = __atomic_always_lock_free(sizeof(long long), 0)
     && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
constexpr bool __double_word
  = sizeof(long long) == 2 * sizeof(_Atomic_word);
// The ref-count members follow the vptr, so are aligned to alignof(void*).
constexpr bool __aligned = alignof(long long) <= alignof(void*);

if _GLIBCXX17_CONSTEXPR
  (__lock_free && __double_word && __aligned)
  {
    _M_release_double_width_cas();
    return;
  }
else
  {
    // ... original body of _M_release();
  }

> +          {
>
> +            _M_release_orig();
>
> +            return;
>
> +          }
>
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
>
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
>
> +        if (__atomic_load_n((long long*)(&_M_use_count), __ATOMIC_ACQUIRE)
>
> +            == (1LL + (1LL << (8 * sizeof(_Atomic_word)))))

This should use __CHAR_BIT__ instead of 8.


>
> +          {
>
> +            // Both counts are 1, so there are no weak references and
>
> +            // we are releasing the last strong reference. No other
>
> +            // threads can observe the effects of this _M_release()
>
> +            // call (e.g. calling use_count()) without a data race.
>
> +            *(long long*)(&_M_use_count) = 0;
>
> +            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>
> +            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
>
> +            _M_dispose();
>
> +            _M_destroy();
>
> +          }
>
> +        else
>
> +          {
>
> +            if ((__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1))
>
> +              {
>
> +                _M_release_last_use();
>
> +              }
>
> +          }
>
> +      }
>
> +
>
> +      void
>
> +      __attribute__ ((noinline))
>
> +      _M_release_last_use() noexcept
>
> +      {
>
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>
> +        _M_dispose();
>
> +        if (_Mutex_base<_Lp>::_S_need_barriers)
>
> +          {
>
> +            __atomic_thread_fence (__ATOMIC_ACQ_REL);
>
> +          }
>
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
>
> +        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
>
> +                                                   -1) == 1)
>
> +          {
>
> +            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
>
> +            _M_destroy();
>
> +          }
>
> +      }
>
> +
>
> +      void
>
> +      _M_release_orig() noexcept
>
> +      {
>
>          // Be race-detector-friendly.  For more info see bits/c++config.
>
>          _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
>
>         if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
>
>           {
>
>              _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>
>             _M_dispose();
>
>             // There must be a memory barrier between dispose() and
> destroy()
>
>             // to ensure that the effects of dispose() are observed in the
>
>             // thread that runs destroy().
>
>             // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
>
> @@ -279,20 +337,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>      _Sp_counted_base<_S_single>::_M_release() noexcept
>
>      {
>
>        if (--_M_use_count == 0)
>
>          {
>
>            _M_dispose();
>
>            if (--_M_weak_count == 0)
>
>              _M_destroy();
>
>          }
>
>      }
>
>
> +  template<>
>
> +    inline void
>
> +    _Sp_counted_base<_S_mutex>::_M_release() noexcept
>
> +    {
>
> +      _M_release_orig();
>
> +    }
>
> +
>
>    template<>
>
>      inline void
>
>      _Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
>
>      { ++_M_weak_count; }
>
>
>    template<>
>
>      inline void
>
>      _Sp_counted_base<_S_single>::_M_weak_release() noexcept
>
>      {
>
>        if (--_M_weak_count == 0)

next prev parent reply	other threads:[~2021-07-16 13:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 20:49 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely [this message]
2021-07-16 15:55   ` Maged Michael
2021-08-02 13:23     ` Maged Michael
2021-08-02 13:29       ` Maged Michael
2021-08-03 20:59         ` Jonathan Wakely
2021-08-04 15:32           ` Jonathan Wakely
2021-08-04 15:47             ` Maged Michael
2021-08-04 15:52               ` Jonathan Wakely
2021-08-04 17:19             ` Maged Michael
2021-08-04 17:34               ` Maged Michael
2021-08-04 19:32               ` Jonathan Wakely
2021-08-04 19:48                 ` Maged Michael
2021-12-08 11:49                   ` Jonathan Wakely
2021-12-08 19:15                     ` Rainer Orth
2021-12-08 19:21                       ` Jonathan Wakely
2021-12-08 19:27                         ` Jonathan Wakely
2021-12-08 23:47                           ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28                             ` Rainer Orth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH6eHdQzMFVVrnaKiL5v_bF-HtC+H1reLaimN6x=9LxRxvERCg@mail.gmail.com' \
    --to=jwakely.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    --cc=maged.michael@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).