public inbox for libstdc++@gcc.gnu.org
From: Jonathan Wakely <jwakely@redhat.com>
To: Maged Michael <maged.michael@gmail.com>
Cc: libstdc++@gcc.gnu.org
Subject: Re: Proposed patch to skip the last atomic decrements in _Sp_counted_base::_M_release
Date: Mon, 7 Dec 2020 15:37:21 +0000	[thread overview]
Message-ID: <20201207153721.GA4100713@redhat.com> (raw)
In-Reply-To: <CABBFi0-HanBFtYt4J81bpEojGXXi6AQtAak6b7vU6nCKwAg+4g@mail.gmail.com>

On 07/12/20 07:24 -0500, Maged Michael via Libstdc++ wrote:
>Hi all,
>
>The attached patch includes a proposed alternative implementation of
>_Sp_counted_base::_M_release(). I'd appreciate your feedback.

This is very interesting, thanks.

>Benefits: Save the cost of the last atomic decrements of each of the use
>count and the weak count in _Sp_counted_base. Atomic instructions are
>significantly slower than regular loads and stores across
>major architectures.
>
>How current code works: _M_release() atomically decrements the use count,
>checks if it was 1, if so calls _M_dispose(), atomically decrements the
>weak count, checks if it was 1, and if so calls _M_destroy().
>
>How the proposed algorithm works: _M_release() loads both use count and
>weak count together atomically (assuming 8-byte alignment, discussed
>later), checks if the value is equal to 0x100000001 (i.e., both counts are
>equal to 1), and if so calls _M_dispose() and _M_destroy(). Otherwise, it
>follows the original algorithm.
>
>Why it works: When the current thread executing _M_release() finds each of
>the counts is equal to 1, then no other threads could possibly hold use or
>weak references to this control block. That is, no other threads could
>possibly access the counts or the protected object.
>
>There are two crucial high-level issues that I'd like to point out first:
>- Atomicity of access to the counts together
>- Proper alignment of the counts together
>
>The patch is intended to apply the proposed algorithm only to the case of
>64-bit mode, 4-byte counts, and 8-byte aligned _Sp_counted_base.
>
>** Atomicity **
>- The proposed algorithm depends on the mutual atomicity among 8-byte
>atomic operations and 4-byte atomic operations on each of the 4-byte halves
>of the 8-byte aligned 8-byte block.
>- The standard does not guarantee atomicity of 8-byte operations on a pair
>of 8-byte aligned 4-byte objects.
>- To my knowledge this works in practice on systems that guarantee native
>implementation of 4-byte and 8-byte atomic operations.
>- Can we limit applying the proposed algorithm to architectures that
>guarantee native implementation of atomic operations?

Check __ATOMIC_LLONG_LOCK_FREE > 1

Or std::atomic<long long>::is_always_lock_free
   && std::atomic<_Atomic_word>::is_always_lock_free
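For example (illustrative only; `_Atomic_word` stood in by a plain `int`, and the real check would live inside libstdc++):

```cpp
#include <atomic>
#include <cassert>

using AtomicWordStandIn = int;  // stand-in for the libstdc++ _Atomic_word typedef

// Only enable the combined-load algorithm when both the 8-byte and the
// 4-byte atomic operations are always lock-free, i.e. implemented with
// native atomic instructions rather than a lock.
constexpr bool can_use_combined_counts =
    std::atomic<long long>::is_always_lock_free
    && std::atomic<AtomicWordStandIn>::is_always_lock_free;
```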

>** Alignment **
>- _Sp_counted_base is an internal base class. Three internal classes are
>derived from it.
>- Two of these classes include a pointer as a first member. That is, the
>layout (in the relevant case) is: use count (4 bytes), weak count (4
>bytes), pointer (8 bytes). My understanding is that in these cases the two
>4-byte counts are guaranteed to occupy an 8-byte aligned 8 byte range.

N.B. _Atomic_word is long on sparc64, but that's OK as we won't try to
use the new code in that case.

>- The third case (_Sp_counted_ptr_inplace) only includes user data after
>the two counts, without necessarily including any 8-byte aligned
>members. For example, if the protected object is int32_t, then the
>_Sp_counted_ptr_inplace object consists of three 4-byte members (the two
>counts inherited from _Sp_counted_base and the user data). My understanding
>is that the standard does not guarantee 8-byte alignment in such a case.
>- Is 8-byte alignment guaranteed in practice in some widely-used
>environments?

For all types derived from _Sp_counted_base the whole class is going
to depend on the alignment of the vptr, which means that even the
_Sp_counted_ptr_inplace is guaranteed to be aligned to alignof(void*).
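This can be demonstrated with a toy polymorphic class (not the real control block): the presence of any virtual member gives the class a vptr, which raises its alignment to at least alignof(void*), so the two leading 4-byte counts land in an 8-byte aligned block even when no other member needs 8-byte alignment.

```cpp
#include <cassert>
#include <cstdint>

// Mimics the layout under discussion: a polymorphic base whose only
// data members are two 4-byte counts. The implicit vptr forces the
// whole object to pointer alignment.
struct CountedBaseLike {
    virtual ~CountedBaseLike() = default;
    int32_t use_count = 1;
    int32_t weak_count = 1;
};

static_assert(alignof(CountedBaseLike) >= alignof(void*),
              "vptr forces at least pointer alignment");
```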

>- Can 8-byte alignment be checked at build time in some widely-used
>environments?

Yes, we absolutely must check the alignments before using the new code
path, either with static_assert or if-constexpr. The expected
alignment must be checked along with the expected sizes of
_Atomic_word and long long.

On that point, the patch has:

>+        if (!__LP64__ || sizeof(_Atomic_word) != 4 || sizeof(long long) !=

This will fail to compile on all targets except LP64 ones, because
__LP64__ is not defined there.

Why is __LP64__ the right check anyway?

Shouldn't it depend on __ATOMIC_LLONG_LOCK_FREE > 1 because what
matters is that both 8-byte and 4-byte atomics are lock-free?

Why do we care if the sizes are 4 bytes and 8 bytes, wouldn't it also
work for 16-bit int and 32-bit long long?  i.e.

     if constexpr (sizeof(long long) < (2 * sizeof(_Atomic_word))

(Except that the constexpr should be _GLIBCXX17_CONSTEXPR which
expands to nothing for C++11 and C++14, which don't support
if-constexpr).

I think this also needs to be restricted to the _S_atomic lock policy,
otherwise you're mixing different ways to modify the same memory
locations using incompatible techniques.

>Other points:
>- The proposed algorithm can interact correctly with the current algorithm.
>That is, multiple threads using different versions of the code with and
>without the patch operating on the same objects should always interact
>correctly. The intent for the patch is to be ABI compatible with the
>current implementation.

Yes, this is important, but I disagree that the patch preserves that
property. It's only true for the _S_atomic policy.


>- The proposed patch involves a performance trade-off between saving the
>costs of atomic instructions when the counts are both 1 vs adding the cost
>of loading the 8-byte combined counts and comparison with 0x100000001.
>- I noticed a big difference between the code generated by GCC vs LLVM. GCC
>seems to generate noticeably more code and what seems to be redundant null
>checks and branches.
>- The patch has been in use (built using LLVM) in a large environment for
>many months. The performance gains outweigh the losses (roughly 10 to 1)
>across a large variety of workloads.

That's great.

>I'd appreciate your feedback on the alignment and atomicity issues as well
>as any other comments.

Please CC the gcc-patches@gcc.gnu.org list for all patches as well as
the libstdc++ list, thanks!

I think the idea is definitely worth pursuing, but the patch needs
work (which is fine because a change like this won't be able to get
into the repo until after GCC 11 is released in a few months, so
there's no rush).




Thread overview: 11+ messages
2020-12-07 12:24 Maged Michael
2020-12-07 15:28 ` Marc Glisse
2020-12-07 15:41   ` Jonathan Wakely
2020-12-08  4:10   ` Maged Michael
2020-12-08  8:01     ` Marc Glisse
2020-12-08  9:56       ` Jonathan Wakely
2020-12-07 15:37 ` Jonathan Wakely [this message]
2020-12-08  4:06   ` Maged Michael
2020-12-07 12:30 Maged Michael
2020-12-07 15:57 ` Jonathan Wakely
2020-12-07 16:23   ` Jonathan Wakely
