From: Maged Michael <maged.michael@gmail.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: libstdc++@gcc.gnu.org
Subject: Re: Proposed patch to skip the last atomic decrements in _Sp_counted_base::_M_release
Date: Mon, 7 Dec 2020 23:06:11 -0500 [thread overview]
Message-ID: <CABBFi0_hi-aD4xVx-_NxmWAVHBZxpRCdV3JR4PnzTvJP1UNRxQ@mail.gmail.com> (raw)
In-Reply-To: <20201207153721.GA4100713@redhat.com>
Jonathan,
Thank you for your comments, both on the high-level idea and on the draft
patch. This is exactly the guidance I needed to continue working on the patch.
Maged
On Mon, Dec 7, 2020 at 10:37 AM Jonathan Wakely <jwakely@redhat.com> wrote:
> On 07/12/20 07:24 -0500, Maged Michael via Libstdc++ wrote:
> >Hi all,
> >
> >The attached patch includes a proposed alternative implementation of
> >_Sp_counted_base::_M_release(). I'd appreciate your feedback.
>
> This is very interesting, thanks.
>
> >Benefits: Save the cost of the last atomic decrements of each of the use
> >count and the weak count in _Sp_counted_base. Atomic instructions are
> >significantly slower than regular loads and stores across
> >major architectures.
> >
> >How current code works: _M_release() atomically decrements the use count,
> >checks if it was 1, if so calls _M_dispose(), atomically decrements the
> >weak count, checks if it was 1, and if so calls _M_destroy().
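The two-decrement flow described above can be sketched roughly as follows. This is an illustrative model, not the actual libstdc++ code: the struct and member names (ControlBlock, dispose(), destroy()) are stand-ins for _Sp_counted_base, _M_dispose(), and _M_destroy(), and the dispose/destroy calls are modeled as counters so the logic is observable.

```cpp
#include <atomic>
#include <cassert>

// Illustrative model of the current _M_release() algorithm: decrement the
// use count, and if it was the last use reference, dispose the object and
// then drop the weak reference the use references collectively held.
struct ControlBlock {
  std::atomic<int> use_count{1};
  std::atomic<int> weak_count{1};
  int disposed = 0, destroyed = 0;  // stand-ins for _M_dispose()/_M_destroy()

  void dispose() { ++disposed; }
  void destroy() { ++destroyed; }

  void release() {
    // fetch_sub returns the previous value, so "== 1" means this thread
    // released the last use reference.
    if (use_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      dispose();
      if (weak_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
        destroy();
    }
  }
};
```

Note that even in the common single-owner case this path always executes two read-modify-write atomic instructions, which is the cost the proposal below avoids.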
> >
> >How the proposed algorithm works: _M_release() loads both use count and
> >weak count together atomically (assuming 8-byte alignment, discussed
> >later), checks if the value is equal to 0x100000001 (i.e., both counts are
> >equal to 1), and if so calls _M_dispose() and _M_destroy(). Otherwise, it
> >follows the original algorithm.
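The fast path described above can be sketched like this. It is a hedged model under the assumptions discussed later in this message (64-bit target, 4-byte counts, 8-byte alignment): the reinterpret_cast below is precisely the extra-standard step, since the standard does not bless an 8-byte atomic load over a pair of adjacent 4-byte atomics, though it works in practice on mainstream 64-bit targets. All names are illustrative, and the dispose/destroy and fallback paths are modeled as flags.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative model of the proposed fast path: one 8-byte atomic load of
// both 4-byte counts, compared against 0x100000001 (both counts == 1).
// alignas(8) models the 8-byte alignment assumption discussed below.
struct alignas(8) ControlBlock2 {
  std::atomic<uint32_t> use_count{1};
  std::atomic<uint32_t> weak_count{1};
  bool fast = false, slow = false;  // observable stand-ins for the two paths

  void release() {
    // Not sanctioned by the standard: view the two adjacent 4-byte counts
    // as one 8-byte atomic. This is the "mutual atomicity" assumption.
    auto* both = reinterpret_cast<std::atomic<uint64_t>*>(&use_count);
    if (both->load(std::memory_order_acquire) == 0x100000001ULL)
      fast = true;   // would call _M_dispose() and _M_destroy() directly
    else
      slow = true;   // would fall back to the original decrement algorithm
  }
};
```

A side note: the constant 0x100000001 happens to denote "both halves equal 1" under either byte order, since the two 4-byte values are equal.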
> >
> >Why it works: When the current thread executing _M_release() finds each of
> >the counts is equal to 1, then no other threads could possibly hold use or
> >weak references to this control block. That is, no other threads could
> >possibly access the counts or the protected object.
> >
> >There are two crucial high-level issues that I'd like to point out first:
> >- Atomicity of access to the counts together
> >- Proper alignment of the counts together
> >
> >The patch is intended to apply the proposed algorithm only to the case of
> >64-bit mode, 4-byte counts, and 8-byte aligned _Sp_counted_base.
> >
> >** Atomicity **
> >- The proposed algorithm depends on the mutual atomicity between 8-byte
> >atomic operations and 4-byte atomic operations on each of the 4-byte
> >halves of the 8-byte aligned 8-byte block.
> >- The standard does not guarantee atomicity of 8-byte operations on a pair
> >of 8-byte aligned 4-byte objects.
> >- To my knowledge this works in practice on systems that guarantee native
> >implementation of 4-byte and 8-byte atomic operations.
> >- Can we limit applying the proposed algorithm to architectures that
> >guarantee native implementation of atomic operations?
>
> Check __ATOMIC_LLONG_LOCK_FREE > 1
>
> Or std::atomic<long long>::is_always_lock_free
> && std::atomic<_Atomic_word>::is_always_lock_free
>
> >** Alignment **
> >- _Sp_counted_base is an internal base class. Three internal classes are
> >derived from it.
> >- Two of these classes include a pointer as a first member. That is, the
> >layout (in the relevant case) is: use count (4 bytes), weak count (4
> >bytes), pointer (8 bytes). My understanding is that in these cases the two
> >4-byte counts are guaranteed to occupy an 8-byte aligned 8 byte range.
>
> N.B. _Atomic_word is long on sparc64, but that's OK as we won't try to
> use the new code in that case.
>
> >- The third case (_Sp_counted_ptr_inplace) only includes user data
> >after the two counts, without necessarily including any 8-byte aligned
> >members. For example, if the protected object is int32_t, then the
> >_Sp_counted_ptr_inplace object consists of three 4-byte members (the two
> >counts inherited from _Sp_counted_base and the user data). My
> >understanding is that the standard does not guarantee 8-byte alignment
> >in such a case.
> >- Is 8-byte alignment guaranteed in practice in some widely-used
> >environments?
>
> For all types derived from _Sp_counted_base the whole class is going
> to depend on the alignment of the vptr, which means that even the
> _Sp_counted_ptr_inplace is guaranteed to be aligned to alignof(void*).
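The vptr argument can be checked with a small model. Base below is an illustrative stand-in for _Sp_counted_base, not the real class: a virtual destructor forces a vtable pointer, so the whole object is aligned to at least alignof(void*), and on an LP64 target the two 4-byte counts that follow the vptr land in one 8-byte aligned 8-byte unit.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for _Sp_counted_base: the virtual destructor forces a vptr,
// which in turn forces pointer alignment for the whole object.
struct Base {
  virtual ~Base() = default;
  int32_t use_count{1};
  int32_t weak_count{1};
};

static_assert(alignof(Base) >= alignof(void*),
              "polymorphic class is at least pointer-aligned");

// Runtime byte offset of a count within the object; on LP64 the use count
// is expected to sit right after the 8-byte vptr, i.e. at offset 8.
inline std::ptrdiff_t offset_of_use(const Base& b) {
  return reinterpret_cast<const char*>(&b.use_count)
       - reinterpret_cast<const char*>(&b);
}
inline std::ptrdiff_t offset_of_weak(const Base& b) {
  return reinterpret_cast<const char*>(&b.weak_count)
       - reinterpret_cast<const char*>(&b);
}
```

Checking at runtime that offset_of_use is a multiple of 8 and that the weak count sits 4 bytes later confirms the combined 8-byte view covers exactly the two counts on such a target.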
>
> >- Can 8-byte alignment be checked at build time in some widely-used
> >environments?
>
> Yes, we absolutely must check the alignments before using the new code
> path, either with static_assert or if-constexpr. The expected
> alignment must be checked along with the expected sizes of
> _Atomic_word and long long.
>
> On that point, the patch has:
>
> >+ if (!__LP64__ || sizeof(_Atomic_word) != 4 || sizeof(long long) !=
>
> This will fail to compile on all targets except LP64 ones, because
> __LP64__ is not defined there.
>
> Why is __LP64__ the right check anyway?
>
> Shouldn't it depend on __ATOMIC_LLONG_LOCK_FREE > 1 because what
> matters is that both 8-byte and 4-byte atomics are lock-free?
>
> Why do we care if the sizes are 4 bytes and 8 bytes, wouldn't it also
> work for 16-bit int and 32-bit long long? i.e.
>
> if constexpr (sizeof(long long) < (2 * sizeof(_Atomic_word)))
>
> (Except that the constexpr should be _GLIBCXX17_CONSTEXPR which
> expands to nothing for C++11 and C++14, which don't support
> if-constexpr).
>
> I think this also needs to be restricted to the _S_atomic lock policy,
> otherwise you're mixing different ways to modify the same memory
> locations using incompatible techniques.
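Putting the size condition and the lock-policy restriction together, the dispatch might be structured as in this sketch. The enum and int are approximations of __gnu_cxx::_Lock_policy and _Atomic_word, and a real libstdc++ patch would spell the condition with _GLIBCXX17_CONSTEXPR so C++11/14 builds still compile.

```cpp
#include <cassert>

// Approximation of __gnu_cxx::_Lock_policy.
enum LockPolicy { S_single, S_mutex, S_atomic };

// Sketch of the suggested guard: the combined-load fast path is only
// eligible under the atomic lock policy and when one long long spans
// two _Atomic_word counts (int stands in for _Atomic_word here).
template <LockPolicy P>
bool uses_fast_path() {
  if constexpr (P != S_atomic || sizeof(long long) < 2 * sizeof(int))
    return false;  // incompatible policy or sizes: keep the original path
  else
    return true;   // fast path may be considered (alignment still checked)
}
```

Restricting to S_atomic also preserves interoperability: under the other policies the counts are modified under a mutex or without atomics at all, and a raw 8-byte load would race with those techniques.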
>
> >Other points:
> >- The proposed algorithm can interact correctly with the current
> >algorithm. That is, multiple threads using different versions of the code
> >with and without the patch operating on the same objects should always
> >interact correctly. The intent for the patch is to be ABI compatible with
> >the current implementation.
>
> Yes, this is important, but I disagree that the patch preserves that
> property. It's only true for the _S_atomic policy.
>
>
> >- The proposed patch involves a performance trade-off between saving the
> >costs of atomic instructions when the counts are both 1 vs adding the cost
> >of loading the 8-byte combined counts and comparison with 0x100000001.
> >- I noticed a big difference between the code generated by GCC vs LLVM.
> >GCC seems to generate noticeably more code and what seems to be redundant
> >null checks and branches.
> >- The patch has been in use (built using LLVM) in a large environment for
> >many months. The performance gains outweigh the losses (roughly 10 to 1)
> >across a large variety of workloads.
>
> That's great.
>
> >I'd appreciate your feedback on the alignment and atomicity issues as well
> >as any other comments.
>
> Please CC the gcc-patches@gcc.gnu.org list for all patches as well as
> the libstdc++ list, thanks!
>
> I think the idea is definitely worth pursuing, but the patch needs
> work (which is fine because a change like this won't be able to get
> into the repo until after GCC 11 is released in a few months, so
> there's no rush).
>
>
>