public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Maged Michael <maged.michael@gmail.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Wed, 4 Aug 2021 13:34:21 -0400	[thread overview]
Message-ID: <CABBFi08ayszLv7_B5j8otgozcm0=7ikewx_jS8pprUWkp+_FpQ@mail.gmail.com> (raw)
In-Reply-To: <CABBFi0-ZfBW4xU9GSbNe8HZaJ44cwdmO-CVjwF-X7XviXZmqow@mail.gmail.com>

On Wed, Aug 4, 2021 at 1:19 PM Maged Michael <maged.michael@gmail.com>
wrote:

> Sorry. I totally missed the rest of your message and the patch. My fuzzy
> eyesight, which usually guesses correctly 90% of the time, mistook
> "Secondly" on a line by itself for "Sincerely" :-)
>
> On Wed, Aug 4, 2021 at 11:32 AM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>> On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>> >
>> > On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
>> > >
>> > > This is the right patch. The previous one is missing noexcept. Sorry.
>> > >
>> > >
>> > > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael <maged.michael@gmail.com>
>> wrote:
>> > >>
>> > >> Please find attached an updated patch after incorporating Jonathan's
>> suggestions.
>> > >>
>> > >> Changes from the last patch include:
>> > >> - Add a TSAN macro to bits/c++config.
>> > >> - Use separate constexpr bool-s for the conditions for lock-freedom,
>> double-width and alignment.
>> > >> - Move the code in the optimized path to a separate function
>> _M_release_double_width_cas.
>> >
>> > Thanks for the updated patch. At a quick glance it looks great. I'll
>> > apply it locally and test it tomorrow.
>>
>>
>> It occurs to me now that _M_release_double_width_cas is the wrong
>> name, because we're not doing a DWCAS, just a double-width load. What
>> do you think about renaming it to _M_release_combined instead? Since
>> it does a combined load of the two counts.
>>
>> I'm no longer confident in the alignof suggestion I gave you.
>>
>> +        constexpr bool __double_word
>> +          = sizeof(long long) == 2 * sizeof(_Atomic_word);
>> +        // The ref-count members follow the vptr, so are aligned to
>> +        // alignof(void*).
>> +        constexpr bool __aligned = alignof(long long) <= alignof(void*);
>>
>> For IA32 (i.e. 32-bit x86) this constant will be true, because
>> alignof(long long) and alignof(void*) are both 4, even though
>> sizeof(long long) is 8. So in theory the _M_use_count and
>> _M_weak_count members could be in different cache lines, and the
>> atomic load will not be atomic. I think we want to use __alignof(long
>> long) here to get the "natural" alignment, not the smaller 4B
>> alignment allowed by SysV ABI. That means that we won't do this
>> optimization on IA32, but I think that's acceptable.
>>
>> Alternatively, we could keep using alignof, but with an additional
>> run-time check something like (uintptr_t)&_M_use_count / 64 ==
>> (uintptr_t)&_M_weak_count / 64 to check they're on the same cache
>> line. I think for now we should just use __alignof and live without
>> the optimization on IA32.
>>
>> Secondly:
>>
>> +      void
>> +      __attribute__ ((noinline))
>>
>> This needs to be __noinline__ because noinline is not a reserved word,
>> so users can do:
>> #define noinline 1
>> #include <memory>
>>
>> Was it performance profiling, or code-size measurements (or something
>> else?) that showed this should be non-inline?
>> I'd like to add a comment explaining why we want it to be noinline.
>>
>> The noinlining was based on looking at generated code. That was for
> clang. It was inlining the _M_last_use function for every instance of
> _M_release (e.g., ~shared_ptr). This optimization with the noinline for
> _M_release_last_use ended up reducing massive binary text sizes by 1.5%
> (several megabytes)., which was an extra benefit. Without the noinline we
> saw code size increase.
>
> IIUC, we van use the following. Right?
>
> __attribute__((__noinline__))
>
>
> I didn't understand the part about programmers doing #define noinline 1.
> I don't see code in the patch that uses noinline.
>
> How about something like this comment?
>
> // Noinline to avoid code size increase.
>
>
>
> In that function ...
>>
>> +      _M_release_last_use() noexcept
>> +      {
>> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
>> +        _M_dispose();
>> +        if (_Mutex_base<_Lp>::_S_need_barriers)
>> +          {
>> +            __atomic_thread_fence (__ATOMIC_ACQ_REL);
>> +          }
>>
>> I think we can remove this fence. The _S_need_barriers constant is
>> only true for the _S_mutex policy, and that just calls
>> _M_release_orig(), so never uses this function. I'll remove it and add
>> a comment noting that the lack of barrier is intentional.
>>
>> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
>> +        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
>> +                                                   -1) == 1)
>> +          {
>> +            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
>> +            _M_destroy();
>> +          }
>> +      }
>>
>> Alternatively, we could keep the fence in _M_release_last_use() and
>> refactor the other _M_release functions, so that we have explicit
>> specializations for each of:
>> _Sp_counted_base<_S_single>::_M_release() (already present)
>> _Sp_counted_base<_S_mutex>::_M_release()
>> _Sp_counted_base<_S_atomic>::_M_release()
>>
>> The second and third would be new, as currently they both use the
>> definition in the primary template. The _S_mutex one would just
>> decrement _M_use_count then call _M_release_last_use() (which still
>> has the barrier needed for the _S_mutex policy). The _S_atomic one
>> would have your new optimization. See the attached patch showing what
>> I mean. I find this version a bit simpler to understand, as we just
>> have _M_release and _M_release_last_use, without
>> _M_release_double_width_cas and _M_release_orig. What do you think of
>> this version? Does it lose any important properties of your version
>> which I've failed to notice?
>>
>
> Unless I missed something I don't think the patch is correct. IIUC it uses
> _M_release_last_use as if it is equivalent to what was _M_release_orig,
> which is not true. The logic in _M_release_last_use assumes that the use
> count has already been decremented and the return value was 1. So I don't
> think the refactor works. Please correct me if I missed something..
>
> Actually I take back what I said. Sorry. I think the logic in your patch
is correct. I missed some of the atomic decrements.
But I'd be concerned about performance. If we make _M_release_last_use
noinline then we are adding overhead to the fast path of the original logic
(where both counts are 1).

What do you think?



> If I am not mistaken, how about I prepare a new patch that incorporates
> the following?
> - __noinline__
> - The change to __align in my last message.
> - Remove the unnecessary checks based on _S_need_barriers in the combined
> / last_use path.
> - Going back to having _M_release_orig and _M_release_combined
> - Include all of the comments that you added in your patch.
>
> Would that be OK with you?
>
> Thanks,
> Maged
>
>

  reply	other threads:[~2021-08-04 17:34 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 20:49 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely
2021-07-16 15:55   ` Maged Michael
2021-08-02 13:23     ` Maged Michael
2021-08-02 13:29       ` Maged Michael
2021-08-03 20:59         ` Jonathan Wakely
2021-08-04 15:32           ` Jonathan Wakely
2021-08-04 15:47             ` Maged Michael
2021-08-04 15:52               ` Jonathan Wakely
2021-08-04 17:19             ` Maged Michael
2021-08-04 17:34               ` Maged Michael [this message]
2021-08-04 19:32               ` Jonathan Wakely
2021-08-04 19:48                 ` Maged Michael
2021-12-08 11:49                   ` Jonathan Wakely
2021-12-08 19:15                     ` Rainer Orth
2021-12-08 19:21                       ` Jonathan Wakely
2021-12-08 19:27                         ` Jonathan Wakely
2021-12-08 23:47                           ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28                             ` Rainer Orth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABBFi08ayszLv7_B5j8otgozcm0=7ikewx_jS8pprUWkp+_FpQ@mail.gmail.com' \
    --to=maged.michael@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely.gcc@gmail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).