public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: Maged Michael <maged.michael@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
Date: Wed, 4 Aug 2021 20:32:02 +0100	[thread overview]
Message-ID: <CAH6eHdQmpH_jpub6YjHuwf=qXJKv0rZnHCBEV4DjBbTax73aJQ@mail.gmail.com> (raw)
In-Reply-To: <CABBFi0-ZfBW4xU9GSbNe8HZaJ44cwdmO-CVjwF-X7XviXZmqow@mail.gmail.com>

On Wed, 4 Aug 2021 at 18:19, Maged Michael wrote:
>
> Sorry. I totally missed the rest of your message and the patch. My fuzzy eyesight, which usually guesses correctly 90% of the time, mistook "Secondly" on a line by itself for "Sincerely" :-)

:-)

> The noinlining was based on looking at generated code. That was for clang. It was inlining the _M_last_use function for every instance of _M_release (e.g., ~shared_ptr). This optimization with the noinline for _M_release_last_use ended up reducing massive binary text sizes by 1.5% (several megabytes)., which was an extra benefit. Without the noinline we saw code size increase.

Wow, that is a convincing argument for making it not inline, thanks.

> IIUC, we van use the following. Right?
>
> __attribute__((__noinline__))

Right.

> I didn't understand the part about programmers doing #define noinline 1. I don't see code in the patch that uses noinline.

This is a valid C++ program:

#define noinline 1
#include <memory>
int main() { }

But if anything in <memory> uses "noinline" then this valid program
will not compile. Which is why we must use ((__noinline__)) instead of
((noinline)).



>
> How about something like this comment?
>
> // Noinline to avoid code size increase.

Great, thanks.

On Wed, 4 Aug 2021 at 18:34, Maged Michael wrote:
> Actually I take back what I said. Sorry. I think the logic in your patch is correct. I missed some of the atomic decrements.
> But I'd be concerned about performance. If we make _M_release_last_use noinline then we are adding overhead to the fast path of the original logic (where both counts are 1).

Oh, I see. So the code duplication serves a purpose. We want the
_M_release_last_use() code to be non-inline for the new logic, because
in the common case we never call it (we either detect that both counts
are 1 and do the dispose & destroy without atomic decrements, or we do
a single decrement and don't dispose or destroy anything). But for the
old logic, we do want that code inlined into _M_release (or
_M_release_orig as it was called in your patch). Have I got that right
now?

What if we remove the __noinline__ from _M_release_last_use() so that
it can be inlined, but than add a noinline wrapper that can be called
when we don't want to inline it?

So:
      // Called by _M_release() when the use count reaches zero.
      void
      _M_release_last_use() noexcept
      {
        // unchanged from previous patch, but without the attribute.
        // ...
      }

      // As above, but 'noinline' to reduce code size on the cold path.
      __attribute__((__noinline__))
      void
      _M_release_last_use_cold() noexcept
      { _M_release_last_use(); }


And then:

  template<>
    inline void
    _Sp_counted_base<_S_atomic>::_M_release() noexcept
    {
      _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
#if ! _GLIBCXX_TSAN
      constexpr bool __lock_free
        = __atomic_always_lock_free(sizeof(long long), 0)
        && __atomic_always_lock_free(sizeof(_Atomic_word), 0);
      constexpr bool __double_word
        = sizeof(long long) == 2 * sizeof(_Atomic_word);
      // The ref-count members follow the vptr, so are aligned to
      // alignof(void*).
      constexpr bool __aligned = __alignof(long long) <= alignof(void*);
      if _GLIBCXX17_CONSTEXPR (__lock_free && __double_word && __aligned)
        {
          constexpr long long __unique_ref
            = 1LL + (1LL << (__CHAR_BIT__ * sizeof(_Atomic_word)));
          auto __both_counts = reinterpret_cast<long long*>(&_M_use_count);

          _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
          if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) == __unique_ref)
            {
              // Both counts are 1, so there are no weak references and
              // we are releasing the last strong reference. No other
              // threads can observe the effects of this _M_release()
              // call (e.g. calling use_count()) without a data race.
              *(long long*)(&_M_use_count) = 0;
              _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
              _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
              _M_dispose();
              _M_destroy();
              return;
            }
          if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
            {
              _M_release_last_use_cold();
              return;
            }
        }
      else
#endif
      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
        {
          _M_release_last_use();
        }
    }


So we use the noinline version for the else branch in the new logic,
but the can-inline version for the old logic. Would that work?

We could also consider adding __attribute__((__cold__)) to the
_M_release_last_use_cold() function, and/or add [[__unlikely__]] to
the 'if' that calls it, but maybe that's overkill.

It seems like this will slightly pessimize the case where the last use
count is dropped but there are weak refs, as that will take the cold
path now. That's probably OK, given the benefits for the "both counts
are 1" case, and that it might reduce the code size for the "use count
> 1" case.

It's past 8pm here so I'll come back to this tomorrow and do some code
size measurements of my own, now that I have a clearer understanding
of what your original patch was optimizing for.

  parent reply	other threads:[~2021-08-04 19:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 20:49 Maged Michael
2021-01-05 16:19 ` Maged Michael
2021-06-10 14:41 ` Maged Michael
2021-07-16 13:54 ` Jonathan Wakely
2021-07-16 15:55   ` Maged Michael
2021-08-02 13:23     ` Maged Michael
2021-08-02 13:29       ` Maged Michael
2021-08-03 20:59         ` Jonathan Wakely
2021-08-04 15:32           ` Jonathan Wakely
2021-08-04 15:47             ` Maged Michael
2021-08-04 15:52               ` Jonathan Wakely
2021-08-04 17:19             ` Maged Michael
2021-08-04 17:34               ` Maged Michael
2021-08-04 19:32               ` Jonathan Wakely [this message]
2021-08-04 19:48                 ` Maged Michael
2021-12-08 11:49                   ` Jonathan Wakely
2021-12-08 19:15                     ` Rainer Orth
2021-12-08 19:21                       ` Jonathan Wakely
2021-12-08 19:27                         ` Jonathan Wakely
2021-12-08 23:47                           ` [committed] libstdc++: Fix undefined shift when _Atomic_word is 64-bit Jonathan Wakely
2021-12-09 12:28                             ` Rainer Orth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH6eHdQmpH_jpub6YjHuwf=qXJKv0rZnHCBEV4DjBbTax73aJQ@mail.gmail.com' \
    --to=jwakely.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    --cc=maged.michael@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).