From: Maged Michael
Date: Thu, 17 Dec 2020 15:49:33 -0500
Subject: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
To: libstdc++@gcc.gnu.org
Cc: Maged Michael, gcc-patches@gcc.gnu.org

Please find below a proposed patch for _Sp_counted_base::_M_release that skips the two atomic instructions that decrement the use count and the weak count when both counts are 1.

I proposed the general idea in an earlier thread (https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html) and got useful feedback on a draft patch, as well as responses to related questions about multi-granular atomicity and alignment. This patch is based on that feedback.

I added a check for ThreadSanitizer that falls back to the current algorithm, because TSan does not support multi-granular atomicity. I'd also like to add a check of __has_feature(thread_sanitizer) for builds using LLVM. I found examples of __has_feature in libstdc++, but it doesn't seem to be recognized in shared_ptr_base.h. Any guidance on how to check __has_feature(thread_sanitizer) in this patch? (One possible idiom is sketched below.)

GCC generates code for _M_release that is larger and more complex than the code generated by LLVM. I'd like to file a bug report about that. Jonathan, would you please create a Bugzilla account for me (https://gcc.gnu.org/bugzilla/) using my Gmail address? Thank you.
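As a starting point for the question above, here is a minimal sketch of one common preprocessor idiom for detecting TSan under both GCC and Clang; the macro name _GLIBCXX_TSAN is only a hypothetical placeholder, not an existing libstdc++ macro:

// Hypothetical detection macro (placeholder name). GCC defines
// __SANITIZE_THREAD__ under -fsanitize=thread; Clang instead reports
// the sanitizer via __has_feature, which must itself be guarded with
// defined() so compilers lacking it still preprocess cleanly.
#if defined(__SANITIZE_THREAD__)
# define _GLIBCXX_TSAN 1
#elif defined(__has_feature)
# if __has_feature(thread_sanitizer)
#  define _GLIBCXX_TSAN 1
# endif
#endif

The patch's `#if __SANITIZE_THREAD__` check could then test this macro instead, covering both compilers.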
Information about the patch:

- Benefits of the patch: Saves the cost of the last atomic decrement of each of the use count and the weak count in _Sp_counted_base. Atomic instructions are significantly slower than regular loads and stores across major architectures.

- How the current code works: _M_release() atomically decrements the use count, checks whether it was 1, and if so calls _M_dispose(); it then atomically decrements the weak count, checks whether it was 1, and if so calls _M_destroy().

- How the proposed patch works: _M_release() loads the use count and the weak count together atomically (when properly aligned), checks whether the combined value equals the value with both counts equal to 1 (e.g., 0x100000001 for a 4-byte _Atomic_word), and if so calls _M_dispose() and _M_destroy(). Otherwise, it follows the original algorithm. (A standalone sketch of this combined-count check appears after the patch below.)

- Why it works: When the thread executing _M_release() finds that each of the counts equals 1, then (when _Lock_policy is _S_atomic) no other thread can possibly hold a use or weak reference to this control block. That is, no other thread can possibly access the counts or the protected object.

- The proposed patch is intended to interact correctly with the current code, under these conditions: _Lock_policy is _S_atomic, the counts are properly aligned, and the needed atomic operations are natively lock-free. That is, multiple threads running code built with and without the patch, operating on the same objects, should always interact correctly. The patch is intended to be ABI compatible with the current implementation.

- The proposed patch involves a performance trade-off: it saves the cost of two atomic instructions when both counts are 1, but adds the cost of loading the combined counts and comparing them with two ones (e.g., 0x100000001) on every release.

- The patch has been in use (built using LLVM) in a large environment for many months. The performance gains outweigh the losses (roughly 10 to 1) across a large variety of workloads.

I'd appreciate feedback on the patch and any suggestions for checking __has_feature(thread_sanitizer).

Maged


diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 368b2d7379a..a8fc944af5f 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -153,20 +153,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
         if (!_M_add_ref_lock_nothrow())
           __throw_bad_weak_ptr();
       }

       bool
       _M_add_ref_lock_nothrow() noexcept;

       void
       _M_release() noexcept
       {
+#if __SANITIZE_THREAD__
+        _M_release_orig();
+        return;
+#endif
+        if (!__atomic_always_lock_free(sizeof(long long), 0) ||
+            !__atomic_always_lock_free(sizeof(_Atomic_word), 0) ||
+            sizeof(long long) < (2 * sizeof(_Atomic_word)) ||
+            sizeof(long long) > (sizeof(void*)))
+          {
+            _M_release_orig();
+            return;
+          }
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+        if (__atomic_load_n((long long*)(&_M_use_count), __ATOMIC_ACQUIRE)
+            == (1LL + (1LL << (8 * sizeof(_Atomic_word)))))
+          {
+            // Both counts are 1, so there are no weak references and
+            // we are releasing the last strong reference. No other
+            // threads can observe the effects of this _M_release()
+            // call (e.g. calling use_count()) without a data race.
+            *(long long*)(&_M_use_count) = 0;
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+            _M_dispose();
+            _M_destroy();
+          }
+        else
+          {
+            if ((__gnu_cxx::__exchange_and_add(&_M_use_count, -1) == 1))
+              {
+                _M_release_last_use();
+              }
+          }
+      }
+
+      void
+      __attribute__ ((noinline))
+      _M_release_last_use() noexcept
+      {
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
+        _M_dispose();
+        if (_Mutex_base<_Lp>::_S_need_barriers)
+          {
+            __atomic_thread_fence (__ATOMIC_ACQ_REL);
+          }
+        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
+        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
+                                                   -1) == 1)
+          {
+            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
+            _M_destroy();
+          }
+      }
+
+      void
+      _M_release_orig() noexcept
+      {
         // Be race-detector-friendly. For more info see bits/c++config.
         _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
         if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
           {
             _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
             _M_dispose();
             // There must be a memory barrier between dispose() and destroy()
             // to ensure that the effects of dispose() are observed in the
             // thread that runs destroy().
             // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
@@ -279,20 +337,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _Sp_counted_base<_S_single>::_M_release() noexcept
     {
       if (--_M_use_count == 0)
         {
           _M_dispose();
           if (--_M_weak_count == 0)
             _M_destroy();
         }
     }

+  template<>
+    inline void
+    _Sp_counted_base<_S_mutex>::_M_release() noexcept
+    {
+      _M_release_orig();
+    }
+
   template<>
     inline void
     _Sp_counted_base<_S_single>::_M_weak_add_ref() noexcept
     { ++_M_weak_count; }

   template<>
     inline void
     _Sp_counted_base<_S_single>::_M_weak_release() noexcept
     {
       if (--_M_weak_count == 0)
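As promised above, here is a minimal standalone sketch of the combined-count check (my own illustration, not part of the patch), assuming a 4-byte stand-in for _Atomic_word and the same alignment and lock-freedom preconditions the patch tests for; the struct and variable names are hypothetical:

#include <cstdint>
#include <cstdio>

// Stand-in for the two adjacent _Atomic_word counters in the control
// block. alignas(8) mirrors the alignment precondition required
// before the 8-byte load is valid.
struct alignas(8) Counts {
  int32_t use  = 1;   // stand-in for _M_use_count
  int32_t weak = 1;   // stand-in for _M_weak_count
};

int main() {
  Counts c;

  // The constant the patch compares against: both 4-byte halves equal
  // to 1. Because the two halves hold the same value, the result is
  // 0x100000001 regardless of which count sits in the low half, i.e.
  // regardless of endianness.
  const long long both_one = 1LL + (1LL << (8 * sizeof(int32_t)));
  std::printf("both_one = 0x%llx\n", both_one);   // prints 0x100000001

  // A single 8-byte atomic load observes both counters at once, the
  // same type-punning technique the patch applies to &_M_use_count.
  long long combined =
      __atomic_load_n(reinterpret_cast<long long*>(&c), __ATOMIC_ACQUIRE);

  if (combined == both_one)
    std::puts("fast path: sole owner, no weak refs -> dispose + destroy");
  else
    std::puts("slow path: fall back to the atomic decrements");
}

The trade-off described in the summary above is visible here: every release pays for the extra 8-byte load and compare, while only the final release saves the two atomic read-modify-write operations.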