From: Maged Michael
Date: Wed, 4 Aug 2021 13:19:23 -0400
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
To: Jonathan Wakely
Cc: libstdc++, gcc-patches

Sorry. I totally missed the rest of your message and the patch. My fuzzy
eyesight, which usually guesses correctly 90% of the time, mistook
"Secondly" on a line by itself for "Sincerely" :-)

On Wed, Aug 4, 2021 at 11:32 AM Jonathan Wakely wrote:
> On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
> >
> > On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
> > >
> > > This is the right patch. The previous one is missing noexcept. Sorry.
> > >
> > > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael wrote:
> > >>
> > >> Please find attached an updated patch after incorporating Jonathan's suggestions.
> > >>
> > >> Changes from the last patch include:
> > >> - Add a TSAN macro to bits/c++config.
> > >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment.
> > >> - Move the code in the optimized path to a separate function _M_release_double_width_cas.
> >
> > Thanks for the updated patch. At a quick glance it looks great. I'll
> > apply it locally and test it tomorrow.
>
> It occurs to me now that _M_release_double_width_cas is the wrong
> name, because we're not doing a DWCAS, just a double-width load. What
> do you think about renaming it to _M_release_combined instead? Since
> it does a combined load of the two counts.
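
For concreteness, the fast path being discussed is roughly the following
standalone sketch (toy class and names, not the actual patch; it assumes
the two counts sit adjacent in memory after the vptr and uses the GCC
__atomic_load_n builtin to read them together):

    // Toy sketch of the "combined load" fast path; not libstdc++ code.
    struct toy_counted_base
    {
      virtual ~toy_counted_base() = default;
      virtual void dispose() noexcept { }              // would destroy the managed object
      virtual void destroy() noexcept { delete this; } // would free the control block

      // Stand-ins for the two adjacent _Atomic_word members that follow
      // the vptr in _Sp_counted_base.
      int use_count  = 1;
      int weak_count = 1;

      void release() noexcept
      {
        constexpr bool double_word = sizeof(long long) == 2 * sizeof(int);
        // __alignof gives the natural alignment (see the discussion below).
        constexpr bool aligned = __alignof(long long) <= alignof(void*);
        if (double_word && aligned)
          {
            // Value of the combined word when both counts are exactly 1.
            constexpr long long unique_ref = 1LL + (1LL << (8 * sizeof(int)));
            // A single atomic load reads both counts at once: a combined
            // load, not a double-width CAS.
            long long both = __atomic_load_n(
                reinterpret_cast<long long*>(&use_count), __ATOMIC_ACQUIRE);
            if (both == unique_ref)
              {
                // This thread holds the only strong and weak references,
                // so no atomic read-modify-write operations are needed.
                dispose();
                destroy();
                return;
              }
          }
        // Fallback: the usual two atomic decrements (elided in this sketch).
      }
    };

    int main()
    {
      (new toy_counted_base())->release(); // takes the fast path and frees itself
    }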
> I'm no longer confident in the alignof suggestion I gave you.
>
> +        constexpr bool __double_word
> +          = sizeof(long long) == 2 * sizeof(_Atomic_word);
> +        // The ref-count members follow the vptr, so are aligned to
> +        // alignof(void*).
> +        constexpr bool __aligned = alignof(long long) <= alignof(void*);
>
> For IA32 (i.e. 32-bit x86) this constant will be true, because
> alignof(long long) and alignof(void*) are both 4, even though
> sizeof(long long) is 8. So in theory the _M_use_count and
> _M_weak_count members could be in different cache lines, and the
> atomic load will not be atomic. I think we want to use
> __alignof(long long) here to get the "natural" alignment, not the
> smaller 4B alignment allowed by SysV ABI. That means that we won't do
> this optimization on IA32, but I think that's acceptable.
>
> Alternatively, we could keep using alignof, but with an additional
> run-time check something like (uintptr_t)&_M_use_count / 64 ==
> (uintptr_t)&_M_weak_count / 64 to check they're on the same cache
> line. I think for now we should just use __alignof and live without
> the optimization on IA32.
>
> Secondly:
>
> +      void
> +      __attribute__ ((noinline))
>
> This needs to be __noinline__ because noinline is not a reserved word,
> so users can do:
> #define noinline 1
> #include <memory>
>
> Was it performance profiling, or code-size measurements (or something
> else?) that showed this should be non-inline?
> I'd like to add a comment explaining why we want it to be noinline.

The noinlining was based on looking at generated code. That was for
clang. It was inlining the _M_last_use function for every instance of
_M_release (e.g., ~shared_ptr). This optimization, with the noinline on
_M_release_last_use, ended up reducing the text size of some very large
binaries by 1.5% (several megabytes), which was an extra benefit. Without
the noinline we saw a code size increase.

IIUC, we can use the following, right?
__attribute__((__noinline__))

I didn't understand the part about programmers doing
#define noinline 1
I don't see code in the patch that uses noinline.

How about something like this comment?
// Noinline to avoid code size increase.

In that function ...

> +      _M_release_last_use() noexcept
> +      {
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
> +        _M_dispose();
> +        if (_Mutex_base<_Lp>::_S_need_barriers)
> +          {
> +            __atomic_thread_fence (__ATOMIC_ACQ_REL);
> +          }
>
> I think we can remove this fence. The _S_need_barriers constant is
> only true for the _S_mutex policy, and that just calls
> _M_release_orig(), so never uses this function. I'll remove it and add
> a comment noting that the lack of barrier is intentional.
>
> +        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
> +        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
> +                                                   -1) == 1)
> +          {
> +            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
> +            _M_destroy();
> +          }
> +      }
>
> Alternatively, we could keep the fence in _M_release_last_use() and
> refactor the other _M_release functions, so that we have explicit
> specializations for each of:
> _Sp_counted_base<_S_single>::_M_release() (already present)
> _Sp_counted_base<_S_mutex>::_M_release()
> _Sp_counted_base<_S_atomic>::_M_release()
>
> The second and third would be new, as currently they both use the
> definition in the primary template. The _S_mutex one would just
> decrement _M_use_count then call _M_release_last_use() (which still
> has the barrier needed for the _S_mutex policy). The _S_atomic one
> would have your new optimization. See the attached patch showing what
> I mean. I find this version a bit simpler to understand, as we just
> have _M_release and _M_release_last_use, without
> _M_release_double_width_cas and _M_release_orig. What do you think of
> this version? Does it lose any important properties of your version
> which I've failed to notice?
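
Coming back to the alignment point above for a moment: the difference
between alignof and __alignof, and the run-time check you describe, can
be seen with a small standalone program (toy struct and names, purely
illustrative, not libstdc++ code):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Toy stand-in for the two adjacent reference-count members.
    struct counts { int use; int weak; };

    // The run-time check described above: are both members on the same
    // 64-byte cache line?
    static bool same_cache_line(const counts& c)
    {
      return (std::uintptr_t)&c.use / 64 == (std::uintptr_t)&c.weak / 64;
    }

    int main()
    {
      counts c{1, 1};
      // On the SysV i386 ABI alignof(long long) is 4, so an 8-byte pair
      // can legally straddle a cache line; __alignof(long long) reports
      // the natural alignment of 8 on both i386 and x86-64.
      std::printf("alignof(long long)   = %zu\n", alignof(long long));
      std::printf("__alignof(long long) = %zu\n",
                  static_cast<std::size_t>(__alignof(long long)));
      std::printf("same cache line      = %s\n",
                  same_cache_line(c) ? "yes" : "no");
    }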
On the refactoring, unless I missed something I don't think the patch is
correct. IIUC it uses _M_release_last_use as if it is equivalent to what
was _M_release_orig, which is not true. The logic in _M_release_last_use
assumes that the use count has already been decremented and the return
value was 1. So I don't think the refactor works. Please correct me if I
missed something (see the P.S. below for a toy illustration of what I
mean).

If I am not mistaken, how about I prepare a new patch that incorporates
the following?
- __noinline__
- The change to __align in my last message.
- Remove the unnecessary checks based on _S_need_barriers in the
  combined / last_use path.
- Going back to having _M_release_orig and _M_release_combined.
- Include all of the comments that you added in your patch.

Would that be OK with you?

Thanks,
Maged
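
P.S. To make the precondition point concrete, here is a tiny standalone
model (toy names, and it prints instead of freeing anything; not the real
code):

    #include <atomic>
    #include <cstdio>

    struct toy_control_block
    {
      std::atomic<int> use{1};
      std::atomic<int> weak{1};

      // Models _M_release_last_use. Precondition: the caller has already
      // decremented `use` and observed the old value 1. Substituting it
      // for release_orig() would skip that decrement entirely.
      void release_last_use()
      {
        std::puts("dispose managed object");
        if (weak.fetch_sub(1) == 1)
          std::puts("destroy control block"); // the real code frees it here
      }

      // Models _M_release_orig: it performs the use-count decrement itself.
      void release_orig()
      {
        if (use.fetch_sub(1) == 1)   // old value 1 => this was the last use
          release_last_use();
      }
    };

    int main()
    {
      toy_control_block b;
      b.release_orig();   // correct entry point for an ordinary release
    }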