From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id A977F3890406; Wed, 4 Aug 2021 15:32:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A977F3890406 Received: by mail-wm1-x336.google.com with SMTP id a192-20020a1c7fc90000b0290253b32e8796so3913507wmd.0; Wed, 04 Aug 2021 08:32:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ANqTcuy+jqB8EZaFvUaE7SvN6H/UOa49Lixl86lZBJU=; b=n+eIwSWK0Z3fUFwmiyknu8HAer2EVPY/f19z2qdMIbpzOCcPcBpc20BkhnRX+a/Sa2 26Ao7ZC+gloEVjS4pVwzUcWyvGW3HODGRAX+3P3V10Uyaq0lQqwnw0CGw9Ny1Z8LV2hb ItmC5RtwifjgodVKEyCOM3rM4NI+8h7b/wFvHB8smI4xB/gvlkwf0kWUuLDeqe62uHuV dB5WGlGe/pSdV12n2BcPiAXRczCBFYp51PeUmY3kcvHxW1z+oCiIK+6zt1Idfxs6Zs2W L8+AENSMooGKuZE3kj6C6gYG9vzR4UrYf44QA2yuc/3s47BIfnBCXqHz8S1JLVQAL5Xt p+sQ== X-Gm-Message-State: AOAM532kxqb+mM0LVS9+bbVBrO1BKEvV+VTDGF24RIibPEAAruirVTEk fZeFCm+0b+4OL7oTwFOYNPZyOt2RCxK7WyJWwB0= X-Google-Smtp-Source: ABdhPJyfgIxbQSGzSuIT2iWsVEvLPKBxxRmMhyuZ3sSnK4UbZxi5Cpy5G525A2xNG0i1lobjzLAzWEIzQoeeu5X/9Hg= X-Received: by 2002:a1c:a187:: with SMTP id k129mr187062wme.17.1628091167722; Wed, 04 Aug 2021 08:32:47 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jonathan Wakely Date: Wed, 4 Aug 2021 16:32:36 +0100 Message-ID: Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1 To: Maged Michael Cc: "libstdc++" , gcc-patches Content-Type: multipart/mixed; boundary="000000000000b6f25705c8bd837b" X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Aug 2021 15:32:50 -0000 --000000000000b6f25705c8bd837b Content-Type: text/plain; charset="UTF-8" On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote: > > On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote: > > > > This is the right patch. The previous one is missing noexcept. Sorry. > > > > > > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael wrote: > >> > >> Please find attached an updated patch after incorporating Jonathan's suggestions. > >> > >> Changes from the last patch include: > >> - Add a TSAN macro to bits/c++config. > >> - Use separate constexpr bool-s for the conditions for lock-freedom, double-width and alignment. > >> - Move the code in the optimized path to a separate function _M_release_double_width_cas. > > Thanks for the updated patch. At a quick glance it looks great. I'll > apply it locally and test it tomorrow. It occurs to me now that _M_release_double_width_cas is the wrong name, because we're not doing a DWCAS, just a double-width load. What do you think about renaming it to _M_release_combined instead? Since it does a combined load of the two counts. I'm no longer confident in the alignof suggestion I gave you. + constexpr bool __double_word + = sizeof(long long) == 2 * sizeof(_Atomic_word); + // The ref-count members follow the vptr, so are aligned to + // alignof(void*). + constexpr bool __aligned = alignof(long long) <= alignof(void*); For IA32 (i.e. 32-bit x86) this constant will be true, because alignof(long long) and alignof(void*) are both 4, even though sizeof(long long) is 8. So in theory the _M_use_count and _M_weak_count members could be in different cache lines, and the atomic load will not be atomic. I think we want to use __alignof(long long) here to get the "natural" alignment, not the smaller 4B alignment allowed by SysV ABI. That means that we won't do this optimization on IA32, but I think that's acceptable. Alternatively, we could keep using alignof, but with an additional run-time check something like (uintptr_t)&_M_use_count / 64 == (uintptr_t)&_M_weak_count / 64 to check they're on the same cache line. I think for now we should just use __alignof and live without the optimization on IA32. Secondly: + void + __attribute__ ((noinline)) This needs to be __noinline__ because noinline is not a reserved word, so users can do: #define noinline 1 #include Was it performance profiling, or code-size measurements (or something else?) that showed this should be non-inline? I'd like to add a comment explaining why we want it to be noinline. In that function ... + _M_release_last_use() noexcept + { + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count); + _M_dispose(); + if (_Mutex_base<_Lp>::_S_need_barriers) + { + __atomic_thread_fence (__ATOMIC_ACQ_REL); + } I think we can remove this fence. The _S_need_barriers constant is only true for the _S_mutex policy, and that just calls _M_release_orig(), so never uses this function. I'll remove it and add a comment noting that the lack of barrier is intentional. + _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count); + if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count, + -1) == 1) + { + _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count); + _M_destroy(); + } + } Alternatively, we could keep the fence in _M_release_last_use() and refactor the other _M_release functions, so that we have explicit specializations for each of: _Sp_counted_base<_S_single>::_M_release() (already present) _Sp_counted_base<_S_mutex>::_M_release() _Sp_counted_base<_S_atomic>::_M_release() The second and third would be new, as currently they both use the definition in the primary template. The _S_mutex one would just decrement _M_use_count then call _M_release_last_use() (which still has the barrier needed for the _S_mutex policy). The _S_atomic one would have your new optimization. See the attached patch showing what I mean. I find this version a bit simpler to understand, as we just have _M_release and _M_release_last_use, without _M_release_double_width_cas and _M_release_orig. What do you think of this version? Does it lose any important properties of your version which I've failed to notice? --000000000000b6f25705c8bd837b Content-Type: text/plain; charset="US-ASCII"; name="patch.txt" Content-Disposition: attachment; filename="patch.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_krxndpw00 ZGlmZiAtLWdpdCBhL2xpYnN0ZGMrKy12My9pbmNsdWRlL2JpdHMvYysrY29uZmlnIGIvbGlic3Rk YysrLXYzL2luY2x1ZGUvYml0cy9jKytjb25maWcKaW5kZXggMzJiODk1N2Y4MTQuLjA3NDY1Zjdl Y2Q1IDEwMDY0NAotLS0gYS9saWJzdGRjKystdjMvaW5jbHVkZS9iaXRzL2MrK2NvbmZpZworKysg Yi9saWJzdGRjKystdjMvaW5jbHVkZS9iaXRzL2MrK2NvbmZpZwpAQCAtMTQzLDYgKzE0MywxNSBA QAogIyBkZWZpbmUgX0dMSUJDWFhfTk9ESVNDQVJECiAjZW5kaWYKIAorLy8gTWFjcm8gZm9yIFRT QU4uCisjaWYgX19TQU5JVElaRV9USFJFQURfXworIyAgZGVmaW5lIF9HTElCQ1hYX1RTQU4gMQor I2VsaWYgZGVmaW5lZCBfX2hhc19mZWF0dXJlCisjIGlmIF9faGFzX2ZlYXR1cmUodGhyZWFkX3Nh bml0aXplcikKKyMgIGRlZmluZSBfR0xJQkNYWF9UU0FOIDEKKyMgZW5kaWYKKyNlbmRpZgorCiAK IAogI2lmIF9fY3BsdXNwbHVzCmRpZmYgLS1naXQgYS9saWJzdGRjKystdjMvaW5jbHVkZS9iaXRz L3NoYXJlZF9wdHJfYmFzZS5oIGIvbGlic3RkYysrLXYzL2luY2x1ZGUvYml0cy9zaGFyZWRfcHRy X2Jhc2UuaAppbmRleCA1YmU5MzVkMTc0ZC4uYjIzOTdjOGZkZGIgMTAwNjQ0Ci0tLSBhL2xpYnN0 ZGMrKy12My9pbmNsdWRlL2JpdHMvc2hhcmVkX3B0cl9iYXNlLmgKKysrIGIvbGlic3RkYysrLXYz L2luY2x1ZGUvYml0cy9zaGFyZWRfcHRyX2Jhc2UuaApAQCAtMTQzLDEwICsxNDMsMTIgQEAgX0dM SUJDWFhfQkVHSU5fTkFNRVNQQUNFX1ZFUlNJT04KICAgICAgIHZpcnR1YWwgdm9pZCoKICAgICAg IF9NX2dldF9kZWxldGVyKGNvbnN0IHN0ZDo6dHlwZV9pbmZvJikgbm9leGNlcHQgPSAwOwogCisg ICAgICAvLyBJbmNyZW1lbnQgdGhlIHVzZSBjb3VudCAodXNlZCB3aGVuIHRoZSBjb3VudCBpcyBn cmVhdGVyIHRoYW4gemVybykuCiAgICAgICB2b2lkCiAgICAgICBfTV9hZGRfcmVmX2NvcHkoKQog ICAgICAgeyBfX2dudV9jeHg6Ol9fYXRvbWljX2FkZF9kaXNwYXRjaCgmX01fdXNlX2NvdW50LCAx KTsgfQogCisgICAgICAvLyBJbmNyZW1lbnQgdGhlIHVzZSBjb3VudCBpZiBpdCBpcyBub24temVy bywgdGhyb3cgb3RoZXJ3aXNlLgogICAgICAgdm9pZAogICAgICAgX01fYWRkX3JlZl9sb2NrKCkK ICAgICAgIHsKQEAgLTE1NCwxNSArMTU2LDE4IEBAIF9HTElCQ1hYX0JFR0lOX05BTUVTUEFDRV9W RVJTSU9OCiAJICBfX3Rocm93X2JhZF93ZWFrX3B0cigpOwogICAgICAgfQogCisgICAgICAvLyBJ bmNyZW1lbnQgdGhlIHVzZSBjb3VudCBpZiBpdCBpcyBub24temVyby4KICAgICAgIGJvb2wKICAg ICAgIF9NX2FkZF9yZWZfbG9ja19ub3Rocm93KCkgbm9leGNlcHQ7CiAKKyAgICAgIC8vIERlY3Jl bWVudCB0aGUgdXNlIGNvdW50LgogICAgICAgdm9pZAotICAgICAgX01fcmVsZWFzZSgpIG5vZXhj ZXB0Ci0gICAgICB7Ci0gICAgICAgIC8vIEJlIHJhY2UtZGV0ZWN0b3ItZnJpZW5kbHkuICBGb3Ig bW9yZSBpbmZvIHNlZSBiaXRzL2MrK2NvbmZpZy4KLSAgICAgICAgX0dMSUJDWFhfU1lOQ0hST05J WkFUSU9OX0hBUFBFTlNfQkVGT1JFKCZfTV91c2VfY291bnQpOwotCWlmIChfX2dudV9jeHg6Ol9f ZXhjaGFuZ2VfYW5kX2FkZF9kaXNwYXRjaCgmX01fdXNlX2NvdW50LCAtMSkgPT0gMSkKKyAgICAg IF9NX3JlbGVhc2UoKSBub2V4Y2VwdDsKKworICAgICAgLy8gQ2FsbGVkIGJ5IF9NX3JlbGVhc2Uo KSB3aGVuIHRoZSB1c2UgY291bnQgcmVhY2hlcyB6ZXJvLgorICAgICAgX19hdHRyaWJ1dGVfXygo X19ub2lubGluZV9fKSkKKyAgICAgIHZvaWQKKyAgICAgIF9NX3JlbGVhc2VfbGFzdF91c2UoKSBu b2V4Y2VwdAogICAgICAgewogCV9HTElCQ1hYX1NZTkNIUk9OSVpBVElPTl9IQVBQRU5TX0FGVEVS KCZfTV91c2VfY291bnQpOwogCV9NX2Rpc3Bvc2UoKTsKQEAgLTE4NCwxMiArMTg5LDEzIEBAIF9H TElCQ1hYX0JFR0lOX05BTUVTUEFDRV9WRVJTSU9OCiAJICAgIF9NX2Rlc3Ryb3koKTsKIAkgIH0K ICAgICAgIH0KLSAgICAgIH0KIAorICAgICAgLy8gSW5jcmVtZW50IHRoZSB3ZWFrIGNvdW50Lgog ICAgICAgdm9pZAogICAgICAgX01fd2Vha19hZGRfcmVmKCkgbm9leGNlcHQKICAgICAgIHsgX19n bnVfY3h4OjpfX2F0b21pY19hZGRfZGlzcGF0Y2goJl9NX3dlYWtfY291bnQsIDEpOyB9CiAKKyAg ICAgIC8vIERlY3JlbWVudCB0aGUgd2VhayBjb3VudC4KICAgICAgIHZvaWQKICAgICAgIF9NX3dl YWtfcmVsZWFzZSgpIG5vZXhjZXB0CiAgICAgICB7CkBAIC0yODYsNiArMjkyLDYxIEBAIF9HTElC Q1hYX0JFR0lOX05BTUVTUEFDRV9WRVJTSU9OCiAgICAgICAgIH0KICAgICB9CiAKKyAgdGVtcGxh dGU8PgorICAgIGlubGluZSB2b2lkCisgICAgX1NwX2NvdW50ZWRfYmFzZTxfU19tdXRleD46Ol9N X3JlbGVhc2UoKSBub2V4Y2VwdAorICAgIHsKKyAgICAgIC8vIEJlIHJhY2UtZGV0ZWN0b3ItZnJp ZW5kbHkuICBGb3IgbW9yZSBpbmZvIHNlZSBiaXRzL2MrK2NvbmZpZy4KKyAgICAgIF9HTElCQ1hY X1NZTkNIUk9OSVpBVElPTl9IQVBQRU5TX0JFRk9SRSgmX01fdXNlX2NvdW50KTsKKyAgICAgIGlm IChfX2dudV9jeHg6Ol9fZXhjaGFuZ2VfYW5kX2FkZF9kaXNwYXRjaCgmX01fdXNlX2NvdW50LCAt MSkgPT0gMSkKKwl7CisJICBfTV9yZWxlYXNlX2xhc3RfdXNlKCk7CisJfQorICAgIH0KKworICB0 ZW1wbGF0ZTw+CisgICAgaW5saW5lIHZvaWQKKyAgICBfU3BfY291bnRlZF9iYXNlPF9TX2F0b21p Yz46Ol9NX3JlbGVhc2UoKSBub2V4Y2VwdAorICAgIHsKKyAgICAgIF9HTElCQ1hYX1NZTkNIUk9O SVpBVElPTl9IQVBQRU5TX0JFRk9SRSgmX01fdXNlX2NvdW50KTsKKyNpZiAhIF9HTElCQ1hYX1RT QU4KKyAgICAgIGNvbnN0ZXhwciBib29sIF9fbG9ja19mcmVlCisJPSBfX2F0b21pY19hbHdheXNf bG9ja19mcmVlKHNpemVvZihsb25nIGxvbmcpLCAwKQorCSYmIF9fYXRvbWljX2Fsd2F5c19sb2Nr X2ZyZWUoc2l6ZW9mKF9BdG9taWNfd29yZCksIDApOworICAgICAgY29uc3RleHByIGJvb2wgX19k b3VibGVfd29yZAorCT0gc2l6ZW9mKGxvbmcgbG9uZykgPT0gMiAqIHNpemVvZihfQXRvbWljX3dv cmQpOworICAgICAgLy8gVGhlIHJlZi1jb3VudCBtZW1iZXJzIGZvbGxvdyB0aGUgdnB0ciwgc28g YXJlIGFsaWduZWQgdG8KKyAgICAgIC8vIGFsaWdub2Yodm9pZCopLgorICAgICAgY29uc3RleHBy IGJvb2wgX19hbGlnbmVkID0gX19hbGlnbm9mKGxvbmcgbG9uZykgPD0gYWxpZ25vZih2b2lkKik7 CisgICAgICBpZiBfR0xJQkNYWDE3X0NPTlNURVhQUiAoX19sb2NrX2ZyZWUgJiYgX19kb3VibGVf d29yZCAmJiBfX2FsaWduZWQpCisJeworCSAgY29uc3RleHByIGxvbmcgbG9uZyBfX3VuaXF1ZV9y ZWYKKwkgICAgPSAxTEwgKyAoMUxMIDw8IChfX0NIQVJfQklUX18gKiBzaXplb2YoX0F0b21pY193 b3JkKSkpOworCSAgYXV0byBfX2JvdGhfY291bnRzID0gcmVpbnRlcnByZXRfY2FzdDxsb25nIGxv bmcqPigmX01fdXNlX2NvdW50KTsKKworCSAgX0dMSUJDWFhfU1lOQ0hST05JWkFUSU9OX0hBUFBF TlNfQkVGT1JFKCZfTV93ZWFrX2NvdW50KTsKKwkgIGlmIChfX2F0b21pY19sb2FkX24oX19ib3Ro X2NvdW50cywgX19BVE9NSUNfQUNRVUlSRSkgPT0gX191bmlxdWVfcmVmKQorCSAgICB7CisJICAg ICAgLy8gQm90aCBjb3VudHMgYXJlIDEsIHNvIHRoZXJlIGFyZSBubyB3ZWFrIHJlZmVyZW5jZXMg YW5kCisJICAgICAgLy8gd2UgYXJlIHJlbGVhc2luZyB0aGUgbGFzdCBzdHJvbmcgcmVmZXJlbmNl LiBObyBvdGhlcgorCSAgICAgIC8vIHRocmVhZHMgY2FuIG9ic2VydmUgdGhlIGVmZmVjdHMgb2Yg dGhpcyBfTV9yZWxlYXNlKCkKKwkgICAgICAvLyBjYWxsIChlLmcuIGNhbGxpbmcgdXNlX2NvdW50 KCkpIHdpdGhvdXQgYSBkYXRhIHJhY2UuCisJICAgICAgKihsb25nIGxvbmcqKSgmX01fdXNlX2Nv dW50KSA9IDA7CisJICAgICAgX0dMSUJDWFhfU1lOQ0hST05JWkFUSU9OX0hBUFBFTlNfQUZURVIo Jl9NX3VzZV9jb3VudCk7CisJICAgICAgX0dMSUJDWFhfU1lOQ0hST05JWkFUSU9OX0hBUFBFTlNf QUZURVIoJl9NX3dlYWtfY291bnQpOworCSAgICAgIF9NX2Rpc3Bvc2UoKTsKKwkgICAgICBfTV9k ZXN0cm95KCk7CisJICAgICAgX19idWlsdGluX3B1dHMoImRvdWJsZSBkb3VibGUgYnllIGJ5ZSIp OworCSAgICAgIHJldHVybjsKKwkgICAgfQorCX0KKyNlbmRpZgorICAgICAgaWYgKF9fZ251X2N4 eDo6X19leGNoYW5nZV9hbmRfYWRkX2Rpc3BhdGNoKCZfTV91c2VfY291bnQsIC0xKSA9PSAxKQor CXsKKwkgIF9NX3JlbGVhc2VfbGFzdF91c2UoKTsKKwl9CisgICAgfQorCiAgIHRlbXBsYXRlPD4K ICAgICBpbmxpbmUgdm9pZAogICAgIF9TcF9jb3VudGVkX2Jhc2U8X1Nfc2luZ2xlPjo6X01fd2Vh a19hZGRfcmVmKCkgbm9leGNlcHQK --000000000000b6f25705c8bd837b--