From: Jonathan Wakely
Date: Wed, 4 Aug 2021 16:52:02 +0100
Subject: Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1
To: Maged Michael
Cc: libstdc++ <libstdc++@gcc.gnu.org>, gcc-patches

On Wed, 4 Aug 2021 at 16:47, Maged Michael wrote:
>
> Thanks, Jonathan!
>
> On Wed, Aug 4, 2021 at 11:32 AM Jonathan Wakely wrote:
>>
>> On Tue, 3 Aug 2021 at 21:59, Jonathan Wakely wrote:
>> >
>> > On Mon, 2 Aug 2021 at 14:29, Maged Michael wrote:
>> > >
>> > > This is the right patch. The previous one is missing noexcept. Sorry.
>> > >
>> > > On Mon, Aug 2, 2021 at 9:23 AM Maged Michael wrote:
>> > >>
>> > >> Please find attached an updated patch after incorporating Jonathan's
>> > >> suggestions.
>> > >>
>> > >> Changes from the last patch include:
>> > >> - Add a TSAN macro to bits/c++config.
>> > >> - Use separate constexpr bools for the conditions for lock-freedom,
>> > >>   double-width and alignment.
>> > >> - Move the code in the optimized path to a separate function,
>> > >>   _M_release_double_width_cas.
>> >
>> > Thanks for the updated patch. At a quick glance it looks great. I'll
>> > apply it locally and test it tomorrow.
>>
>> It occurs to me now that _M_release_double_width_cas is the wrong
>> name, because we're not doing a DWCAS, just a double-width load. What
>> do you think about renaming it to _M_release_combined instead, since
>> it does a combined load of the two counts?
>
> Yes, definitely: _M_release_combined makes sense. Will do.
See the patch I attached to my previous mail, which has a refactored
version that gets rid of that function entirely.

>>
>> I'm no longer confident in the alignof suggestion I gave you.
>>
>> +      constexpr bool __double_word
>> +        = sizeof(long long) == 2 * sizeof(_Atomic_word);
>> +      // The ref-count members follow the vptr, so are aligned to
>> +      // alignof(void*).
>> +      constexpr bool __aligned = alignof(long long) <= alignof(void*);
>>
>> For IA32 (i.e. 32-bit x86) this constant will be true, because
>> alignof(long long) and alignof(void*) are both 4, even though
>> sizeof(long long) is 8. So in theory the _M_use_count and
>> _M_weak_count members could be in different cache lines, and the
>> atomic load would not be atomic. I think we want to use
>> __alignof(long long) here to get the "natural" alignment, not the
>> smaller 4-byte alignment allowed by the SysV ABI. That means we won't
>> do this optimization on IA32, but I think that's acceptable.
>>
>> Alternatively, we could keep using alignof but add a run-time check,
>> something like
>>   (uintptr_t)&_M_use_count / 64 == (uintptr_t)&_M_weak_count / 64
>> to check that they're on the same cache line. I think for now we
>> should just use __alignof and live without the optimization on IA32.
>>
> I'd rather avoid any runtime checks because they may negate the speed
> rationale for doing this optimization.
> I'd be OK with not doing this optimization for any 32-bit architecture.
>
> Is it OK to change the __aligned condition to the following?
>
>     constexpr bool __aligned =
>         (alignof(long long) <= alignof(void*))
>         && (sizeof(long long) == sizeof(void*));

Yes, that will work fine.
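
For reference, a minimal, self-contained sketch of the combined-load
fast path being discussed. This is not the committed patch: the
counted_base class, the _M_dispose/_M_destroy stubs, and the slow path
below are stand-ins invented for illustration, and the sketch assumes
GCC-style __atomic builtins plus the double-width and alignment
preconditions that the thread's constexpr guards exist to verify.

#include <climits>

struct counted_base
{
  // Stand-ins for the adjacent _Atomic_word members that follow the
  // vptr in _Sp_counted_base; assumes 32-bit int, 64-bit long long.
  int _M_use_count  = 1;
  int _M_weak_count = 1;

  void _M_dispose() noexcept { /* destroy the managed object */ }
  void _M_destroy() noexcept { /* free this control block */ }

  void _M_release() noexcept
  {
    // Bit pattern of the two adjacent counts when both equal 1
    // (the same on either endianness, since both halves hold 1).
    constexpr long long __unique_ref
      = 1LL + (1LL << (CHAR_BIT * sizeof(int)));

    // The "combined" load: a single atomic load covering both counts.
    auto __both_counts = reinterpret_cast<long long*>(&_M_use_count);
    if (__atomic_load_n(__both_counts, __ATOMIC_ACQUIRE) == __unique_ref)
      {
        // Last strong reference and no weak references: no other
        // thread can legitimately observe the counts any more, so
        // both atomic read-modify-write decrements can be skipped.
        _M_dispose();
        _M_destroy();
        return;
      }

    // Slow path: the conventional pair of atomic decrements.
    if (__atomic_fetch_sub(&_M_use_count, 1, __ATOMIC_ACQ_REL) == 1)
      {
        _M_dispose();
        if (__atomic_fetch_sub(&_M_weak_count, 1, __ATOMIC_ACQ_REL) == 1)
          _M_destroy();
      }
  }
};

The win is in the very common case of a shared_ptr that was never
copied and never observed by a weak_ptr: one relaxed-cost atomic load
replaces two atomic read-modify-write operations on a potentially
contended cache line.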
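
And a compile-time sketch of the three eligibility guards, with the
__aligned condition in the form agreed at the end of the thread. The
__lock_free spelling is an assumption on my part (the thread only says
there is a separate constexpr bool for lock-freedom), and _Atomic_word
is typedef'd to int, its usual libstdc++ definition.

#include <cstdio>

typedef int _Atomic_word;  // the usual libstdc++ definition

// Assumed spelling: both the wide load and the ordinary ref-count
// operations must be always-lock-free.
constexpr bool __lock_free
  = __atomic_always_lock_free(sizeof(long long), 0)
  && __atomic_always_lock_free(sizeof(_Atomic_word), 0);

// Double-width: the two counts together exactly fill a long long.
constexpr bool __double_word
  = sizeof(long long) == 2 * sizeof(_Atomic_word);

// The agreed condition. The added sizeof test rules out IA32, where
// the SysV psABI gives alignof(long long) == 4 despite
// sizeof(long long) == 8, so the alignof test alone would pass even
// though the combined 8-byte load could straddle a cache line and
// lose its atomicity.
constexpr bool __aligned
  = (alignof(long long) <= alignof(void*))
  && (sizeof(long long) == sizeof(void*));

int main()
{
  std::printf("combined-load fast path eligible: %s\n",
              __lock_free && __double_word && __aligned ? "yes" : "no");
}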