From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thiago Macieira
To: Thomas Rodgers
CC: Thomas Rodgers
Subject: Re: C++2a synchronisation inefficient in GCC 11
Date: Mon, 1 Mar 2021 11:11:56 -0800
Message-ID: <3732407.PpFURekcsd@tjmaciei-mobl1>
Organization: Intel Corporation
In-Reply-To: <18527452.32617548.1614623662482.JavaMail.zimbra@redhat.com>
References: <1968544.UC5HiB4uFJ@tjmaciei-mobl1> <7309459.cE9rBlv1QQ@tjmaciei-mobl1> <18527452.32617548.1614623662482.JavaMail.zimbra@redhat.com>
List-Id: Libstdc++ mailing list <libstdc++@gcc.gnu.org>

On Monday, 1 March 2021 10:34:22 PST Thomas Rodgers wrote:
> > I'm worried that it is benchmarking the wrong thing. Can we benchmark
> > latch and counting_semaphore instead? Those already track contention
> > on the waitable atomic by themselves.
>
> So is your concern here that e.g. latch::count_down() will do an extra
> atomic load?

I'm concerned we're violating "Don't Pay for What You Don't Need". Waiting
algorithms should be written to track the contention by themselves. If they
say they want to wake or wait, the runtime shouldn't second-guess them.

Take a look at how glibc's sem_post and sem_wait are implemented. On systems
with 64-bit atomics, they store the number of waiters in the high 32 bits;
on systems without, they store a single bit indicating whether there's
anyone waiting. The 64-bit code summarises to:

  void acquire()
  {
      // handle big-endian and the reinterpret_cast to get at the low half
      auto &low = low_half_by_ref();
      uint64_t d = _M_counter.fetch_add(1ULL << 32);
      for (;;) {
          if ((d & SEM_VALUE_MASK) == 0) {
              low.wait(d);
              d = _M_counter.load(std::memory_order_relaxed);
          } else {
              if (_M_counter.compare_exchange_strong(d, d - 1, ...))
                  break;
          }
      }
      _M_counter.fetch_sub(1ULL << 32);
  }

  void release(int n = 1)
  {
      // handle big-endian and the reinterpret_cast to get at the low half
      auto &low = low_half_by_ref();
      uint64_t d = _M_counter.fetch_add(n);
      if (d >> SEM_NWAITERS_SHIFT)
          if (n == 1)
              low.notify_one();
          else
              low.notify_all();     // ought to be "notify_many(n)"
  }

It's even simpler in the latch, where any value different from "the last to
arrive" is "contended".
  _GLIBCXX_ALWAYS_INLINE void
  count_down(ptrdiff_t __update = 1)
  {
      auto const __old = __atomic_impl::fetch_sub(&_M_a, __update,
                                                  memory_order::release);
      if (__old == __update)
          __atomic_impl::notify_all(&_M_a);
  }

There's no need to track contention again here. The only case where it
would be useful is if there's only one thread operating on this latch, but
if that's the case, then why are you using a latch in the first place?

And again, if that's an issue, then we can use the LSB or MSB to indicate
that there are waiters waiting (we don't need a count of waiters).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering