From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx.kolabnow.com (mx.kolabnow.com [95.128.36.42]) by sourceware.org (Postfix) with ESMTPS id 2070F3854830 for ; Sat, 27 Feb 2021 01:13:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2070F3854830 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=appliantology.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=rodgert@appliantology.com Received: from localhost (unknown [127.0.0.1]) by ext-mx-out001.mykolab.com (Postfix) with ESMTP id 99437B69; Sat, 27 Feb 2021 02:13:52 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kolabnow.com; h= message-id:references:in-reply-to:subject:subject:from:from:date :date:content-type:content-type:mime-version:received:received :received; s=dkim20160331; t=1614388431; x=1616202832; bh=Aqy7gx XhJePNP0UlMyoJG+7IUl5Zex6zyU+Vx26q4NM=; b=TJ0CPJWLb+am2FIMnLDNet qI5tUptDJ/QoKM8P43JpG13JbaYLcR2sfYBjTfYJcvYwAKMh/I2EQznE68jh8PEW u+ntaBLolTRNW4F7CTfDEu5KM6nm6fkXbjQ4cv4CQr9+YUWScXzPShkPffosMMMz /MID9MLR34v/JfuepkdIfMrCLB7oo7zqIvcDA1tMIG+Dd3KNLwboScVVUkh9K/M5 MIeCa8yajRSoqKotls5K1HWo+JLP4PwgHu6Rp6ACzU2lmwh4Lx31m6Vlo2cPrM5R Q5Krz0cdqq8usfG8CFfSVl8dwnEVfIwbb4s69pZNYFHr2GhRYESfg5f3O+d5uHSo AmifaLom5/FZlzUbDwwmvhgjJoG+QkYRDLfc0/42oKQBpxlUQA9lcJBSope0Ynua pBf+4K6sQORXVJZO0GMchzveetr+uPB6nuC28iGBh29E3C5mhu1UsY16h7rWoi0H 8I9gEJaC2papIE+Ix1kTciaNwbFDw3UMv1S3N+oyJHwacvcCfkUsjDnujcB8UM1r l3fjZxGrd/0shpMFm1Eb5OtILJ+hZfkosbvYpcVdMTH4URRTGE1umnuOurfTbaZr VRiXN08zqRwqo5+qP5k5zN96889XP48LsJu0QTghFiiBxRZq+57/ymhII08u3Ivv B6f2CFJ18UhqWypit/JjA= X-Virus-Scanned: amavisd-new at mykolab.com X-Spam-Score: -1.898 X-Spam-Level: X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, HTML_MESSAGE, KAM_NUMSUBJECT, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.2 Received: from mx.kolabnow.com ([127.0.0.1]) by localhost (ext-mx-out001.mykolab.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YER4HsV3TS7b; Sat, 27 Feb 2021 02:13:51 +0100 (CET) Received: from int-mx002.mykolab.com (unknown [10.9.13.2]) by ext-mx-out001.mykolab.com (Postfix) with ESMTPS id D6FEA82F; Sat, 27 Feb 2021 02:13:50 +0100 (CET) Received: from int-subm001.mykolab.com (unknown [10.9.37.1]) by int-mx002.mykolab.com (Postfix) with ESMTPS id F3A98223C; Sat, 27 Feb 2021 02:13:48 +0100 (CET) MIME-Version: 1.0 Date: Fri, 26 Feb 2021 17:13:46 -0800 From: Thomas Rodgers To: Thiago Macieira Cc: libstdc++@gcc.gnu.org, trodgers@redhat.com, jwakely@redhat.com Subject: Re: C++2a synchronisation inefficient in GCC 11 In-Reply-To: <1968544.UC5HiB4uFJ@tjmaciei-mobl1> References: <1968544.UC5HiB4uFJ@tjmaciei-mobl1> Message-ID: <62df56697b99a0ab414e8f1e35f10b0b@appliantology.com> X-Sender: rodgert@appliantology.com X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Feb 2021 01:13:56 -0000 We are SPECIFICALLY NOT committing to an ABI for wait/notify with GCC11. These features are experimental and I have no intention of treating this as a blocker. On 2021-02-25 14:50, Thiago Macieira via Libstdc++ wrote: > Hello all > > I was investigating the implementation of std::atomic::wait to use > inside > my own code (Qt's QMutex and QSemaphore), when I realised that it has > huge inefficiencies. Since everything in this implementation is inline > and, > once released, it will tie our hands until the next ABI break > (libstdc++.so. > 7). And no, inline namespaces and ABI tags are no different than ABI > breaks, > they only make the ABI break more or less silent in the process. > > Here's a summary of the findings: > > 1) everything is inline > 2) futex code is still behind a lot of code calling into > std::_Hash_bytes > 3) other int-sized (incl. enums) atomics don't use futex > 4) std::latch and std::counting_semaphore defaults preclude from using > futex on Linux > 5) std::barrier implementation also uses a type that futex(2) can't > handle > > Here's what the code currently looks like: > https://gcc.godbolt.org/z/MeP8rr > > Given how far we are in the release cycle, a simple solution is to > simply > disable the entire solution by #if 0'ing out bits/atomic_wait.h. If > __cpp_lib_atomic_wait isn't defined, the rest of the functionality > cascades > disabled too. Then we can spend however long is necessary to get a good > implementation for GCC 12. > > I have patches for all but the first issue, but they are a result of > only 3 > hours of hacking to prove the concept. I will post them, but I do not > expect > them to get merged as-is. > > Details: > > 1) everything is inline > > I think this is unnecessary optimisation, especially since it expands > to a LOT > of code (see link above). A cursory research into libc++ and MS STL > shows that > the actual wait and notification are out-of-line. I recommend we do the > same > for libstdc++: if we're going to syscall anyway, the extra function > call isn't > going to be the performance bottleneck. > > In , > Jonathan > wrote: > >> Eventually we'll want to move the rest of this function (which doesn't >> depend on the template argument) into the compiled library, but it's >> better to be header-only for now. > > But that has never happened. > > Having out-of-line implementations would allow the implementation for > generic- > sized atomic wait/notify to fine-tune as time and experience > progresses. The > std::__detail::__waiter implementation looks clever, but it's unclear > if it is > the best for all workloads. Moreover, the implementation is limited to > 16 > waiters. > > Moreover, it would allow us to have support in the future for a 64-bit > futex. > Linus did shoot down any need for it 13 years ago, but the world has > changed > since and it's not inconceivable to have a pointer-sized futex on every > platform, which on 64-bit would be 64-bit. Windows and Darwin do have > support > for 64-bit compare-and-wait functionality. > > And it would allow us to have futex-equivalent support in other OSes, > added at > our leisure. The current implementation does not support Windows's > WaitOnAddress and could never support it if released in the current > state. > It's also possible other OSes will eventually add equivalent > functionality > because of the C++ standard. For example, FreeBSD already has futex() > support > inside its kernel in order to run Linux binaries, but it is not exposed > to the > FreeBSD-personality syscall ABI. > > Inlining certain functionality at a later date is possible. De-inlining > isn't. > > 2) futex code for int is still behind a lot of boiler plate > > It comes from the creation/use of std::__detail::__waiter in > std::__atomic_wait and __atomic_notify. That does allow one to detect > whether > there is contention in the first place and avoid the system call, but I > think > that's premature optimisation. > > In , > Thomas > says: > >> __atomic_notify checks to see if that count is non-zero >> before invoking the platform/semaphore notify because it is cheaper >> to do the atomic load than it is to make the syscall() when there are >> no >> waiters. > > That's true, but it ignores the fact that users of > std::atomic::wait() are > likely to track contention by themselves on the value of the atomic (I > know I > do). libc++, MS STL and the original atomic_wait[1 [1]] implementation > operate on > direct system calls and don't try to track contention. Because of that, > libstdc++ adding an extra layer of tracking is double book-keeping and > violates "don't pay for what you don't need". > > [1] https://github.com/ogiroux/atomic_wait > > 3) other int-sized atomics don't use futex > > The kernel doesn't care what we store in those 32 bits, only that they > are > comparable (FUTEX_OP does provide less-than and greater-than operations > that > are signed, but we don't use that and the kernel API can be later > extended to > unsigned compares). Therefore, I would recommend we extend futex > support to > all 32-bit signed scalar types. This would add: > * pointers and long on 32-bit architectures > * unsigned > * untyped enums and typed enums on int & unsigned int > * float > > 4) std::latch and std::counting_semaphore defaults too high > > The standard requires a ptrdiff_t in the API, but it does allow us to > set hard > limits and/or defaults for best performance. Right now, std::latch is > implemented using ptrdiff_t internally and std::counting_semaphore's > default > limit uses PTRDIFF_MAX: > > template __gnu_cxx::__int_traits::__max> > class counting_semaphore > > As a consequence, those two classes will use 64-bit atomics on 64-bit > architectures, which means they will not use futex. I recommend > changing > std::latch's internal to int and lowering the default maximum in > std::counting_semaphore to INT_MAX. > > 5) std::barrier uses a type that futex can't handle > > The barrier phase is the actual type that gets waited / notified on, > but it is > an enum class on unsigned char: > > enum class __barrier_phase_t : unsigned char { }; > > This is a harder problem to solve, but I really do recommend making it > 32-bit > in width. Whether each barrier phase holds only 256 values or the full > 32 > bits, I don't know (I've implemented with wrapping on 256). > > What is of consequence is that the __state_t type could increase in > size > ("could" because we can simply reduce the number of tickets in each > state > entry). I haven't done enough research on this to know whether it is > the right > way to go. Links: ------ [1] https://github.com/ogiroux/atomic_wait