From: "rodgertq at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
Date: Wed, 28 Sep 2022 22:34:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #21 from Thomas Rodgers ---
(In reply to Mkkt Bkkt from comment #16)
> > it with sarcasm
>
> I started with sarcasm because you restart this thread with some doubtful
> benchmarks without code for them.
>
> I think it's very disrespectfully.

I wasn't sarcastic in what I posted. As I noted, this question has come up
before in different contexts; Bugzilla is a useful historical archive, so
updating this issue with my reasoning and a bit of data was primarily a
capture task. So, let's try this again.

I did not post the original source because it required hacking on the
libstdc++ headers. I have posted a version that does not require that; the
results are identical.

The test is the example Jonathan cited in comment #14: incrementing an
atomic int and calling notify.

This isn't about semaphore or any other synchronization primitive. Those
types are free to make different choices that may be more appropriate to
their constrained usage, just as Lewis' lightweight manual reset event does
(I will also note that Lewis has reviewed this implementation and has
written a paper, p2616, to be discussed at the Fall meeting).

There are, as Jonathan has pointed out, use cases where notify can and will
be called without the notifier having any way to determine whether it will
wake a waiter.

One example that I, as the person who is going to have to implement C++26
executors, care about is a wait-free work-stealing deque. Algorithmic
correctness requires nothing more than spinning for work on an empty queue,
but after spinning on an empty queue, making the rounds trying to steal
work from other deques, and maybe spinning a bit more just to be sure, the
dequeuing thread that has not been able to acquire more work will probably
want to enter a wait until it knows it can do productive work.
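To make that idle path concrete, here is a minimal sketch of the dequeuing
thread's strategy. The helpers try_pop_local() and try_steal_from_peers()
and the work_epoch counter are hypothetical stand-ins of my own naming; this
illustrates the pattern, not any actual executor implementation.

    #include <atomic>
    #include <cstdint>

    // Hypothetical stand-ins for the real deque operations.
    bool try_pop_local();        // pop from this thread's own deque
    bool try_steal_from_peers(); // make the rounds of the other deques

    // Bumped by producers whenever work is pushed; idle consumers wait on it.
    std::atomic<std::uint64_t> work_epoch{0};

    void consumer_idle_path()
    {
        for (;;) {
            // Snapshot the epoch before the final emptiness checks, so a
            // push that races with them makes the wait below return at once.
            std::uint64_t epoch = work_epoch.load(std::memory_order_acquire);

            if (try_pop_local() || try_steal_from_peers())
                return; // found work; go run it

            // Spin a bit more, just to be sure.
            for (int i = 0; i < 64; ++i)
                if (try_pop_local())
                    return;

            // Only now enter the wait. atomic::wait may itself spin briefly
            // before blocking in the platform primitive.
            work_epoch.wait(epoch, std::memory_order_acquire);
        }
    }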
Another thread pushing work into that queue cannot tell whether the
dequeuing thread is spinning for new work, work stealing, or has entered a
wait, but atomic::notify() does know that, and can avoid penalizing the
thread submitting work with a syscall when there is no thread waiting on
the other end of the deque, which is the expected case for this algorithm.

p1135 was the paper that added atomic wait/notify. One of the co-authors of
that paper wrote the libc++ implementation. That implementation, as with
libstdc++, is not simply a wrapper over the underlying platform's wait
primitive. It has two 'backends': an exponentially timed backoff and ulock
wait/wake. libstdc++ currently has a futex backend and a condvar backend.
Both implementations choose a short-term spinning strategy combined with a
long-term waiting strategy (spinning, then futex/ulock or condvar).

I have confirmed with the libc++ implementation's author (who also chairs
the C++ committee's Concurrency and Parallelism study group) that it was
never the intention of p1135, or of the subsequently standardized wording
in C++20, to imply that wait/notify were direct, portable analogs of the
platform's waiting primitives. There are users, such as yourself, who want
exactly that; there are other users (as in my prior industry) for whom busy
waiting is the desired strategy; and in between those two choices are
people who want it to work as advertised in the standard, and to do so
'efficiently'. Both libc++ and libstdc++ take a balanced approach somewhere
between always going to the OS and always spinning.

There is an open question here that your original issue raises:

* At what point do collisions on the waiter pool, with the
cache-invalidation traffic and spurious wakeups that result, swamp the gain
of doing this empty-waiter check on notify?

I also made a comment about the 'size of the waiter pool notwithstanding'.
I chose a smaller size than libc++ chose, in part because Jonathan and I
did not want to make any sort of ABI commitment until this had been in a
few GCC releases. This implementation is header-only at present and still
considered experimental. libc++ committed to an ABI early.

In the sizing of the libc++ waiter pool there is a comment that 'there is
no magic in this value'. Not only is there no magic, there is no test of
any sort, done by me or on libc++, to determine what effect different pool
sizes have under different load scenarios. So all of it is a guess at this
point. I will likely match libc++ when I move this into the .so.

Finally, in the most recent mailing there is p2643, which proposes
additional changes to atomic waiting. One proposal is to add a 'hinted'
wait that lets the caller steer the choices atomic wait/notify makes. I
have conferred with the other authors of the paper; this latter option is
not without controversy, and likely has some sharp edges for the user, but
I plan to raise the discussion at the Fall WG21 meeting to see what the
other members of SG1 think.
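To make the trade-off in that open question concrete, here is a toy model
of the notify-side gate. It uses an explicit per-object waiter count, where
the real implementations hash many atomics onto a fixed pool of such
counters; the names are mine, and this is a sketch of the idea, not the
libstdc++ code.

    #include <atomic>
    #include <cstdint>

    // Toy model: one waiter counter per atomic. The real waiter pool shares
    // a fixed number of such counters between unrelated atomics, which is
    // where the collision question above comes from.
    struct waitable_counter {
        std::atomic<std::uint64_t> value{0};
        std::atomic<int>           waiters{0}; // threads blocked in wait

        void increment_and_notify()
        {
            value.fetch_add(1, std::memory_order_seq_cst);
            // The empty-waiter check: if nobody is waiting (the expected
            // case for the deque above), skip the wake syscall entirely.
            if (waiters.load(std::memory_order_seq_cst) != 0)
                value.notify_one();
        }

        void wait_for_change(std::uint64_t old)
        {
            waiters.fetch_add(1, std::memory_order_seq_cst);
            value.wait(old); // returns once value != old
            waiters.fetch_sub(1, std::memory_order_seq_cst);
        }
    };

With everything seq_cst, either the notifier observes the waiter's
increment of 'waiters' or the waiter observes the new 'value' and returns
immediately, so no wakeup is lost. The cost being weighed is that when the
counter is shared between unrelated atomics, the notifier's load of
'waiters' can hit a cache line other cores are writing, which is exactly
the collision traffic the open question asks about.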