From: "rodgertq at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
Date: Wed, 28 Sep 2022 22:34:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #21 from Thomas Rodgers ---
(In reply to Mkkt Bkkt from comment #16)
> > it with sarcasm
>
> I started with sarcasm because you restart this thread with some doubtful
> benchmarks without code for them.
>
> I think it's very disrespectfully.

I wasn't sarcastic in what I posted. As I noted, this question has come up
before in different contexts; Bugzilla is a useful historical archive, so
updating this issue with my reasoning and a bit of data was primarily a
capture task. So, let's try this again.

I did not post the original source because it required hacking on the
libstdc++ headers. I have posted a version that does not require that; the
results are identical.

The test is the example Jonathan cited in comment #14: incrementing an
atomic int and calling notify.

This isn't about semaphore or any other synchronization primitive. Those
types are free to make different choices that may be more appropriate to
their constrained usage, just as Lewis' lightweight manual reset event does
(I will also note that Lewis has reviewed this implementation and has
written a paper, p2616, to be discussed at the Fall meeting).

There are, as Jonathan has pointed out, use cases where notify can and will
be called without the notifier having any way to determine whether it will
wake a waiter.

One example that I, as the person who is going to have to implement C++26
executors, care about is a wait-free work-stealing deque. Algorithmic
correctness requires nothing more than spinning for work on an empty queue,
but after spinning on an empty queue, making the rounds trying to steal
work from other deques, and maybe spinning a bit more just to be sure, the
dequeuing thread that has not been able to acquire more work will probably
want to enter a wait until it knows it can do productive work.
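To make that idle path concrete, here is a minimal sketch of the dequeuing
thread's strategy. The helpers try_pop_local() and try_steal_from_peers()
and the work_epoch counter are hypothetical stand-ins of my own naming; this
illustrates the pattern, not any actual executor implementation.

    #include <atomic>
    #include <cstdint>

    // Hypothetical stand-ins for the real deque operations.
    bool try_pop_local();        // pop from this thread's own deque
    bool try_steal_from_peers(); // make the rounds of the other deques

    // Bumped by producers whenever work is pushed; idle consumers wait on it.
    std::atomic<std::uint64_t> work_epoch{0};

    void consumer_idle_path()
    {
        for (;;) {
            // Snapshot the epoch before the final emptiness checks, so a
            // push that races with them makes the wait below return at once.
            std::uint64_t epoch = work_epoch.load(std::memory_order_acquire);

            if (try_pop_local() || try_steal_from_peers())
                return; // found work; go run it

            // Spin a bit more, just to be sure.
            for (int i = 0; i < 64; ++i)
                if (try_pop_local())
                    return;

            // Only now enter the wait. atomic::wait may itself spin briefly
            // before blocking in the platform primitive.
            work_epoch.wait(epoch, std::memory_order_acquire);
        }
    }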
Another thread pushing work into that queue cannot tell whether the
dequeuing thread is spinning for new work, work stealing, or has entered a
wait, but atomic::notify() does know that, and can avoid penalizing the
thread submitting work with a syscall when there is no thread waiting on
the other end of the deque, which is the expected case for this algorithm.

p1135 was the paper that added atomic wait/notify. One of the co-authors of
that paper wrote the libc++ implementation. That implementation, as with
libstdc++, is not simply a wrapper over the underlying platform's wait
primitive. It has two 'backends': an exponentially timed backoff and ulock
wait/wake. libstdc++ currently has a futex backend and a condvar backend.
Both implementations choose a short-term spinning strategy combined with a
long-term waiting strategy (spinning, then futex/ulock or condvar).

I have confirmed with the libc++ implementation's author (who also chairs
the C++ committee's Concurrency and Parallelism study group) that it was
never the intention of p1135, or of the subsequently standardized wording
in C++20, to imply that wait/notify were direct, portable analogs of the
platform's waiting primitives. There are users, such as yourself, who want
exactly that; there are other users (as in my prior industry) for whom busy
waiting is the desired strategy; and in between those two choices are
people who want it to work as advertised in the standard, and to do so
'efficiently'. Both libc++ and libstdc++ take a balanced approach somewhere
between always going to the OS and always spinning.

There is an open question here that your original issue raises:

* At what point do collisions on the waiter pool, with the
cache-invalidation traffic and spurious wakeups that result, swamp the gain
of doing this empty-waiter check on notify?

I also made a comment about the 'size of the waiter pool notwithstanding'.
I chose a smaller size than libc++ chose, in part because Jonathan and I
did not want to make any sort of ABI commitment until this had been in a
few GCC releases. This implementation is header-only at present and still
considered experimental. libc++ committed to an ABI early.

In the sizing of the libc++ waiter pool there is a comment that 'there is
no magic in this value'. Not only is there no magic, there is no test of
any sort, done by me or on libc++, to determine what effect different pool
sizes have under different load scenarios. So all of it is a guess at this
point. I will likely match libc++ when I move this into the .so.

Finally, in the most recent mailing there is p2643, which proposes
additional changes to atomic waiting. One proposal is to add a 'hinted'
wait that lets the caller steer the choices atomic wait/notify makes. I
have conferred with the other authors of the paper; this latter option is
not without controversy, and likely has some sharp edges for the user, but
I plan to raise the discussion at the Fall WG21 meeting to see what the
other members of SG1 think.
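To make the trade-off in that open question concrete, here is a toy model
of the notify-side gate. It uses an explicit per-object waiter count, where
the real implementations hash many atomics onto a fixed pool of such
counters; the names are mine, and this is a sketch of the idea, not the
libstdc++ code.

    #include <atomic>
    #include <cstdint>

    // Toy model: one waiter counter per atomic. The real waiter pool shares
    // a fixed number of such counters between unrelated atomics, which is
    // where the collision question above comes from.
    struct waitable_counter {
        std::atomic<std::uint64_t> value{0};
        std::atomic<int>           waiters{0}; // threads blocked in wait

        void increment_and_notify()
        {
            value.fetch_add(1, std::memory_order_seq_cst);
            // The empty-waiter check: if nobody is waiting (the expected
            // case for the deque above), skip the wake syscall entirely.
            if (waiters.load(std::memory_order_seq_cst) != 0)
                value.notify_one();
        }

        void wait_for_change(std::uint64_t old)
        {
            waiters.fetch_add(1, std::memory_order_seq_cst);
            value.wait(old); // returns once value != old
            waiters.fetch_sub(1, std::memory_order_seq_cst);
        }
    };

With everything seq_cst, either the notifier observes the waiter's
increment of 'waiters' or the waiter observes the new 'value' and returns
immediately, so no wakeup is lost. The cost being weighed is that when the
counter is shared between unrelated atomics, the notifier's load of
'waiters' can hit a cache line other cores are writing, which is exactly
the collision traffic the open question asks about.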