From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rodgertq at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
Date: Wed, 28 Sep 2022 23:40:36 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #25 from Thomas Rodgers ---
(In reply to Mkkt Bkkt from comment #24)
> (In reply to Thomas Rodgers from comment #22)
> > Your example of '100+ core' systems, especially on NUMA, is certainly a
> > valid one. I would ask, at what point do those collisions and the
> > resulting cache invalidation traffic swamp the cost of just making the
> > syscall? I do plan to put these tests together, because there is another
> > algorithm that I am exploring which I believe will reduce the likelihood
> > of spurious wakeups and achieve the same result as this particular
> > approach, without generating the same invalidation traffic. At this
> > point, I don't anticipate doing that work until after GCC 13 stage 1
> > closes.
>
> Let me try to explain:
>
> Syscall overhead is a roughly constant cost, commonly around 10-30 ns (a
> futex syscall can be more expensive, around 100 ns in your example).
>
> But core counts keep growing, and ARM is becoming more popular
> (fetch_add/sub cost more there than on x86). People have already run into
> situations where a fetch_add costs more than the syscall overhead:
>
> https://pkolaczk.github.io/server-slower-than-a-laptop/
> https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html
>
> I don't think we will hit problems like the ones in those links with
> atomic::wait/notify in real code, but I'm pretty sure that in some cases
> it can be more expensive than the syscall part of atomic::wait/notify.
>
> Of course it would be better to prove it; maybe someday I will do it :(

So, to your previous comment, I don't think the discussion is at all
pointless. I plan to raise some of these issues at the next SG1 meeting in
November. Sure, that doesn't help *you* or any developer with your specific
intent until C++26, and maybe Boost's implementation is a better choice; I
also get how unsatisfying an answer that is.
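For concreteness, a rough sketch of the waiter-pool idea being discussed (not
the actual libstdc++ code; the names, the pool size, and the hashing are made
up for illustration, and it is Linux-only) might look like this: notify()
consults a shared, address-hashed waiter count so it can skip the FUTEX_WAKE
syscall when nobody is waiting, and that shared counter is the cache line
whose invalidation traffic is at issue.

  // Hypothetical sketch of a waiter pool, NOT the libstdc++ implementation.
  #include <atomic>
  #include <cstdint>
  #include <linux/futex.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  namespace sketch {

  constexpr std::uintptr_t pool_size = 16;   // made-up size, for illustration

  struct alignas(64) waiter_bucket {         // one cache line per bucket
    std::atomic<std::uint32_t> waiters{0};   // threads currently blocked here
  };

  inline waiter_bucket pool[pool_size];

  inline waiter_bucket& bucket_for(const void* addr) {
    return pool[(reinterpret_cast<std::uintptr_t>(addr) >> 6) % pool_size];
  }

  inline long futex(void* uaddr, int op, std::uint32_t val) {
    return syscall(SYS_futex, uaddr, op, val, nullptr, nullptr, 0);
  }

  static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t));

  // Block until *a no longer holds old_value (lock-free 32-bit atomics only).
  inline void wait(std::atomic<std::uint32_t>& a, std::uint32_t old_value) {
    waiter_bucket& b = bucket_for(&a);
    b.waiters.fetch_add(1, std::memory_order_seq_cst);  // contended on big boxes
    while (a.load(std::memory_order_acquire) == old_value)
      futex(&a, FUTEX_WAIT_PRIVATE, old_value);
    b.waiters.fetch_sub(1, std::memory_order_seq_cst);
  }

  inline void notify_one(std::atomic<std::uint32_t>& a) {
    waiter_bucket& b = bucket_for(&a);
    if (b.waiters.load(std::memory_order_seq_cst) != 0) // the "touch" in question
      futex(&a, FUTEX_WAKE_PRIVATE, 1);
  }

  } // namespace sketch

On a small single-socket machine the extra fetch_add/load on the bucket is
noise next to a ~100 ns syscall; the open question above is at what core and
NUMA count the invalidation traffic on that shared line starts to dominate.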
I'm well aware of the potential scalability problems, and I have a longer-term
plan to get concrete data on how different implementation choices impact
scalability. The barrier implementation (which uses the same algorithm as
libc++'s), for example, spreads this traffic over 64 individual atomic_refs
for this very reason, and that implementation has been shown to scale quite
well on ORNL's Summit. But not all users of libstdc++ have those sorts of
problems either.
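For the record, the "spread the traffic" idea reduced to its simplest form is
just striping a hot counter across cache lines; the sketch below is only an
illustration of that general technique (the name striped_counter and the
thread-id hashing are invented here), not the libc++/libstdc++ barrier
algorithm itself.

  // Stripe one logical counter over 64 cache-line-aligned slots so that
  // concurrent writers mostly touch distinct cache lines.
  #include <atomic>
  #include <cstddef>
  #include <cstdint>
  #include <thread>

  class striped_counter {
    static constexpr std::size_t stripes = 64;
    struct alignas(64) slot { std::atomic<std::uint64_t> n{0}; };
    slot slots_[stripes];

    static std::size_t my_stripe() {
      // Hash the thread id so concurrent threads usually pick different slots.
      return std::hash<std::thread::id>{}(std::this_thread::get_id()) % stripes;
    }

  public:
    void add(std::uint64_t v) {
      slots_[my_stripe()].n.fetch_add(v, std::memory_order_relaxed);
    }

    std::uint64_t read() const {
      std::uint64_t total = 0;
      for (const slot& s : slots_)            // reader pays by summing 64 slots
        total += s.n.load(std::memory_order_relaxed);
      return total;
    }
  };

Writers mostly hit distinct cache lines, so the fetch_add invalidation traffic
stops growing with the thread count; the reader pays by summing the slots,
which is cheap by comparison.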