From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id D87543857B80; Wed, 28 Sep 2022 23:25:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D87543857B80
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1664407540;
	bh=oAw8fApWO3B5zFQwhKCxR+MGjN5iwEZKJOA4hM5z/fo=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=q8zQoNGLcHgmcXQiefuthCls2PaMGQ64bxxwGg3oDXUd6Nc9bVQ2NIh4KBKp3nodz
	 YyZxDvNpAiCdLzAM4K8PHP26aFHAmNbqNM2dHfDkQ6QBswIq03cQC3PP5MMb/JaQIL
	 RNtY1E8Lb5qVUfL51zKc398H/kO4/GPbOQ7mHzN0=
From: "valera.mironow at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic<T>::wait shouldn't touch waiter pool
 if used platform wait
Date: Wed, 28 Sep 2022 23:25:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: valera.mironow at gmail dot com
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: INVALID
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-106772-4-w3kfkRKY1z@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106772-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106772-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772
--- Comment #24 from Mkkt Bkkt <valera.mironow at gmail dot com> ---
(In reply to Thomas Rodgers from comment #22)
> Your example of '100+ core' systems especially on NUMA is certainly a valid
> one. I would ask, at what point do those collisions and the resulting cache
> invalidation traffic swamp the cost of just making the syscall? I do plan to
> put these tests together, because there is another algorithm that I am
> exploring, that I believe will reduce the likelihood of spurious wakeups,
> and achieves the same result as this particular approach, without generating
> the same invalidation traffic. At this point, I don't anticipate doing that
> work until after GCC13 stage1 closes.

Let me try to explain:

Syscall overhead is roughly constant, commonly around 10-30ns (a futex syscall
can be more expensive, around 100ns in your example).

But core counts keep growing, and ARM is becoming more popular (fetch_add/sub
costs more there than on x86). People have already run into situations where
fetch_add costs more than the syscall overhead:

https://pkolaczk.github.io/server-slower-than-a-laptop/
https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html

I don't think we will run into problems like the ones in these links with
atomic::wait/notify in real code, but I'm pretty sure that in some cases it can
be more expensive than the syscall part of atomic::wait/notify.

Of course it would be better to prove it; maybe someday I will do it :(
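
A rough sketch of how the fetch_add side of that comparison might be measured
is below. It is not taken from libstdc++ or from the linked articles; the
function name contended_fetch_add_ns and its parameters are made up for
illustration. It only times threads hammering one shared counter, so the
futex syscall cost would still have to be measured separately on the same
machine.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): returns the average
// wall-clock nanoseconds one thread spends per fetch_add while `threads`
// threads increment a single shared counter at the same time.
double contended_fetch_add_ns(unsigned threads,
                              std::uint64_t iters_per_thread,
                              std::uint64_t* final_count = nullptr)
{
    assert(threads > 0 && iters_per_thread > 0);

    std::atomic<std::uint64_t> counter{0};
    std::atomic<bool> go{false};

    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            // Hold every worker at the start line so they contend together.
            while (!go.load(std::memory_order_acquire))
                std::this_thread::yield();
            for (std::uint64_t i = 0; i < iters_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }

    auto start = std::chrono::steady_clock::now();
    go.store(true, std::memory_order_release);
    for (auto& th : pool)
        th.join();
    auto stop = std::chrono::steady_clock::now();

    if (final_count)
        *final_count = counter.load();

    // The threads run in parallel, so wall time divided by the per-thread
    // iteration count approximates the latency each thread sees per operation.
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / static_cast<double>(iters_per_thread);
}
```

Calling it with 1 thread and then with std::thread::hardware_concurrency()
threads gives a first impression of how much the per-operation cost grows
under contention. The timing includes thread wake-up and join, so the
iteration count should be large enough to make that negligible.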