From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rodgertq at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
Date: Wed, 28 Sep 2022 22:50:22 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #22 from Thomas Rodgers ---
(In reply to Mkkt Bkkt from comment #20)
> My main concern with this optimization is that it is not zero-overhead.
>
> It is not necessary when we expect to have waiters; in that case it is
> just additional synchronization and contention in the waiter pool (which
> has a small fixed size; imagine a system with 100+ cores: with more than
> 16 waiting threads, some of them do fetch_add/fetch_sub on the same
> atomic, which can be expensive, especially on NUMA).
>
> And at the same time, I don't understand the case where I need to notify
> but cannot know that the notification is unnecessary. I don't understand
> when this is useful.

You are correct, it is not zero overhead. It also isn't clear what those
overheads actually are; as I noted in comment #21, there is no testing over
a variety of workloads to inform this discussion.

Your example of '100+ core' systems, especially NUMA ones, is certainly a
valid one. I would ask: at what point do those collisions and the resulting
cache invalidation traffic swamp the cost of just making the syscall? I do
plan to put these tests together, because there is another algorithm I am
exploring that I believe will reduce the likelihood of spurious wakeups and
achieve the same result as this particular approach without generating the
same invalidation traffic. At this point, I don't anticipate doing that
work until after GCC 13 stage 1 closes.
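
To make the contention point concrete, below is a minimal sketch of the kind
of fixed-size waiter pool being discussed. The names (waiter_entry, entry_for,
pool_size, and so on) are hypothetical and the memory ordering is simplified;
this is not the actual libstdc++ implementation, only the general shape of
the scheme, assuming Linux futexes.

// Hypothetical sketch of a waiter-pool scheme; not the libstdc++ internals.
// A small fixed-size table of waiter counters is shared by hashing the
// waited-on address. notify_all() reads the counter and skips the futex
// syscall when nobody is waiting; the bookkeeping that enables this
// (fetch_add/fetch_sub on a shared counter) is the contention concern.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

namespace sketch {

constexpr std::size_t pool_size = 16;  // small fixed size, as in the report

struct waiter_entry {
  alignas(64) std::atomic<std::uint32_t> waiters{0};  // one cache line each
};

inline waiter_entry pool[pool_size];

inline waiter_entry& entry_for(const void* addr) {
  // With 100+ cores and more than 16 waiting threads, distinct addresses
  // can hash to the same entry, so unrelated threads contend on one counter.
  auto h = reinterpret_cast<std::uintptr_t>(addr);
  return pool[(h >> 6) % pool_size];
}

inline void futex_wait(const std::atomic<std::uint32_t>* a, std::uint32_t old) {
  syscall(SYS_futex, a, FUTEX_WAIT_PRIVATE, old, nullptr, nullptr, 0);
}

inline void futex_wake(const std::atomic<std::uint32_t>* a) {
  syscall(SYS_futex, a, FUTEX_WAKE_PRIVATE, INT32_MAX, nullptr, nullptr, 0);
}

// Block until *a changes away from 'old'.
inline void wait(const std::atomic<std::uint32_t>& a, std::uint32_t old) {
  waiter_entry& e = entry_for(&a);
  e.waiters.fetch_add(1, std::memory_order_seq_cst);  // announce the waiter
  while (a.load(std::memory_order_acquire) == old)
    futex_wait(&a, old);  // kernel rechecks *a == old before sleeping
  e.waiters.fetch_sub(1, std::memory_order_seq_cst);  // retract it
}

inline void notify_all(const std::atomic<std::uint32_t>& a) {
  waiter_entry& e = entry_for(&a);
  // The contested optimization: elide the syscall when nobody is waiting.
  // A real implementation needs careful fencing here against the waiter's
  // fetch_add to avoid missed wakeups.
  if (e.waiters.load(std::memory_order_seq_cst) != 0)
    futex_wake(&a);
}

}  // namespace sketch

The tension in this thread is visible in the two halves: notify_all() can
elide the futex_wake syscall when the counter is zero, but every wait() pays
a fetch_add/fetch_sub on a counter that, with only 16 slots, may be shared by
unrelated atomics hashing to the same entry, and on a large NUMA machine each
of those read-modify-writes can bounce the cache line between sockets.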