From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rodgertq at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/106772] atomic::wait shouldn't touch waiter pool if used platform wait
Date: Wed, 28 Sep 2022 22:50:22 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106772

--- Comment #22 from Thomas Rodgers ---
(In reply to Mkkt Bkkt from comment #20)
> My main concern with this optimization is that it is not zero-overhead.
>
> It is not necessary when we expect to have waiters; in that case it is
> just additional synchronization and contention in the waiter pool (which
> has a small fixed size; imagine a system with 100+ cores: with more than
> 16 waiting threads, some of them do fetch_add/fetch_sub on the same
> atomic, which can be expensive, especially on NUMA).
>
> And at the same time, I don't understand the case where I need to notify
> but cannot know that the notification is unnecessary. I don't understand
> when this is useful.

You are correct, it is not zero overhead. It also isn't clear what those
overheads actually are; as I noted in comment #21, there is no testing over
a variety of workloads to inform this discussion.

Your example of '100+ core' systems, especially NUMA ones, is certainly a
valid one. I would ask: at what point do those collisions and the resulting
cache invalidation traffic swamp the cost of just making the syscall? I do
plan to put these tests together, because there is another algorithm I am
exploring that I believe will reduce the likelihood of spurious wakeups and
achieve the same result as this particular approach without generating the
same invalidation traffic. At this point, I don't anticipate doing that
work until after GCC 13 stage 1 closes.
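
To make the contention point concrete, below is a minimal sketch of the kind
of fixed-size waiter pool being discussed. The names (waiter_entry, entry_for,
pool_size, and so on) are hypothetical and the memory ordering is simplified;
this is not the actual libstdc++ implementation, only the general shape of
the scheme, assuming Linux futexes.

// Hypothetical sketch of a waiter-pool scheme; not the libstdc++ internals.
// A small fixed-size table of waiter counters is shared by hashing the
// waited-on address. notify_all() reads the counter and skips the futex
// syscall when nobody is waiting; the bookkeeping that enables this
// (fetch_add/fetch_sub on a shared counter) is the contention concern.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

namespace sketch {

constexpr std::size_t pool_size = 16;  // small fixed size, as in the report

struct waiter_entry {
  alignas(64) std::atomic<std::uint32_t> waiters{0};  // one cache line each
};

inline waiter_entry pool[pool_size];

inline waiter_entry& entry_for(const void* addr) {
  // With 100+ cores and more than 16 waiting threads, distinct addresses
  // can hash to the same entry, so unrelated threads contend on one counter.
  auto h = reinterpret_cast<std::uintptr_t>(addr);
  return pool[(h >> 6) % pool_size];
}

inline void futex_wait(const std::atomic<std::uint32_t>* a, std::uint32_t old) {
  syscall(SYS_futex, a, FUTEX_WAIT_PRIVATE, old, nullptr, nullptr, 0);
}

inline void futex_wake(const std::atomic<std::uint32_t>* a) {
  syscall(SYS_futex, a, FUTEX_WAKE_PRIVATE, INT32_MAX, nullptr, nullptr, 0);
}

// Block until *a changes away from 'old'.
inline void wait(const std::atomic<std::uint32_t>& a, std::uint32_t old) {
  waiter_entry& e = entry_for(&a);
  e.waiters.fetch_add(1, std::memory_order_seq_cst);  // announce the waiter
  while (a.load(std::memory_order_acquire) == old)
    futex_wait(&a, old);  // kernel rechecks *a == old before sleeping
  e.waiters.fetch_sub(1, std::memory_order_seq_cst);  // retract it
}

inline void notify_all(const std::atomic<std::uint32_t>& a) {
  waiter_entry& e = entry_for(&a);
  // The contested optimization: elide the syscall when nobody is waiting.
  // A real implementation needs careful fencing here against the waiter's
  // fetch_add to avoid missed wakeups.
  if (e.waiters.load(std::memory_order_seq_cst) != 0)
    futex_wake(&a);
}

}  // namespace sketch

The tension in this thread is visible in the two halves: notify_all() can
elide the futex_wake syscall when the counter is zero, but every wait() pays
a fetch_add/fetch_sub on a counter that, with only 16 slots, may be shared by
unrelated atomics hashing to the same entry, and on a large NUMA machine each
of those read-modify-writes can bounce the cache line between sockets.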