From mboxrd@z Thu Jan 1 00:00:00 1970
From: "poulhies at adacore dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/104442] New: atomic::wait incorrectly loops in case of spurious notification when __waiter is shared
Date: Tue, 08 Feb 2022 16:31:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104442

            Bug ID: 104442
           Summary: atomic::wait incorrectly loops in case of spurious
                    notification when __waiter is shared
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: poulhies at adacore dot com
  Target Milestone: ---

Created attachment 52377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52377&action=edit
patch fixing the issue

We are observing a deadlock in 100334.cc on
vxworks. This is caused by:

    template<typename _Tp, typename _ValFn>
      void
      _M_do_wait_v(_Tp __old, _ValFn __vfn)
      {
        __platform_wait_t __val;
        if (__base_type::_M_do_spin_v(__old, __vfn, __val))
          return;

        do
          {
            __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
          }
        while (__detail::__atomic_compare(__old, __vfn()));
      }

When several threads are sharing the waiter (as in 100334.cc), notify_one()
wakes all threads blocked in the _M_do_wait() above. The thread whose data
changed exits the loop correctly, but the others loop back into
_M_do_wait() with the same arguments. Because the waiter's value has
changed since the previous iteration but __val has not, the method returns
immediately (as if it had detected a notification) and the loop spins.

On GNU/Linux, the test PASSes because the main thread is still scheduled
and does a .store(1) on all atoms, unblocking all the busy-waiting threads
(but a thread doing a busy-wait can still be observed with gdb). On
vxworks, the main thread is never scheduled again (I think there is no
preemption at the same priority level) and the busy-wait starves the
system.

The attached patch is a possible fix. It moves the spin() call inside the
loop, updating __val at every iteration. A better fix is probably possible
but may require some refactoring (a bit more than I'm comfortable with).

I've checked the patch for regressions on gcc-master for x86_64. It also
fixes the test on gcc-11 for aarch64-vxworks7.
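To make the description above concrete: moving the spin inside the loop
means __val is reloaded from the shared waiter on every iteration, so a
spuriously woken thread compares against a fresh snapshot instead of a
stale one. A rough, illustrative sketch of that shape (this is my reading
of the description, not the actual contents of attachment 52377):

```cpp
template<typename _Tp, typename _ValFn>
  void
  _M_do_wait_v(_Tp __old, _ValFn __vfn)
  {
    do
      {
        // Re-spin each time round the loop: _M_do_spin_v reloads the
        // waiter's value into __val, so a spurious wakeup no longer
        // sees __val differ from the waiter and return immediately.
        __platform_wait_t __val;
        if (__base_type::_M_do_spin_v(__old, __vfn, __val))
          return;
        __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
      }
    while (__detail::__atomic_compare(__old, __vfn()));
  }
```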