From: "rachel at rachelmant dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
Date: Mon, 29 May 2023 13:30:01 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

Rachel Mant changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rachel at rachelmant dot com

--- Comment #11 from Rachel Mant ---
(In reply to Andrew Pinski from comment #9)

Hi Andrew, original author of the code here.

> So I think this is a bug in your code:
>
> Inside substrate::threadPool_t::finish, we have:
>
>     finished = true;
>     haveWork.notify_all();

`finished` here should be a std::atomic in strict sequential ordering mode.

> If I change it to:
>
>     {
>         std::lock_guard lock{workMutex};
>         finished = true;
>         haveWork.notify_all();
>     }

This will work, yes, except it will also cause mutex contention on the
condition variable, and it should not be necessary if the atomic is working
correctly, as the order of operations between the threads should be:

    Thread 1              | Thread 2
    ----------------------+-------------------
    finished = true       |
    haveWork.notify_all() |
                          | lambda evaluation

The only way the mutex can fix this is by adding a barrier that should
already be there by virtue of the atomic using std::memory_order_seq_cst,
unless there's also a TOCTOU issue in play.

> Then I don't get a deadlock at all.
> As I mentioned, I did think there was a race condition.
> Here is what I think happened:
>
>     thread26                             | thread 1
>     -------------------------------------+---------------------------
>     checks finished, still false         | sets finished to be true
>     calls wait                           | calls notify_all
>     ...                                  | notify_all happens
>     finally gets into futex_wait syscall | ....
>
> And then thread26 never got the notification.
>
> With my change the check for finished has to wait till thread1 lets go of
> the mutex (and the other way around).

Do please correct us if we're wrong about our understanding of how the
atomic should be working here. We're well aware that concurrency is /hard/
and getting things like this right is even harder.
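
To make the window we're asking about concrete, here's a minimal sketch of
the two patterns under discussion. It assumes a waiter shaped like ours
(workMutex, haveWork and finished match the snippets above, but waitForWork
and the two finish functions are illustrative, not our actual pool code):

    #include <atomic>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::mutex workMutex;
    std::condition_variable haveWork;
    std::atomic<bool> finished{false};

    // Worker thread: the predicate runs with workMutex held, and wait()
    // atomically releases the mutex and blocks when the predicate is false.
    void waitForWork()
    {
        std::unique_lock<std::mutex> lock{workMutex};
        haveWork.wait(lock,
            [] { return finished.load(std::memory_order_seq_cst); });
    }

    // Pattern A (ours): store + notify without taking the mutex. If this
    // runs in the window after the waiter's predicate returned false but
    // before the waiter has released workMutex and blocked, the notify
    // wakes nobody and the waiter then sleeps with no further notify
    // coming - the TOCTOU window mentioned above. No memory ordering on
    // the store can close a timing window like that.
    void finishUnlocked()
    {
        finished.store(true, std::memory_order_seq_cst);
        haveWork.notify_all();
    }

    // Pattern B (Andrew's change): taking workMutex first means the store
    // cannot land inside that window, because the waiter holds the mutex
    // from the predicate check until it is atomically blocked in wait().
    void finishLocked()
    {
        std::lock_guard<std::mutex> lock{workMutex};
        finished = true;
        haveWork.notify_all();
    }

    int main()
    {
        std::thread worker{waitForWork};
        finishLocked(); // finishUnlocked() here can (rarely) hang the worker
        worker.join();
    }

Note that pattern B could also move the notify_all() out past the end of the
lock_guard's scope to reduce the contention we mentioned; it's only the store
to `finished` that has to happen under the mutex.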
The thing that stands out, as Amy said, is that GCC is the *only* compiler
on which we get this deadlock, which is why she and we think this is a
stdlib or codegen bug rather than our bug, but if you can say how it's not,
then we're happy to accept that.