From: "jgreenhalgh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
Date: Fri, 10 Apr 2015 15:09:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #12 from James Greenhalgh ---
There are two problems here, one of which concerns me more in the real
world, and both of which rely on races if you are in the C/C++11 model -
there isn't much I can do about that, as the __sync primitives are legacy
and give stronger guarantees about "visible memory references" (which
includes ALL global data).

Consider:

  int x = 0;
  int y = 0;
  int z = 0;

  void foo (void)
  {
    x = 1;
    __sync_fetch_and_add (&y, 1);
    z = 1;
  }

My reading of what a full barrier implies would suggest that we should
never have a modification order which is any of:

  z = 1, x = 1, y = 0+1
  z = 1, y = 0+1, x = 1
  x = 1, z = 1, y = 0+1

GCC 5.0-ish (recent trunk) will emit:

  foo:
          adrp    x3, .LANCHOR0
          mov     w1, 1
          add     x0, x3, :lo12:.LANCHOR0
          add     x2, x0, 4
          str     w1, [x3, #:lo12:.LANCHOR0]
  .L2:
          ldaxr   w3, [x2]
          add     w3, w3, w1
          stlxr   w4, w3, [x2]
          cbnz    w4, .L2
          str     w1, [x0, 8]
          ret

Dropping some of the context and switching to pseudo-asm we have:

          str     1, [x]
  .L2:
          ldaxr   tmp, [y]
          add     tmp, tmp, 1
          stlxr   flag, tmp, [y]
          cbnz    flag, .L2
          str     1, [z]

As far as I understand it, the memory ordering guarantees of the
half-barrier LDAXR/STLXR instructions do not prevent this being reordered
to an execution which looks like:

          ldaxr   tmp, [y]
          str     1, [z]
          str     1, [x]
          add     tmp, tmp, 1
          stlxr   flag, tmp, [y]
          cbnz    flag, .L2

which gives one of the modification orders we wanted to forbid above.
Similar reordering can give the other undesirable modification orders.

As mentioned, this reordering is permitted under C11, as the stores to x
and z are racy - it is consistent with the CPPMEM/Cambridge documentation
of what a SEQ_CST operation must do. However, my feeling is that it is at
the very least *surprising* for the __sync primitives to allow this
ordering, and more likely points to a break in the AArch64 implementation.

Fixing this requires a hard memory barrier - but I really don't want to
see us penalise C11 SEQ_CST to get the right behaviour out of
__sync_fetch_and_add...
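To make the forbidden orderings concrete, here is a minimal litmus-test
sketch (the writer/observer split, the pthread scaffolding and the assert
are illustration only, not part of the testcase in this PR): under the
full-barrier reading of __sync_fetch_and_add, an observer that has seen
the store to z should also see the increment of y, so the assert below
should never be able to fire.

  #include <assert.h>
  #include <pthread.h>

  int x = 0;
  int y = 0;
  int z = 0;

  static void *
  writer (void *arg)
  {
    x = 1;
    __sync_fetch_and_add (&y, 1);   /* documented as a full barrier */
    z = 1;
    return 0;
  }

  static void *
  observer (void *arg)
  {
    /* Read z first, then y, with a full barrier so the observer's own
       loads cannot be reordered.  The plain accesses are racy, as
       discussed above - the point is the __sync guarantee, not C11.  */
    int r_z = z;
    __sync_synchronize ();
    int r_y = y;

    /* Under a true full barrier, seeing z == 1 implies the increment of
       y is already visible.  The reordered LDAXR/STLXR sequence above
       allows this to fail.  */
    assert (!(r_z == 1 && r_y == 0));
    return 0;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, 0, writer, 0);
    pthread_create (&t2, 0, observer, 0);
    pthread_join (t1, 0);
    pthread_join (t2, 0);
    return 0;
  }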
The second problem relies on a similar argument to show that in a
__sync_lock_test_and_set operation we can reorder a later access before
the "set" half of the operation, i.e. that:

  __sync_lock_test_and_set (&y, 1);
  x = 1;

can be seen as:

  tmp = load_acquire (y)
  store (x, 1)
  store (y, 1)

giving an incorrect modification order.

The AArch64 parts are something I can argue through with smarter people
than me, but I think my comments on the __sync primitives are correct?
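For completeness, the second problem phrased the same way (again, the
writer/observer split and scaffolding are an illustrative sketch only,
and the guarantee asserted is my reading of the documented acquire-barrier
semantics of __sync_lock_test_and_set, under which a later reference must
not move before the operation): an observer that has seen x == 1 should
not then see y == 0.

  #include <assert.h>
  #include <pthread.h>

  int x = 0;
  int y = 0;

  static void *
  writer (void *arg)
  {
    __sync_lock_test_and_set (&y, 1);  /* documented as an acquire barrier */
    x = 1;                             /* later access; should not move
                                          before the "set" half */
    return 0;
  }

  static void *
  observer (void *arg)
  {
    int r_x = x;
    __sync_synchronize ();
    int r_y = y;

    /* If the later store to x cannot be reordered before the store half
       of the test_and_set, then seeing x == 1 implies y == 1.  The
       pseudo-asm above shows how the current sequence can break this.  */
    assert (!(r_x == 1 && r_y == 0));
    return 0;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, 0, writer, 0);
    pthread_create (&t2, 0, observer, 0);
    pthread_join (t1, 0);
    pthread_join (t2, 0);
    return 0;
  }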