From: "jgreenhalgh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/65697] __atomic memory barriers not strong enough for __sync builtins
Date: Fri, 10 Apr 2015 15:09:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #12 from James Greenhalgh ---
There are two problems here, one of which concerns me more in the real
world, and both of which rely on races if you are in the C/C++11 model -
there isn't much I can do about that, as the __sync primitives are legacy
and give stronger guarantees about "visible memory references" (which
includes ALL global data).

Consider:

  int x = 0;
  int y = 0;
  int z = 0;

  void foo (void)
  {
    x = 1;
    __sync_fetch_and_add (&y, 1);
    z = 1;
  }

My reading of what a full barrier implies would suggest that we should
never have a modification order which is any of:

  z = 1, x = 1, y = 0+1
  z = 1, y = 0+1, x = 1
  x = 1, z = 1, y = 0+1

GCC 5.0-ish (recent trunk) will emit:

  foo:
          adrp    x3, .LANCHOR0
          mov     w1, 1
          add     x0, x3, :lo12:.LANCHOR0
          add     x2, x0, 4
          str     w1, [x3, #:lo12:.LANCHOR0]
  .L2:
          ldaxr   w3, [x2]
          add     w3, w3, w1
          stlxr   w4, w3, [x2]
          cbnz    w4, .L2
          str     w1, [x0, 8]
          ret

Dropping some of the context and switching to pseudo-asm we have:

          str     1, [x]
  .L2:
          ldaxr   tmp, [y]
          add     tmp, tmp, 1
          stlxr   flag, tmp, [y]
          cbnz    flag, .L2
          str     1, [z]

As far as I understand it, the memory ordering guarantees of the
half-barrier LDAXR/STLXR instructions do not prevent this being reordered
to an execution which looks like:

          ldaxr   tmp, [y]
          str     1, [z]
          str     1, [x]
          add     tmp, tmp, 1
          stlxr   flag, tmp, [y]
          cbnz    flag, .L2

which gives one of the modification orders we wanted to forbid above.
Similar reordering can give the other undesirable modification orders.

As mentioned, this reordering is permitted under C11, as the stores to x
and z are racy - it is consistent with the CPPMEM/Cambridge documentation
of what a SEQ_CST operation must do. However, my feeling is that it is at
the very least *surprising* for the __sync primitives to allow this
ordering, and more likely points to a break in the AArch64 implementation.

Fixing this requires a hard memory barrier - but I really don't want to
see us penalise C11 SEQ_CST to get the right behaviour out of
__sync_fetch_and_add...
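To make the forbidden orderings concrete, here is a minimal litmus-test
sketch (the writer/observer split, the pthread scaffolding and the assert
are illustration only, not part of the testcase in this PR): under the
full-barrier reading of __sync_fetch_and_add, an observer that has seen
the store to z should also see the increment of y, so the assert below
should never be able to fire.

  #include <assert.h>
  #include <pthread.h>

  int x = 0;
  int y = 0;
  int z = 0;

  static void *
  writer (void *arg)
  {
    x = 1;
    __sync_fetch_and_add (&y, 1);   /* documented as a full barrier */
    z = 1;
    return 0;
  }

  static void *
  observer (void *arg)
  {
    /* Read z first, then y, with a full barrier so the observer's own
       loads cannot be reordered.  The plain accesses are racy, as
       discussed above - the point is the __sync guarantee, not C11.  */
    int r_z = z;
    __sync_synchronize ();
    int r_y = y;

    /* Under a true full barrier, seeing z == 1 implies the increment of
       y is already visible.  The reordered LDAXR/STLXR sequence above
       allows this to fail.  */
    assert (!(r_z == 1 && r_y == 0));
    return 0;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, 0, writer, 0);
    pthread_create (&t2, 0, observer, 0);
    pthread_join (t1, 0);
    pthread_join (t2, 0);
    return 0;
  }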
The second problem relies on a similar argument to show that in a
__sync_lock_test_and_set operation we can reorder a later access before
the "set" half of the operation, i.e. that:

  __sync_lock_test_and_set (&y, 1);
  x = 1;

can be seen as:

  tmp = load_acquire (y)
  store (x, 1)
  store (y, 1)

giving an incorrect modification order.

The AArch64 parts are something I can argue through with smarter people
than me, but I think my comments on the __sync primitives are correct?
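For completeness, the second problem phrased the same way (again, the
writer/observer split and scaffolding are an illustrative sketch only,
and the guarantee asserted is my reading of the documented acquire-barrier
semantics of __sync_lock_test_and_set, under which a later reference must
not move before the operation): an observer that has seen x == 1 should
not then see y == 0.

  #include <assert.h>
  #include <pthread.h>

  int x = 0;
  int y = 0;

  static void *
  writer (void *arg)
  {
    __sync_lock_test_and_set (&y, 1);  /* documented as an acquire barrier */
    x = 1;                             /* later access; should not move
                                          before the "set" half */
    return 0;
  }

  static void *
  observer (void *arg)
  {
    int r_x = x;
    __sync_synchronize ();
    int r_y = y;

    /* If the later store to x cannot be reordered before the store half
       of the test_and_set, then seeing x == 1 implies y == 1.  The
       pseudo-asm above shows how the current sequence can break this.  */
    assert (!(r_x == 1 && r_y == 0));
    return 0;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, 0, writer, 0);
    pthread_create (&t2, 0, observer, 0);
    pthread_join (t1, 0);
    pthread_join (t2, 0);
    return 0;
  }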