From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-465401-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7060 invoked by alias); 30 Oct 2014 22:13:46 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 6992 invoked by uid 48); 30 Oct 2014 22:13:42 -0000
From: "filter-gcc at preshing dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/59448] Code generation doesn't respect C11 address-dependency
Date: Thu, 30 Oct 2014 22:16:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: filter-gcc at preshing dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-59448-4-gPaaetDNE0@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-59448-4@http.gcc.gnu.org/bugzilla/>
References: <bug-59448-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-10/txt/msg02422.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

preshing <filter-gcc at preshing dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |filter-gcc at preshing dot com
--- Comment #22 from preshing <filter-gcc at preshing dot com> ---
(In reply to algrant from comment #13)
> I see what you mean - there is a race if threadB reads the data when the
> flag is not set, i.e. in the case when the read value is never used.
> Moving the read of data after the test (as in the fragment in comment 4)
> would fix the race.

algrant, could you post the complete assembly listing for threadB(), using your
version of AArch64 g++, after all the corrections to the example have been
taken into account, with -O2? For convenience, I've posted the corrected
example below.

When I compile the example using PowerPC g++ 4.9.1, it seems that
memory_order_consume is already promoted to memory_order_acquire, as evidenced
by the cmpw;bne-;isync sequence following the load from flag, which acts as an
acquire barrier. I've posted the PowerPC assembly listing below as well.

Is the bug specific to AArch64?

-------------

/* Reproducer for 59448: compile with -std=c++11 -lpthread -O2 */

#include <atomic>
#include <thread>
#include <stdio.h>
#include <assert.h>

static std::atomic<unsigned int> flag(0);
static int data;

/*
Thread A tries to release a data value 1 to thread B via a
release/consume sequence.

In the demonstration, this is tried repeatedly.  The iterations of
the two threads are synchronized via a release/acquire sequence
from B to A.  This is not relevant to the ordering problem.
*/

void threadA(void)
{
  for (;;) {
    data = 1;                                           // A.A
    flag.store(1, std::memory_order_release);           // A.B
    /* By 1.9#14, A.A is sequenced before A.B */
    /* Now wait for thread B to accept the data. */
    while (flag.load(std::memory_order_acquire) == 1);
    assert(data == 0);
  }
}

void threadB(void)
{
  for (;;) {
    int f, d;
    do {
      f = flag.load(std::memory_order_consume);             // B.A
    } while (!f);
    /* By 1.10#9, B.A carries a dependency to B.B */
    /* By 1.10#10, A.B is dependency-ordered before B.B */
    /* By 1.10#11, A.A intra-thread happens before B.B */
    /* By 1.10#12, A.A happens before B.B */
    /* By 1.10#13, A.A is a visible side-effect with respect to B.B
                   and the value determined by B.B shall be the
                   value (1) stored by A.A. */
    d = *(&data + (f - f));                                 // B.B
    assert(d == 1);
    /* Release thread A to start another iteration. */
    data = 0;
    flag.store(0, std::memory_order_release);
  }
}

int main(void)
{
  std::thread a(&threadA);
  std::thread b(&threadB);
  a.join();
  b.join();
  return 0;
}

------------- PowerPC listing of threadB():

_Z7threadBv:
.LFB2302:
    .cfi_startproc
    lis 10,.LANCHOR0@ha
    li 8,0
    la 10,.LANCHOR0@l(10)
.L16:
    lwz 9,4(10)
    cmpw 7,9,9      <------- acquire barrier
    bne- 7,$+4      <-------
    isync           <-------
    cmpwi 7,9,0
    beq+ 7,.L16
    lwz 9,0(10)
    cmpwi 7,9,1
    bne- 7,.L22
    stw 8,0(10)
    lwsync
    stw 8,4(10)
    b .L16
.L22:
    mflr 0
    lis 6,.LANCHOR1@ha
    stwu 1,-16(1)
    .cfi_def_cfa_offset 16
    .cfi_register 65, 0
    lis 3,.LC2@ha
    lis 4,.LC1@ha
    la 6,.LANCHOR1@l(6)
    la 3,.LC2@l(3)
    la 4,.LC1@l(4)
    li 5,47
    addi 6,6,16
    stw 0,20(1)
    .cfi_offset 65, 4
    bl __assert_fail
    .cfi_endproc