From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7060 invoked by alias); 30 Oct 2014 22:13:46 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 6992 invoked by uid 48); 30 Oct 2014 22:13:42 -0000 From: "filter-gcc at preshing dot com" To: gcc-bugs@gcc.gnu.org Subject: [Bug middle-end/59448] Code generation doesn't respect C11 address-dependency Date: Thu, 30 Oct 2014 22:16:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: middle-end X-Bugzilla-Version: unknown X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: filter-gcc at preshing dot com X-Bugzilla-Status: NEW X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2014-10/txt/msg02422.txt.bz2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 preshing changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |filter-gcc at preshing dot com --- Comment #22 from preshing --- (In reply to algrant from comment #13) > I see what you mean - there is a race if threadB reads the data when the > flag is not set, i.e. in the case when the read value is never used. > Moving the read of data after the test (as in the fragment in comment 4) > would fix the race. algrant, could you post the complete assembly listing for threadB(), using your version of AArch64 g++, after all the corrections to the example have been taken into account, with -O2? For convenience, I've posted the corrected example below. When I compile the example using PowerPC g++ 4.9.1, it seems that memory_order_consume is already promoted to memory_order_acquire, as evidenced by the cmpw;bne-;isync sequence following the load from flag, which acts as an acquire barrier. I've posted the PowerPC assembly listing below as well. Is the bug specific to AArch64? ------------- /* Reproducer for 59448: compile with -std=c++11 -lpthread -O2 */ #include #include #include #include static std::atomic flag(0); static int data; /* Thread A tries to release a data value 1 to thread B via a release/consume sequence. In the demonstration, this is tried repeatedly. The iterations of the two threads are synchronized via a release/acquire sequence from B to A. This is not relevant to the ordering problem. */ void threadA(void) { for (;;) { data = 1; // A.A flag.store(1, std::memory_order_release); // A.B /* By 1.9#14, A.A is sequenced before A.B */ /* Now wait for thread B to accept the data. */ while (flag.load(std::memory_order_acquire) == 1); assert(data == 0); } } void threadB(void) { for (;;) { int f, d; do { f = flag.load(std::memory_order_consume); // B.A } while (!f); /* By 1.10#9, B.A carries a dependency to B.B */ /* By 1.10#10, A.B is dependency-ordered before B.B */ /* By 1.10#11, A.A intra-thread happens before B.B */ /* By 1.10#12, A.A happens before B.B */ /* By 1.10#13, A.A is a visible side-effect with respect to B.B and the value determined by B.B shall be the value (1) stored by A.A. */ d = *(&data + (f - f)); // B.B assert(d == 1); /* Release thread A to start another iteration. */ data = 0; flag.store(0, std::memory_order_release); } } int main(void) { std::thread a(&threadA); std::thread b(&threadB); a.join(); b.join(); return 0; } ------------- PowerPC listing of threadB(): _Z7threadBv: .LFB2302: .cfi_startproc lis 10,.LANCHOR0@ha li 8,0 la 10,.LANCHOR0@l(10) .L16: lwz 9,4(10) cmpw 7,9,9 <------- acquire barrier bne- 7,$+4 <------- isync <------- cmpwi 7,9,0 beq+ 7,.L16 lwz 9,0(10) cmpwi 7,9,1 bne- 7,.L22 stw 8,0(10) lwsync stw 8,4(10) b .L16 .L22: mflr 0 lis 6,.LANCHOR1@ha stwu 1,-16(1) .cfi_def_cfa_offset 16 .cfi_register 65, 0 lis 3,.LC2@ha lis 4,.LC1@ha la 6,.LANCHOR1@l(6) la 3,.LC2@l(3) la 4,.LC1@l(4) li 5,47 addi 6,6,16 stw 0,20(1) .cfi_offset 65, 4 bl __assert_fail .cfi_endproc