From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 1A8A43858C39; Wed, 30 Aug 2023 09:42:07 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1A8A43858C39 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1693388527; bh=BYgdaE0A6jUWDO72sjOGRyrWBnMdnmtLo+cawJBioxM=; h=From:To:Subject:Date:From; b=MtH/k5ZIExwgnQoDcvztgGeGhkKXtQtIsLTSdWPOxBUxVf4P7PQ/YihddJqpt2h2m umHxPCZzZnCpKv67isUugOXppwGD0+7UhfXe7ppS8kSOJQcKhqTRBAQGy6qWFA6vGL NtWTbOWwCJbKgXg+NRO9E7+4fYVnD7PsSeFubpj0= From: "luke.geeson at cs dot ucl.ac.uk" To: gcc-bugs@gcc.gnu.org Subject: [Bug translation/111235] New: [Armv7-a]: Control-dependency between atomic accesses removed by -O1. Date: Wed, 30 Aug 2023 09:42:04 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: translation X-Bugzilla-Version: 14.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: luke.geeson at cs dot ucl.ac.uk X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D111235 Bug ID: 111235 Summary: [Armv7-a]: Control-dependency between atomic accesses removed by -O1. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: translation Assignee: unassigned at gcc dot gnu.org Reporter: luke.geeson at cs dot ucl.ac.uk Target Milestone: --- Consider the following litmus test that captures bad behaviour: ``` C test { *x =3D 0; *y =3D 0; } // fixed initial state where x and y are 0 // Concurrent threads P0 (atomic_int* y,int* x) { int r0 =3D *x; if (r0 =3D=3D 1) { atomic_store_explicit(y,1,memory_order_relaxed); } } P1 (atomic_int* y,int* x) { int r0 =3D atomic_load_explicit(y,memory_order_acquire); *x =3D 1; } // predicate over final state - does this outcome occur? exists (P0:r0=3D1 /\ P1:r0=3D1) ``` where 'P0:r0 =3D 0' means thread P0, local variable r0 has value 0. When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model.= The allowed outcomes are: ``` {P0:r0=3D0; P1:r0=3D0;} {P0:r0=3D1; P1:r0=3D0;}=20=20=20 ``` When compiling to target armv7-a using either GCC or LLVM trunk (https://godbolt.org/z/nxWTbEfK1), the compiled program has the following outcomes when simulated under the armv7 model: ``` {P0:r0=3D0; P1:r0=3D0;}=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20 {P0:r0=3D1; P1:r0=3D0;} {P0:r0=3D1; P1:r0=3D1;} <--- forbidden by source model, bug! ``` Since the `CMP;MOVEQ r1, #1;STREQ` sequence is predicated on the result of CMP, but the STREQ can be reordered before the LDR of x on thread P0, allow= ing the outcome {P0:r0=3D1; P1:r0=3D1;}.=20 The issue is that the control dependency between the load and store on P0 h= as been removed by the compiler. Both GCC and LLVM produce the following (repl= ace symbolic registers e.g. %P0_x with concrete registers containing x). ``` ARM test { [P0_r0]=3D0;[P1_r0]=3D0;[x]=3D0;[y]=3D0; %P0_P0_r0=3DP0_r0;%P0_x=3Dx;%P0_y=3Dy; %P1_P1_r0=3DP1_r0;%P1_x=3Dx;%P1_y=3Dy } (*****************************************************************) (* the Telechat toolsuite *) (* *) (* Luke Geeson, University College London, UK. *) (* gcc -O1 -march=3Darmv7-a -pthread --std=3Dc11 -marm -fno-section-anchors= *) (* *) (*****************************************************************) P0 | P1 ; LDR R2,[%P0_x] | LDR R2,[%P1_y] ; CMP R2,#1 | DMB ISH ; MOVEQ R1,#1 | MOV R1,#1 ; STREQ R1,[%P0_y] | STR R1,[%P1_x] ; STR R2,[%P0_P0_r0] | STR R2,[%P1_P1_r0] ; BX LR | BX LR ; exists (P0_r0=3D1 /\ P1_r0=3D1) <--- yes, allowed by arm.cat, bug! ``` The forbidden {P0:r0=3D1; P1:r0=3D1;} behaviour disappears in GCC when opti= mising using -O2 or above, since R2 is reused in the STREQ when the EQ condition h= olds - a data dependency masks the loss of the control dependency. This buggy behaviour remains in clang -O2 and above however (https://godbolt.org/z/hGv7P3vGW). To fix this - mark relaxed atomic load/stores as not predicated, and ideally use an explicit branch to enforce the control dependency so that it is not lost: ``` ARM test-fixed { [P0_r0]=3D0;[P1_r0]=3D0;[x]=3D0;[y]=3D0; %P0_P0_r0=3DP0_r0;%P0_x=3Dx;%P0_y=3Dy; %P1_P1_r0=3DP1_r0;%P1_x=3Dx;%P1_y=3Dy } P0 | P1 ; LDR R0,[%P0_x] | MOV R2,#1 ; CMP R0,#1 | LDR R0,[%P1_y] ; BNE L0x2c | DMB ISH ; MOV R2,#1 | STR R2,[%P1_x] ; STR R2,[%P0_y] | STR R0,[%P1_P1_r0] ; L0x2c: | ; STR R0,[%P0_P0_r0] | ; exists (P0_r0=3D1 /\ P1_r0=3D1) ``` This prevents the buggy behaviour. We will follow up by reporting the bug for GCC.=