From: "peter at cordes dot ca"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105929] New: [AArch64] armv8.4-a allows atomic stp. 64-bit constants can use 2 32-bit halves with _Atomic or volatile
Date: Sat, 11 Jun 2022 20:21:33 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105929

            Bug ID: 105929
           Summary: [AArch64] armv8.4-a allows atomic stp.
                    64-bit constants can use 2 32-bit halves with
                    _Atomic or volatile
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: arm64-*-*

void foo(unsigned long *p) {
    *p = 0xdeadbeefdeadbeef;
}
// compiles nicely:  https://godbolt.org/z/8zf8ns14K
        mov     w1, 48879
        movk    w1, 0xdead, lsl 16
        stp     w1, w1, [x0]
        ret

But even with -Os -march=armv8.4-a the following doesn't:

void foo_atomic(_Atomic unsigned long *p) {
    __atomic_store_n(p, 0xdeadbeefdeadbeef, __ATOMIC_RELAXED);
}
        mov     x1, 48879
        movk    x1, 0xdead, lsl 16
        movk    x1, 0xbeef, lsl 32
        movk    x1, 0xdead, lsl 48
        stlr    x1, [x0]
        ret

ARMv8.4-a and later guarantee atomicity for aligned ldp/stp, according to
ARM's architecture reference manual (ARM DDI 0487H.a - ID020222), so we could
use the same asm as the non-atomic version.

> If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that access
> fewer than 16 bytes are single-copy atomic when all of the following
> conditions are true:
> • All bytes being accessed are within a 16-byte quantity aligned to 16 bytes.
> • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory

(FEAT_LSE2 is the same CPU feature that gives 128-bit atomicity for aligned
ldp/stp x,x,mem)

Prior to that, apparently it wasn't guaranteed that an stp of 32-bit halves
merged into a single 64-bit store.  So without -march=armv8.4-a it wasn't a
missed optimization to construct the constant in a single register for
_Atomic or volatile.

But with ARMv8.4, we should use MOV/MOVK + STP.

Since there doesn't seem to be a release-store version of STP, 64-bit release
and seq_cst stores should still generate the full constant in a register,
instead of using STP + barriers.
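
The two preconditions discussed above can be sketched in C. This is only an illustration, not GCC code: the helper names `within_aligned_16` and `halves_equal` are hypothetical. It checks the first quoted condition (all bytes inside one 16-byte-aligned block) and whether a 64-bit constant has identical 32-bit halves, which is the case where a single W register can feed both STP operands. The memory-type condition and the FEAT_LSE2 check cannot be expressed this way and are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the bytes [addr, addr + size) all lie
   inside a single 16-byte block that is aligned to 16 bytes. */
static bool within_aligned_16(uintptr_t addr, size_t size)
{
    if (size == 0 || size > 16)
        return false;
    return (addr / 16) == ((addr + size - 1) / 16);
}

/* Hypothetical helper: true if the low and high 32-bit halves of the
   constant are equal, so "stp w1, w1, [x0]" can store it. */
static bool halves_equal(uint64_t c)
{
    return (uint32_t)c == (uint32_t)(c >> 32);
}
```

A naturally aligned 8-byte object always passes the first check, since an address that is a multiple of 8 cannot place 8 bytes across a 16-byte boundary.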
(Without ARMv8.4-a, or with a memory-order other than relaxed, see PR105928
for generating 64-bit constants in 3 instructions instead of 4, at least for
-Os, with add x0, x0, x0, lsl 32)
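
The arithmetic behind that 3-instruction sequence can be modelled in C. This is a sketch for illustration only; the function name `repeat_32` is hypothetical, and the comments map each step to the instruction it stands in for.

```c
#include <stdint.h>

/* Models: mov w0, lo16 ; movk w0, hi16, lsl 16 ; add x0, x0, x0, lsl 32 */
static uint64_t repeat_32(uint32_t half)
{
    uint64_t x = half;     /* writing w0 zero-extends into x0 */
    return x + (x << 32);  /* add x0, x0, x0, lsl 32 */
}
```

Because the upper 32 bits of x are zero before the add, the shifted copy and the original never overlap, so the sum is simply the 32-bit pattern repeated in both halves.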