From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=+mM9=73=huawei.com=mudrievskyjpetro@sourceware.org>
Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255])
	by sourceware.org (Postfix) with ESMTPS id 543583858D1E
	for <gcc@gcc.gnu.org>; Tue,  4 Apr 2023 11:34:33 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 543583858D1E
Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=huawei.com
Received: from kwepemi100008.china.huawei.com (unknown [172.30.72.54])
	by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4PrQZ05ZFCz17QGT;
	Tue,  4 Apr 2023 19:31:04 +0800 (CST)
Received: from kwepemi500008.china.huawei.com (7.221.188.139) by
 kwepemi100008.china.huawei.com (7.221.188.57) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2507.23; Tue, 4 Apr 2023 19:34:28 +0800
Received: from kwepemi500008.china.huawei.com ([7.221.188.139]) by
 kwepemi500008.china.huawei.com ([7.221.188.139]) with mapi id 15.01.2507.023;
 Tue, 4 Apr 2023 19:34:28 +0800
From: mudrievskyjpetro <mudrievskyjpetro@huawei.com>
To: "will.deacon@arm.com" <will.deacon@arm.com>, "will@kernel.org"
	<will@kernel.org>
CC: "linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>, "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>,
	"codegen-arm@discourse.llvm.org" <codegen-arm@discourse.llvm.org>
Subject: ARMv7 doubleword atomicity
Thread-Topic: ARMv7 doubleword atomicity
Thread-Index: Adlm4XyCzV9mtXAcQCCsZCC+twIkRA==
Date: Tue, 4 Apr 2023 11:34:28 +0000
Message-ID: <c88d98c53a7641139b6339c59f5dd9b6@huawei.com>
Accept-Language: en-US, zh-CN
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.146.83.222]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-CFilter-Loop: Reflected
X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>

*sending this email again, now in plain text

Hi Will,

I'm working at Huawei on the verification of atomic primitives. I thought it would be appropriate to write to you because you're mentioned in several papers on ARM concurrency (https://www.cl.cam.ac.uk/~pes20/papers/topics.html) and GCC patches, and you're the author of several kernel patches on this topic.

I've recently been looking into our implementation of doubleword atomic loads/stores on ARMv7 and found that we treat stores specially - with an LDREXD/STREXD loop - while a load is just an LDREXD.
The latest version of the manual (DDI 0406C.d) explicitly prohibits this, saying in A3.5.3 that "The way to atomically load two 32-bit quantities is to perform an LDREXD/STREXD sequence, reading and writing the same value, for which the STREXD succeeds, and use the read values."
Both GCC and LLVM produce the same code as we do (https://godbolt.org/z/bYaWbEbjh). The explanation for this in GCC (https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338841.html), given by Richard Earnshaw, is based on an older version of the ARM ARM (before C.c), which says that LDREXD accesses are atomic:

--- C.b
+++ C.c
  In ARMv7, the single-copy atomic processor accesses are
  ***
- memory accesses caused by LDREXD and STREXD instructions to doubleword-aligned locations.
+ Memory accesses caused by a LDREXD/STREXD to a doubleword-aligned location for which the STREXD succeeds
+ cause single-copy atomic updates of the doubleword being accessed.

Interestingly, prior to issue C.c, LDREXD's pseudocode contained one single-copy atomic memory access (0406C.b A8.8.77): MemA[address,8], whereas now it contains two (0406C.d A8.8.78): MemA[address,4] and MemA[address+4,4].

Also, regarding LPAE, there is a discrepancy between the prose and pseudocode explanations of LDRD/STRD atomicity with LPAE. In the prose, LDRD/STRD are atomic only at locations that might be used to hold translations, while in the pseudocode they are always atomic.
LLVM doesn't change its code output for LPAE. GCC emits a regular LDRD for loads and keeps the LDREXD/STREXD loop for stores. In the kernel, both loads and stores are regular LDRD/STRD.

I've run some litmus tests on a number of boards. Here are the results:
- Cortex-A7 (BCM2836, RK3128), Cortex-A9 (Exynos4412) and Cortex-A17 (RK3188) - LDRD/STRD are single-copy atomic;
- Cortex-A5 (MSM8625Q) - LDRD/STRD are not single-copy atomic, but using LDREXD/STRD is enough to fix it: two writers with STRD and one reader with LDREXD don't produce inconsistent results.
Interestingly, on Cortex-A9 (which has no LPAE) regular LDRD/STRD are atomic.

Can you shed some light on the situation with LDREXD/STREXD? Why was the manual changed? Do you think we should change the implementation in the kernel and elsewhere to what the manual suggests?
Also, regarding LPAE, the manual doesn't require all locations in the memory system to be 64-bit single-copy atomic - only those that might be used to hold translations, "such as bulk SDRAM". Does this mean we can safely use LDRD/STRD?

Related patches/discussion:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005934.html
https://lists.infradead.org/pipermail/linux-arm-kernel/2013-March/157817.html
https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338781.html
https://gcc.gnu.org/pipermail/gcc-patches/2016-February/442717.html
https://reviews.llvm.org/rGc882eb0723afa9dfe626eebb9699c1871a8bbbab

---
Peter