From mboxrd@z Thu Jan 1 00:00:00 1970
From: mudrievskyjpetro
To: "will.deacon@arm.com", "will@kernel.org"
CC: "linux-arm-kernel@lists.infradead.org", "gcc@gcc.gnu.org", "codegen-arm@discourse.llvm.org"
Subject: ARMv7 doubleword atomicity
Date: Tue, 4 Apr 2023 11:34:28 +0000

*sending this email again, now in plain text

Hi Will,

I'm
working at Huawei on verification of atomic primitives. I thought it would be appropriate to write to you because you're mentioned in several papers on ARM concurrency (https://www.cl.cam.ac.uk/~pes20/papers/topics.html) and in gcc patches, and you're the author of several kernel patches on this topic.

I've recently been looking into our implementation of atomic loads/stores on ARMv7 and found that we treat stores specially, with an LDREXD/STREXD loop, while a load is just a single LDREXD.

The latest version of the manual (DDI 0406C.d) explicitly rules out a lone LDREXD for loads, saying in A3.5.3 that "The way to atomically load two 32-bit quantities is to perform an LDREXD/STREXD sequence, reading and writing the same value, for which the STREXD succeeds, and use the read values."

Both GCC and LLVM produce the same code as us (https://godbolt.org/z/bYaWbEbjh). The explanation for this in GCC (https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338841.html), given by Richard Earnshaw, is based on an older version of the ARM ARM (before C.c), which says that LDREXDs are atomic:

--- C.b
+++ C.c
In ARMv7, the single-copy atomic processor accesses are
***
- memory accesses caused by LDREXD and STREXD instructions to doubleword-aligned locations.
+ Memory accesses caused by a LDREXD/STREXD to a doubleword-aligned location for which the STREXD succeeds
+ cause single-copy atomic updates of the doubleword being accessed.

Interestingly, prior to issue C.c, LDREXD's pseudocode contained one single-copy atomic memory access (0406C.b A8.8.77): MemA[address,8], whereas now it contains two (0406C.d A8.8.78): MemA[address,4] and MemA[address+4,4].

Regarding LPAE, there is also a discrepancy between the prose and the pseudocode descriptions of the atomicity of LDRD/STRD. In the prose, LDRD/STRD are atomic only for locations that might be used to hold translations, while in the pseudocode they are always atomic. LLVM doesn't change its code output for LPAE.
GCC produces a regular LDRD for the load and keeps the LDREXD/STREXD loop for the store. In the kernel, both loads and stores are regular LDRD/STRD.

I've run some litmus tests on a bunch of boards. Here are the results:
- Cortex-A7 (BCM2836, RK3128), Cortex-A9 (Exynos4412) and Cortex-A17 (RK3188) - LDRD/STRD are single-copy atomic;
- Cortex-A5 (MSM8625Q) - LDRD/STRD are not single-copy atomic, but it's enough to use LDREXD/STRD to fix it: two writers with STRD and one reader with LDREXD don't produce inconsistent results.
Interestingly, on Cortex-A9 (without LPAE) regular LDRD/STRD are atomic.

Can you shed some light on the situation with LDREXD/STREXD? Why was the manual changed? Do you think we should change the implementation in the kernel and elsewhere to what the manual suggests?

Also, about LPAE: the manual doesn't require that all locations in the memory system be 64-bit single-copy atomic - only those that might be used to hold translations, "such as bulk SDRAM". Does this mean we can safely use LDRD/STRD?

Related patches/discussion:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005934.html
https://lists.infradead.org/pipermail/linux-arm-kernel/2013-March/157817.html
https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338781.html
https://gcc.gnu.org/pipermail/gcc-patches/2016-February/442717.html
https://reviews.llvm.org/rGc882eb0723afa9dfe626eebb9699c1871a8bbbab

---
Peter
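
For reference, the load sequence quoted from A3.5.3 (read the doubleword, write the same value back, and use the read only if the store succeeds) could be sketched in portable C11 roughly as below. This is only an illustration under assumptions: the helper names load64_with_writeback and load64_plain are hypothetical and not taken from GCC, LLVM, or the kernel, and whether a given compiler lowers the fetch-add of zero to an LDREXD/STREXD loop on ARMv7 is something to confirm in the disassembly rather than take for granted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not taken from GCC, LLVM, or the kernel.
 * Reads a 64-bit value with a read-modify-write that adds 0, so the value
 * stored back equals the value read. On ARMv7 a compiler would typically
 * implement a 64-bit RMW as an LDREXD/STREXD retry loop (an assumption to
 * verify in the generated code). */
static inline uint64_t load64_with_writeback(_Atomic uint64_t *p)
{
    /* LDREXD/STREXD atomicity only applies to doubleword-aligned locations. */
    assert(((uintptr_t)p & 7u) == 0);
    return atomic_fetch_add_explicit(p, 0, memory_order_seq_cst);
}

/* Plain atomic load, for comparison. According to the godbolt link in the
 * mail, GCC and LLVM lower this to an LDREXD with no paired STREXD on ARMv7. */
static inline uint64_t load64_plain(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_seq_cst);
}
```

The write-back variant needs write access to the location and dirties the cache line on every read, which is presumably part of the cost being weighed when deciding whether to follow the newer manual wording.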