From: David Brown
To: Toby Douglass
Cc: GCC help
Subject: Re: Atomic accesses on ARM microcontrollers
Date: Sat, 10 Oct 2020 21:43:15 +0200
Message-ID: <015e0ad8-8052-c63f-0eb1-4b9fa817cc10@hesbynett.no>
References: <945d5e74-b449-3746-6560-996d0437db76@hesbynett.no>

Hi,

Thanks for trying to help here, though I think perhaps we are talking
slightly at cross-purposes.

On 09/10/2020 23:35, Toby Douglass wrote:
> On 09/10/2020 20:28, David Brown wrote:
>
> Hi, David.
>
> I would like - but cannot - reply to the list, as their email server
> does not handle encrypted email.

I've put the help list on the cc to my reply - I assume that's okay for
you.  (Your email to me was not encrypted, unless I am missing
something.)

>
>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
>> being the most common these days.  I've been trying out atomics in gcc,
>> and I find it badly lacking.
>
> The 4.1.2 atomics or the later, replacement API?
>

I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
match either the gcc manual or the C standards as far as I can see.

>> (I've tried C11 <stdatomic.h>, C++11
>> <atomic>, and the gcc builtins - they all generate the same results,
>> which is to be expected.)  I'm concentrating on plain loads and stores
>> at the moment, not other atomic operations.
>
> Now, it's been about two years since I was working on this stuff, so I
> may well be wrong, but I recall there's no such thing as an actual,
> simple, atomic load or store.
>
> You can issue a load, or a store, and you can control the order in which
> events occur around it, and you can also force the load or store to
> complete by issuing a later operation which forces the load or store to
> be completed - so there's not an actual, direct, "atomic load" or
> "atomic store".

Yes, I know that atomics are used like this to correlate operations
between different threads and ensure specific orders.  And they are
vital for that purpose.

However, "atomic" also has a simpler, more fundamental and clearer
meaning with a wider applicability - it means an operation that cannot
be divided (or at least, cannot be /observed/ to be divided).  This is
the meaning that is important to me here.

And yes, you /can/ describe this in terms of loads and stores without
any reference to ordering or other aspects.  What it means is that if
thread A stores a value in the atomic variable ax, and thread B attempts
to read the value in ax, then B will read either the entire old value
before the write, or the entire new value after the write - it will
never read an inconsistent partial write.

Other atomic operations require atomic read-modify-write semantics, or
require ordering of operations on different objects.  But for many uses,
simple atomic loads and stores are enough.

>
>> These microcontrollers are all single core, so memory ordering does not
>> matter.
>
> I am not sure this is true.
> A single thread must make the world appear
> as if events occur in the order specified in the source code, but I bet
> you this is already not true for interrupts.
>

It is true even for interrupts.

In any single processor core, regardless of any re-ordering done by the
cpu, the operations will be carried out logically in the order they are
given.  Any write operation followed by a read operation (to the same
address) will result in the read giving the value written.  This is not
necessarily true for different cores (including virtual cores on SMT
systems) - ensuring that each core has a synchronised view of the other
core's write buffers, instruction re-ordering, etc., would severely
limit performance.  That's why you need memory ordering atomics on
multi-core systems, but not on single-core systems.

(Even on a single core, there can be other memory masters such as DMA
that complicate orderings - but that's a different matter, and handled
in a different manner.  C11/C++11 atomics are neither necessary nor
sufficient for non-cpu memory masters.)

Interrupts, with few exceptions, come either before or after an
instruction has executed.  (Some cpus support interruptible and
resumable instructions - for the Cortex M, that applies to load/store
multiple registers.  Some support restartable instructions - for the
Cortex M, that includes division and load/store double register.)  The
observable behaviour of an interrupt is basically like inserting a "call
to subroutine" instruction in the middle of the normal logical
instruction stream.

>> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
>> loads and stores.  These are generated fine.
>
> I wonder if they really are.

They are.

>  It may be for example they can be
> re-ordered with regard to each other, and this is not being prevented.

Do you mean the kind of re-ordering the compiler does for code?  That is
not in question here - at least, not to me.
I know what kinds of reorders are done, and how to prevent them if
necessary.  (On a single core, "volatile" is all you need - though there
are more efficient ways.  One of the reasons for wanting to use
C11/C++11 atomics is to be able to control order as I want.)  But as I
said earlier, I am concerned here primarily with the atomicity of the
accesses, not their order.

And while the cpu and memory system can include write store buffers,
caches, etc., that can affect the order of data hitting the memory,
these are not an issue in a single core system.  (They /are/ important
for multi-core systems.)

> Also, I still don't quite think there *are* atomic loads/stores as such
> - although having said that I'm now remembering the LOCK prefix on
> Intel, which might be usable with a load.  That would then lock the
> cache line and load - but, ah yes, it doesn't *mean* anything to
> atomically load.  The very next micro-second your value could be
> replaced by a new write.

Replacing values is not an issue.  The important part is the atomicity
of the action.  When thread A reads variable ax, it doesn't matter if
thread B (or an interrupt, or whatever) has changed ax just before the
read, or just after the read - it matters that it cannot change it
/during/ the read.  The key is /consistent/ values, not most up-to-date
values.

>
>> But for 64-bit and above, there are library calls to a compiler-provided
>> library.
>
> Oh ho ho ho yes.  This is why I had to roll my own.  When the processor
> doesn't do what the API offers, rather than say no, a *NON LOCK FREE
> ALTERNATIVE IS USED* - and this is WRONG.
>
>> For larger types, the situation is far, far worse.  Not only is the
>> library code inefficient on these devices (disabling and re-enabling
>> global interrupts is the optimal solution in most cases, with load/store
>> with reservation being a second option), but it is /wrong/.
>> The library uses spin locks (AFAICS) - on a single core system, that
>> generally means deadlocking the processor.  That is worse than useless.
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
>
> Sounds terrifying.
>
> Have a look here;
>
> https://www.liblfds.org
>
> Download the latest version, and have a look at the atomic abstraction
> header for ARM32.  It may have what you need.

I had a look through the github sources, but could not find anything
relevant.  But obviously that library has a lot more code and features
than I am looking for.

To be clear here, I am not looking for lock-free data structures.  I am
looking for simple atomic accesses.  And I am happy to implement these
myself.  For 64-bit types, it's little more than a single line of inline
assembly (and even that is only to guarantee the code that the compiler
is likely to generate automatically, given the right source code).  For
bigger types, it's load/store with reservation instructions or disabling
and enabling interrupts.

Thanks,

David
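
As a rough illustration of the "disabling and enabling interrupts"
approach mentioned above, here is a minimal sketch of a 64-bit load and
store that cannot be observed half-done on a single-core Cortex-M.  The
names irq_save_and_disable, irq_restore, atomic_load_u64 and
atomic_store_u64 are made up for this example and are not part of gcc or
any library.  The non-ARM branch is only a stand-in so that the code
compiles off-target; it does not mask anything.  The sketch also assumes
that no other bus master (such as DMA) touches the variable.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: remember the current PRIMASK, then mask interrupts. */
static inline uint32_t irq_save_and_disable(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i"
                      : "=r" (primask) : : "memory");
    return primask;
}

/* Restore PRIMASK to the saved value, so nested use is safe. */
static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Assumption: host stand-ins that only model the saved/restored state. */
static uint32_t fake_primask;

static inline uint32_t irq_save_and_disable(void)
{
    uint32_t old = fake_primask;
    fake_primask = 1;
    __asm__ volatile ("" : : : "memory");   /* compiler barrier only */
    return old;
}

static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("" : : : "memory");   /* compiler barrier only */
    fake_primask = primask;
}
#endif

/* Read all 64 bits with interrupts masked, so no handler can run
   between the two 32-bit halves of the access. */
static inline uint64_t atomic_load_u64(const volatile uint64_t *p)
{
    uint32_t state = irq_save_and_disable();
    uint64_t value = *p;
    irq_restore(state);
    return value;
}

/* Write all 64 bits with interrupts masked, for the same reason. */
static inline void atomic_store_u64(volatile uint64_t *p, uint64_t value)
{
    uint32_t state = irq_save_and_disable();
    *p = value;
    irq_restore(state);
}
```

Saving and restoring the previous mask state, rather than blindly
re-enabling, keeps the helpers usable from code that already runs with
interrupts disabled.  Note that this sidesteps the language atomics
rather than plugging into them; it does not change what the compiler
emits for _Atomic or std::atomic objects.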