From: David Brown
Subject: Re: Atomic accesses on ARM microcontrollers
To: Jonathan Wakely
Cc: gcc-help
Date: Sat, 10 Oct 2020 21:43:08 +0200
Message-ID: <6cfc20a9-05c5-e2be-d9f0-d10911268b4a@hesbynett.no>
References: <945d5e74-b449-3746-6560-996d0437db76@hesbynett.no>

On 10/10/2020 14:39, Jonathan Wakely wrote:
> On Fri, 9 Oct 2020 at 19:29, David Brown wrote:
>>
>> I don't know if this can be answered here, or would be best on the
>> development mailing list. But I'll start on the help list.
>>
>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
>> being the most common these days. I've been trying out atomics in gcc,
>> and I find it badly lacking. (I've tried C11 <stdatomic.h>, C++11
>> <atomic>, and the gcc builtins - they all generate the same results,
>> which is to be expected.) I'm concentrating on plain loads and stores
>> at the moment, not other atomic operations.
>>
>> These microcontrollers are all single core, so memory ordering does not
>> matter.
>>
>> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
>> loads and stores. These are generated fine.
>>
>> But for 64-bit and above, there are library calls to a compiler-provided
>> library. For the Cortex M4 and M7 cores (and several other Cortex M
>> cores), the "load double register" and "store double register"
>> instructions are atomic (but not suitable for use with volatile data,
>> since they are restarted if they are interrupted). The compiler
>> generates these for normal 64-bit types, but not for atomics.
>>
>> For larger types, the situation is far, far worse. Not only is the
>> library code inefficient on these devices (disabling and re-enabling
>> global interrupts is the optimal solution in most cases, with load/store
>> with reservation being a second option), but it is /wrong/. The library
>> uses spin locks (AFAICS) - on a single core system, that generally means
>> deadlocking the processor. That is worse than useless.
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
>
> Yes. My understanding is that libatomic is designed to be replaceable
> by users who want to provide their own custom implementations of the
> API.
>
> You're using bare metal ARM, right? For Arm on Linux I think there are
> kernel helpers that make the atomics efficient even when the hardware
> doesn't support them.
>

Yes, I am using bare metal (well, sometimes an RTOS - but that's still a
lot closer to bare metal than to a host OS like Linux). And I have a
single core - that makes atomics easier because I don't even need "dmb"
or other memory barrier instructions, and I can freely use the "disable
interrupts around the access" strategy. On the other hand, it means that
the spin locks in libatomic are completely wrong.

If I understand you correctly, you mean that I can simply implement my
own version of __atomic_load_8 and other functions in libatomic?

I had a quick test (using the godbolt.org online compiler).
By adding this to my file:

extern inline uint64_t __atomic_load_8(const volatile void * p, int order)
{
    (void) order;
    const volatile uint64_t * q = (const volatile uint64_t *) p;
    return *q;
}

then a straight load of a 64-bit atomic becomes a single "ldrd" load
double register instruction, which is optimal for this processor. (In a
finished solution, I'd want to check that this is correct for different
flags - possibly adding function attributes for optimisation or inline
assembly to ensure that it is always correct. But that's a detail for me
to check.)

The same worked for __atomic_store_8. (The general load/store functions
are a bit more involved, as are the read-modify-write atomic functions.)

Is this strategy guaranteed to work in gcc, or is it a case of "it works
in a simple test, but might fail in a complicated program or with
different flags"?
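
As a rough sketch of the "disable interrupts around the access" strategy
mentioned above, something along these lines might do for a single-core
Cortex-M. The names sc_atomic_load_8, sc_atomic_store_8, sc_atomic_load,
irq_save and irq_restore are hypothetical and chosen only for
illustration. Whether such functions can safely be supplied under the
libatomic names is exactly the open question, so the sketch does not
assume it. It also assumes that masking interrupts via PRIMASK is enough
to exclude every other accessor of the data (no NMI or fault handler
touches it).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: remember the current interrupt mask, disable
   interrupts, and later restore the remembered mask.  The PRIMASK code
   is only used on ARM M-profile targets; elsewhere the helpers are
   no-ops so the sketch can be built and tried on a host machine. */
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i"
                      : "=r" (primask) : : "memory");
    return primask;
}

static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
static inline uint32_t irq_save(void) { return 0; }
static inline void irq_restore(uint32_t primask) { (void) primask; }
#endif

/* 64-bit load.  The memory order is ignored on the assumption of a
   single core, where no barrier instructions are needed. */
uint64_t sc_atomic_load_8(const volatile void *p, int order)
{
    (void) order;
    uint32_t state = irq_save();
    uint64_t value = *(const volatile uint64_t *) p;
    irq_restore(state);
    return value;
}

/* 64-bit store, using the same scheme. */
void sc_atomic_store_8(volatile void *p, uint64_t value, int order)
{
    (void) order;
    uint32_t state = irq_save();
    *(volatile uint64_t *) p = value;
    irq_restore(state);
}

/* Load of an object of arbitrary size: copy it byte by byte while
   interrupts are masked. */
void sc_atomic_load(size_t size, const volatile void *src, void *dst,
                    int order)
{
    (void) order;
    const volatile unsigned char *s = (const volatile unsigned char *) src;
    unsigned char *d = (unsigned char *) dst;
    uint32_t state = irq_save();
    for (size_t i = 0; i < size; i++) {
        d[i] = s[i];
    }
    irq_restore(state);
}
```

Saving and restoring the mask, rather than unconditionally re-enabling
interrupts, keeps the functions usable from code that already runs with
interrupts disabled. The cost is that interrupt latency grows with the
size of the object being copied.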