Atomic accesses on ARM microcontrollers

public inbox for gcc-help@gcc.gnu.org

* Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-09 18:28 UTC
  To: gcc-help

I don't know if this can be answered here, or would be best on the
development mailing list.  But I'll start on the help list.

I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
being the most common these days.  I've been trying out atomics in gcc,
and I find it badly lacking.  (I've tried C11 <stdatomic.h>, C++11
<atomic>, and the gcc builtins - they all generate the same results,
which is to be expected.)  I'm concentrating on plain loads and stores
at the moment, not other atomic operations.

These microcontrollers are all single core, so memory ordering does not
matter.

For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
loads and stores.  These are generated fine.

But for 64-bit and above, there are library calls to a compiler-provided
library.  For the Cortex M4 and M7 cores (and several other Cortex M
cores), the "load double register" and "store double register"
instructions are atomic (but not suitable for use with volatile data,
since they are restarted if they are interrupted).  The compiler
generates these for normal 64-bit types, but not for atomics.
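One way to see which sizes GCC intends to handle inline on a given target is the documented __atomic_always_lock_free builtin, used here as a rough proxy. This is a minimal sketch with a hypothetical helper name. On a Cortex-M3/M4/M7 one would expect sizes 1, 2 and 4 to report true and size 8 to report false, matching the library calls described above, but treat that as an expectation to check against your own target and flags, not a guarantee.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether GCC considers atomics of the given
 * size always lock-free on the current target.  The builtin needs a
 * compile-time constant size, hence the switch. */
bool inline_atomic_for_size(unsigned size)
{
    switch (size) {
    case 1:  return __atomic_always_lock_free(1, 0);
    case 2:  return __atomic_always_lock_free(2, 0);
    case 4:  return __atomic_always_lock_free(4, 0);
    case 8:  return __atomic_always_lock_free(8, 0);
    case 16: return __atomic_always_lock_free(16, 0);
    default: return false;   /* sizes not listed are treated as "no" */
    }
}
```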

For larger types, the situation is far, far worse.  Not only is the
library code inefficient on these devices (disabling and re-enabling
global interrupts is the optimal solution in most cases, with load/store
with reservation being a second option), but it is /wrong/.  The library
uses spin locks (AFAICS) - on a single core system, that generally means
deadlocking the processor.  That is worse than useless.
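A minimal sketch of the interrupt-masking approach for larger objects, assuming a single-core M-profile part where masking interrupts through PRIMASK makes a copy indivisible. The names irq_save, irq_restore, mcu_atomic_load and mcu_atomic_store are hypothetical. The argument order is loosely modelled on libatomic's generic (size, object, buffer, order) entry points, but this is not libatomic's code, and the non-ARM branch is only a stub so the sketch also compiles on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: mask interrupts, returning the previous PRIMASK
 * so that critical sections nest correctly. */
#if defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'M'
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i" : "=r"(primask) :: "memory");
    return primask;
}

static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" :: "r"(primask) : "memory");
}
#else
/* Host build: nothing to mask; keep only a compiler barrier. */
static inline uint32_t irq_save(void) { __asm__ volatile ("" ::: "memory"); return 0; }
static inline void irq_restore(uint32_t s) { (void)s; __asm__ volatile ("" ::: "memory"); }
#endif

/* Hypothetical generic load/store for objects of any size.  On a single
 * core the memory order can be ignored: with interrupts masked, no
 * maskable interrupt handler can run while the copy is in progress. */
void mcu_atomic_load(size_t size, const volatile void *obj, void *ret, int order)
{
    (void)order;
    uint32_t s = irq_save();
    memcpy(ret, (const void *)obj, size);
    irq_restore(s);
}

void mcu_atomic_store(size_t size, volatile void *obj, const void *val, int order)
{
    (void)order;
    uint32_t s = irq_save();
    memcpy((void *)obj, val, size);
    irq_restore(s);
}
```

Saving and restoring PRIMASK, rather than unconditionally re-enabling interrupts, keeps these helpers safe to call from code that already runs with interrupts masked.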

Is there any way I can replace this library with my own code here, while
still using the language atomics?

David

* Re: Atomic accesses on ARM microcontrollers
From: Segher Boessenkool @ 2020-10-09 23:28 UTC
  To: David Brown; +Cc: gcc-help

Hi!

On Fri, Oct 09, 2020 at 08:28:05PM +0200, David Brown wrote:
> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
> loads and stores.  These are generated fine.
> 
> But for 64-bit and above, there are library calls to a compiler-provided
> library.

> Is there any way I can replace this library with my own code here, while
> still using the language atomics?

Add something in libgcc/config/arm/ ?

You might need to add something specific to some configuration (in some
t-* file), or maybe add another command line flag.

It sounds like you have to add more constraints on what atomic can be
used for than what is specified currently.


Segher

* Re: Atomic accesses on ARM microcontrollers
From: Jonathan Wakely @ 2020-10-10 12:39 UTC
  To: David Brown; +Cc: gcc-help

On Fri, 9 Oct 2020 at 19:29, David Brown <david.brown@hesbynett.no> wrote:
>
> [snip]
>
> Is there any way I can replace this library with my own code here, while
> still using the language atomics?

Yes. My understanding is that libatomic is designed to be replaceable
by users who want to provide their own custom implementations of the
API.

You're using bare metal ARM, right? For Arm on Linux I think there are
kernel helpers that make the atomics efficient even when the hardware
doesn't support them.

* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-10 19:43 UTC
  To: Jonathan Wakely; +Cc: gcc-help



On 10/10/2020 14:39, Jonathan Wakely wrote:
> On Fri, 9 Oct 2020 at 19:29, David Brown <david.brown@hesbynett.no> wrote:
>>
>> [snip]
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
> 
> Yes. My understanding is that libatomic is designed to be replaceable
> by users who want to provide their own custom implementations of the
> API.
> 
> You're using bare metal ARM, right? For Arm on Linux I think there are
> kernel helpers that make the atomics efficient even when the hardware
> doesn't support them.
> 

Yes, I am using bare metal (well, sometimes an RTOS - but that's still a
lot closer to bare metal than to a host OS like Linux).  And I have a
single core - that makes atomics easier because I don't even need "dmb"
or other memory barrier instructions, and I can freely use "disable
interrupts around the access" strategy.  On the other hand, it means
that the spin locks in libatomic are completely wrong.

If I understand you correctly, you mean that I can simply implement my
own version of __atomic_load_8 and other functions in libatomic?

I had a quick test (using the godbolt.org online compiler).

By adding this to my file:

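#include <stdint.h>

/* My replacement for the libatomic 64-bit load.  On a single core the
   memory order argument can be ignored. */
extern inline
uint64_t __atomic_load_8(const volatile void * p, int order) {
    (void) order;
    const volatile uint64_t * q = (const volatile uint64_t *) p;
    return *q;
}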

then a straight load of a 64-bit atomic becomes a single "ldrd" load
double register instruction, which is optimal for this processor.  (In a
finished solution, I'd want to check that this is correct for different
flags - possibly adding function attributes for optimisation or inline
assembly to ensure that it is always correct.  But that's a detail for
me to check.)
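As a sketch of the inline-assembly variant mentioned above, assuming an ARMv7-M or ARMv7E-M core (Cortex-M3/M4/M7) and GCC's %Q/%R operand modifiers for the low and high registers of a 64-bit operand. The helper name is hypothetical, and the non-ARM branch exists only so the sketch compiles on a host.

```c
#include <stdint.h>

/* Hypothetical helper: read a 64-bit object with exactly one LDRD on
 * ARMv7-M / ARMv7E-M, so the compiler cannot split it into two
 * separate 32-bit loads. */
static inline uint64_t load64_ldrd(const volatile uint64_t *p)
{
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    uint64_t v;
    /* %Q0 and %R0 name the low and high registers holding v. */
    __asm__ volatile ("ldrd %Q0, %R0, [%1]" : "=r"(v) : "r"(p) : "memory");
    return v;
#else
    return *p;   /* host build: ordinary volatile read */
#endif
}
```

This relies on an interrupted LDRD being abandoned and restarted, so a handler that writes the object in between cannot leave the reader with a mix of old and new halves. It says nothing about stores.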

The same worked for __atomic_store_8.

(The general load/store functions are a bit more involved, as are the
read-modify-write atomic functions.)

Is this strategy guaranteed to work in gcc, or is it a case of "it works
in a simple test, but might fail in a complicated program or with
different flags" ?



* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-10 19:43 UTC
  To: Toby Douglass; +Cc: GCC help

Hi,

Thanks for trying to help here, though I think perhaps we are talking
slightly at cross-purposes.

On 09/10/2020 23:35, Toby Douglass wrote:
> On 09/10/2020 20:28, David Brown wrote:
> 
> Hi, David.
> 
> I would like - but cannot - reply to the list, as their email server
> does not handle encrypted email.

I've put the help list on the cc to my reply - I assume that's okay for
you.  (Your email to me was not encrypted, unless I am missing something.)

> 
>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
>> being the most common these days.  I've been trying out atomics in gcc,
>> and I find it badly lacking.
> 
> The 4.1.2 atomics or the later, replacement API?
> 

I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
match either the gcc manual or the C standards as far as I can see.

>> (I've tried C11 <stdatomic.h>, C++11
>> <atomic>, and the gcc builtins - they all generate the same results,
>> which is to be expected.)  I'm concentrating on plain loads and stores
>> at the moment, not other atomic operations.
> 
> Now, it's been about two years since I was working on this stuff, so I
> may well be wrong, but I recall there's no such thing as an actual,
> simple, atomic load or store.
> 
> You can issue a load, or a store, and you can control the order in which
> events occur around it, and you can also force the load or store to
> complete by issuing a later operation which forces the load or store to
> be completed - so there's not an actual, direct, "atomic load" or
> "atomic store".

Yes, I know that atomics are used like this to correlate operations
between different threads and ensure specific orders.  And they are
vital for that purpose.

However, "atomic" also has a simpler, more fundamental and clearer
meaning with a wider applicability - it means an operation that cannot
be divided (or at least, cannot be /observed/ to be divided).  This is
the meaning that is important to me here.  And yes, you /can/ describe
this in terms of loads and stores without any reference to ordering or
other aspects.  What it means is that if thread A stores a value in the
atomic variable ax, and thread B attempts to read the value in ax, then
B will read either the entire old value before the write, or the entire
new value after the write - it will never read an inconsistent partial
write.
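A host-side simulation, for illustration only, of exactly the failure this definition rules out: a 64-bit value written as two 32-bit halves, with a simulated interrupt reading it between the two stores. All names are hypothetical; on real hardware the interrupt would of course arrive asynchronously.

```c
#include <stdint.h>

/* A 64-bit value stored as two separate 32-bit words. */
typedef struct { volatile uint32_t lo, hi; } split64;

static uint64_t isr_seen;   /* what the simulated interrupt observed */

static uint64_t read_split(const split64 *v)
{
    return ((uint64_t)v->hi << 32) | v->lo;
}

/* Stand-in for an interrupt handler that reads the shared value. */
static void simulated_isr(const split64 *v)
{
    isr_seen = read_split(v);
}

/* Non-atomic store: the "interrupt" may fire after the low half is written. */
static void store_split(split64 *v, uint64_t x, int interrupt_midway)
{
    v->lo = (uint32_t)x;
    if (interrupt_midway)
        simulated_isr(v);
    v->hi = (uint32_t)(x >> 32);
}
```

Starting from 0x00000000FFFFFFFF and writing 0x0000000100000000, the interrupt observes 0, which is neither the old value nor the new one.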

Other atomic operations require atomic read-modify-write semantics, or
require ordering of operations on different objects.  But for many uses,
simple atomic loads and stores is enough.

> 
>> These microcontrollers are all single core, so memory ordering does not
>> matter.
> 
> I am not sure this is true.  A single thread must make the world appear
> as if events occur in the order specified in the source code, but I bet
> you this already not true for interrupts.
> 

It is true even for interrupts.

In any single processor core, regardless of any re-ordering done by the
cpu, the operations will be carried out logically in the order they are
given.  Any write operation followed by a read operation (to the same
address) will result in the read giving the value written.

This is not necessarily true for different cores (including virtual
cores on SMT systems) - ensuring that each core has a synchronised view
of the other cores' write buffers, instruction re-ordering, etc., would
severely limit performance.  That's why you need memory ordering atomics
on multi-core systems, but not on single-core systems.

(Even on a single core, there can be other memory masters such as DMA
that complicate orderings - but that's a different matter, and handled
in a different manner.  C11/C++11 atomics are neither necessary nor
sufficient for non-cpu memory masters.)

Interrupts, with few exceptions, come either before or after an
instruction has executed.  (Some cpus support interruptible and
resumable instructions - for the Cortex M, that applies to load/store
multiple registers.  Some support restartable instructions - for the
Cortex M, that includes division and load/store double register.)  The
observable behaviour of an interrupt is basically like inserting a "call
to subroutine" instruction in the middle of the normal logical
instruction stream.

>> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
>> loads and stores.  These are generated fine.
> 
> I wonder if they really are.

They are.

>  It may be for example they can be
> re-ordered with regard to each other, and this is not being prevented. 

Do you mean the kind of re-ordering the compiler does for code?  That is
not in question here - at least, not to me.  I know what kinds of
reorders are done, and how to prevent them if necessary.  (On a single
core, "volatile" is all you need - though there are more efficient ways.
 One of the reasons for wanting to use C11/C++11 atomics is to be able
to control order as I want.)  But as I said earlier, I am concerned here
primarily with the atomicity of the accesses, not their order.

And while the cpu and memory system can include write store buffers,
caches, etc., that can affect the order of data hitting the memory,
these are not an issue in a single core system.  (They /are/ important
for multi-core systems.)

> Also, I still don't quite think there *are* atomic loads/stores as such
> - although having said that I'm now remembering the LOCK prefix on
> Intel, which might be usable with a load.  That would then lock the
> cache line and load - but, ah yes, it doesn't *mean* anything to
> atomically load.  The very next micro-second you value could be replaced
> a new write.

Replacing values is not an issue.  The important part is the atomicity
of the action.  When thread A reads variable ax, it doesn't matter if
thread B (or an interrupt, or whatever) has changed ax just before the
read, or just after the read - it matters that it cannot change it
/during/ the read.  The key is /consistent/ values, not most up-to-date
values.

> 
>> But for 64-bit and above, there are library calls to a compiler-provided
>> library.
> 
> Oh ho ho ho yes.  This is why I had to roll my own.  When the processor
> doesn't do what the API offers, rather than say no, a *NON LOCK FREE
> ALTERNATIVE IS USED* - and this is WRONG.
> 
>> For larger types, the situation is far, far worse.  Not only is the
>> library code inefficient on these devices (disabling and re-enabling
>> global interrupts is the optimal solution in most cases, with load/store
>> with reservation being a second option), but it is /wrong/.  The library
>> uses spin locks (AFAICS) - on a single core system, that generally means
>> deadlocking the processor.  That is worse than useless.
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
> 
> Sounds terrifying.
> 
> Have a look here;
> 
> https://www.liblfds.org
> 
> Download the latest version, and have a look at the atomic abstraction
> header for ARM32.  It may have what you need.

I had a look through the github sources, but could not find anything
relevant.  But obviously that library has a lot more code and features
than I am looking for.

To be clear here, I am not looking for lock-free data structures.  I am
looking for simple atomic accesses.  And I am happy to implement these
myself.  For 64-bit types, it's little more than a single line of inline
assembly (and even that is only there to guarantee the code that the
compiler is likely to generate anyway, given the right source code).  For
bigger types, it's load/store with reservation instructions or disabling
and enabling interrupts.

Thanks,

David

* Re: Atomic accesses on ARM microcontrollers
From: Jonathan Wakely @ 2020-10-10 20:09 UTC
  To: David Brown; +Cc: Toby Douglass, GCC help

On Sat, 10 Oct 2020 at 20:47, David Brown <david.brown@hesbynett.no> wrote:
> On 09/10/2020 23:35, Toby Douglass wrote:
> > The 4.1.2 atomics or the later, replacement API?
> >
>
> I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
> match either the gcc manual or the C standards as far as I can see.

I assume this means GCC 4.1.2 which was the first to support the
__sync built-ins
<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>,
which were superseded by the __atomic ones
<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html>
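A small illustration of the two families, using only documented GCC builtins: __sync_fetch_and_add from the older family, which the manual describes as in most cases a full barrier, next to __atomic_fetch_add and __atomic_load_n, which take an explicit memory-order argument. The function names are hypothetical.

```c
#include <stdint.h>

static uint32_t counter;

/* Old-style builtin: implied full barrier, returns the previous value. */
uint32_t bump_sync(void)
{
    return __sync_fetch_and_add(&counter, 1);
}

/* New-style builtin: same operation, ordering chosen by the caller. */
uint32_t bump_atomic_relaxed(void)
{
    return __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
}

uint32_t read_atomic(void)
{
    return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}
```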

* Re: Atomic accesses on ARM microcontrollers
From: Jonathan Wakely @ 2020-10-10 20:18 UTC
  To: David Brown; +Cc: gcc-help

On Sat, 10 Oct 2020 at 20:43, David Brown <david.brown@hesbynett.no> wrote:
> Is this strategy guaranteed to work in gcc, or is it a case of "it works
> in a simple test, but might fail in a complicated program or with
> different flags" ?

I think it works by design. My understanding is that users providing
their own implementation of those calls is fully supported. I think
that's partly why libatomic.so is a distinct library, and not just
part of libgcc_s.so. The docs aren't entirely clear about this, they
just say that if the compiler can't emit lock-free instructions for
the atomic operation "a call is made to an external routine with the
same parameters to be resolved at run time." But I think that "to be
resolved at run time" means that you can choose how those calls will
be resolved.

* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-11 10:54 UTC
  To: Jonathan Wakely; +Cc: gcc-help

On 10/10/2020 22:18, Jonathan Wakely wrote:
> On Sat, 10 Oct 2020 at 20:43, David Brown <david.brown@hesbynett.no> wrote:
>> Is this strategy guaranteed to work in gcc, or is it a case of "it works
>> in a simple test, but might fail in a complicated program or with
>> different flags" ?
> 
> I think it works by design. My understanding is that users providing
> their own implementation of those calls is fully supported. I think
> that's partly why libatomic.so is a distinct library, and not just
> part of libgcc_s.so. The docs aren't entirely clear about this, they
> just say that if the compiler can't emit lock-free instructions for
> the atomic operation "a call is made to an external routine with the
> same parameters to be resolved at run time." But I think that "to be
> resolved at run time" means that you can choose how those calls will
> be resolved.
> 

Thank you for that - this sounds like the way to go for my code.

And I suppose I should file a bug report for the current libatomic that
comes with gcc.  It would be much better to have no support by default
than to have the current spinlocks - as it stands, on a microcontroller
like mine, it will all appear to work during development and most
testing.  But if in normal processing the code takes the spinlock and
then there is an interrupt whose handler tries to access the same atomic
(or at least one that shares the same lock), the interrupt handler will
be stalled forever.
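A host-side illustration of why that deadlocks on a single core. This is not libatomic's actual code; it simply assumes, as described above, that the object is protected by a spinlock. The interrupt handler is simulated with a bounded spin so the example terminates, whereas a real handler would spin forever. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for a lock-based atomic: one global spinlock. */
static atomic_flag spin = ATOMIC_FLAG_INIT;

static bool spin_try_lock(void) { return !atomic_flag_test_and_set(&spin); }
static void spin_unlock(void)   { atomic_flag_clear(&spin); }

/* Simulated interrupt handler that needs the same lock.  It gives up
 * after max_spins attempts so the deadlock can be observed. */
static bool isr_try_access(unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (spin_try_lock()) {
            /* ... access the protected object here ... */
            spin_unlock();
            return true;
        }
        /* On a single core, the interrupted code that holds the lock
         * cannot run again until this handler returns, so retrying
         * can never succeed. */
    }
    return false;
}
```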

David

* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-11 12:16 UTC
  To: Toby Douglass; +Cc: GCC help



On 10/10/2020 22:05, Toby Douglass wrote:
> On 10/10/2020 21:43, David Brown wrote:
>> On 09/10/2020 23:35, Toby Douglass wrote:
>>> On 09/10/2020 20:28, David Brown wrote:
> 
>>> I would like - but cannot - reply to the list, as their email server
>>> does not handle encrypted email.
>>
>> I've put the help list on the cc to my reply - I assume that's okay for
>> you.
> 
> Yes.
> 
>> (Your email to me was not encrypted, unless I am missing something.)
> 
> I mean TLS for SMTP, as opposed to say PGP.
> 

Ah, you have your own mail server that sends directly to the receiving
server?  I always set up my mail servers to send via my ISP's server (a
"smarthost" in Debian setup terms).  That makes this kind of thing an SEP
(somebody else's problem).

>>>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M
>>>> devices
>>>> being the most common these days.  I've been trying out atomics in gcc,
>>>> and I find it badly lacking.
>>>
>>> The 4.1.2 atomics or the later, replacement API?
>>
>> I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
>> match either the gcc manual or the C standards as far as I can see.
> 
> GCC introduced its first API for atomics in version 4.1.2, these guys;
> 

Jonathan Wakely explained the reference.  I've read the manuals for a
/lot/ of gcc versions over the years, but I don't have all the details
in my head!

> https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
> 
> Then in a later version, which I can't remember offhand, a second and
> much evolved version of the API was introduced.
> 

Yes.

>> However, "atomic" also has a simpler, more fundamental and clearer
>> meaning with a wider applicability - it means an operation that cannot
>> be divided (or at least, cannot be /observed/ to be divided).  This is
>> the meaning that is important to me here.
> 
> Ah and you mentioned atomically writing larger objects, so we're past
> just caring about say word tearing.
> 

Yes.  Sizes up to 4 bytes can be accessed atomically on this processor
using "normal" operations, and 8 byte accesses are atomic if specific
instructions are used.  (gcc generates these for non-atomic accesses.)
I am hoping to be able to put together a solution for using standard
C11/C++11 atomic types of any size and know that these actually work
correctly.  It is not essential - I can make my own types, functions,
etc., and use them as needed.  But it would be nice and convenient to be
able to use the standard types and functions.

From Jonathan's replies, it seems I can simply make my own libatomic
implementations and use them.

>> What it means is that if thread A stores a value in the
>> atomic variable ax, and thread B attempts to read the value in ax, then
>> B will read either the entire old value before the write, or the entire
>> new value after the write - it will never read an inconsistent partial
>> write.
> 
> I could be wrong, but I think the only way you can do this with atomics
> is copy-on-write.  Make a new copy of the data, and use an atomic to
> flip a pointer, so the readers move atomically from the old version to
> the new version.

I've been thinking a bit more about this, inspired by your post here.
And I believe you are correct - neither ldrex/strex nor load/store
double register is sufficient for 64-bit atomic accesses on the 32-bit
ARM, even for plain reads and writes.  That's annoying - I had thought
the double register read/writes were enough.  But if the store double
register is interruptible with a restart (and I can't find official
documentation on the matter for the Cortex-M7), then an interrupted
store could lead to an inconsistent read by the interrupting code.

I guess I am back to the good old "disable interrupts" solution so
popular in the microcontroller world.  That always works.
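Toby's copy-on-write suggestion can be sketched with C11 atomics. This is a minimal sketch under stated assumptions: one writer running in thread code, and readers that the writer cannot interrupt (for example an interrupt handler on a single core). Under those assumptions two buffers are enough, because the writer only ever modifies the buffer that is not currently published; if the writer could run in the middle of a read, more buffers or a different scheme would be needed. config_t, config_write and config_read are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint32_t a, b, c, d; } config_t;   /* hypothetical payload */

static config_t buffers[2];
static _Atomic(config_t *) current = &buffers[0];

/* Writer: fill the buffer readers are NOT using, then publish it with a
 * single atomic pointer store (a plain 32-bit store on Cortex-M). */
void config_write(const config_t *src)
{
    config_t *live = atomic_load_explicit(&current, memory_order_relaxed);
    config_t *spare = (live == &buffers[0]) ? &buffers[1] : &buffers[0];
    *spare = *src;
    atomic_store_explicit(&current, spare, memory_order_release);
}

/* Reader: one atomic pointer load, then an ordinary copy. */
config_t config_read(void)
{
    config_t *live = atomic_load_explicit(&current, memory_order_acquire);
    return *live;
}
```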

> 
>>>> These microcontrollers are all single core, so memory ordering does not
>>>> matter.
>>>
>>> I am not sure this is true.  A single thread must make the world appear
>>> as if events occur in the order specified in the source code, but I bet
>>> you this already not true for interrupts.
>>
>> It is true even for interrupts.
> 
> [snip]
> 
> Thankyou for the insights.  I've done hardly any bare-metal work, so I'm
> not familiar with the actual practicalities of interrupts and their
> effect in these matters.
> 
>>>    It may be for example they can be
>>> re-ordered with regard to each other, and this is not being prevented.
>>
>> Do you mean the kind of re-ordering the compiler does for code?
> 
> I was thinking here of the processor.
> 
>> That is
>> not in question here - at least, not to me.  I know what kinds of
>> reorders are done, and how to prevent them if necessary.  (On a single
>> core, "volatile" is all you need - though there are more efficient ways.
> 
> I'm not sure about that.  I'd need to revisit the subject though to
> rebuild my knowledge, so I can't make any assertion here - only that I
> know I don't know one way or the other.
> 

One thing we can all be sure about - this stuff is difficult, it needs a
/lot/ of thought, and the documentation is often poor on the critical
details.

>> And while the cpu and memory system can include write store buffers,
>> caches, etc., that can affect the order of data hitting the memory,
>> these are not an issue in a single core system.  (They /are/ important
>> for multi-core systems.)
> 
> Yes, I think so too, but to be clear we mean single physical and single
> logical core; no hyperthreading.
> 

Yes, absolutely.

>>> Also, I still don't quite think there *are* atomic loads/stores as such
>>> - although having said that I'm now remembering the LOCK prefix on
>>> Intel, which might be usable with a load.  That would then lock the
>>> cache line and load - but, ah yes, it doesn't *mean* anything to
>>> atomically load.  The very next micro-second you value could be replaced
>>> a new write.
>>
>> Replacing values is not an issue.  The important part is the atomicity
>> of the action.  When thread A reads variable ax, it doesn't matter if
>> thread B (or an interrupt, or whatever) has changed ax just before the
>> read, or just after the read - it matters that it cannot change it
>> /during/ the read.  The key is /consistent/ values, not most up-to-date
>> values.
> 
> Yes.  I can see this from your earlier explanation regarding what you're
> looking for with atomic writes.
> 
>> I had a look through the github sources, but could not find anything
>> relevant.  But obviously that library has a lot more code and features
>> than I am looking for.
> 
> I was only thinking of a single header file which contains the atomics
> for ARM32.  However, it's not useful to you for what you're looking for
> with atomic writes.
> 

Thank you anyway - and thank you for making me think a little more,
correcting a mistake I made!

* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-11 12:51 UTC
  To: Toby Douglass; +Cc: GCC help



On 11/10/2020 14:34, Toby Douglass wrote:
> On 11/10/2020 14:16, David Brown wrote:

> 
>>> I could be wrong, but I think the only way you can do this with atomics
>>> is copy-on-write.  Make a new copy of the data, and use an atomic to
>>> flip a pointer, so the readers move atomically from the old version to
>>> the new version.
>>
>> I've been thinking a bit more about this, inspired by your post here.
>> And I believe you are correct - neither ldrex/strex nor load/store
>> double register is sufficient for 64-bit atomic accesses on the 32-bit
>> ARM, even for plain reads and writes.
> 
> No - I think you can have 64-bit atomic stores on a 32-bit CPU.  There
> *is* a double word atomic compare-and-swap.

Certainly it is possible - if the cpu has such an instruction.  The
Cortex-M cores do not.

>  If you define a 64-bit
> integer type, and use it with __atomic_compare_exchange_n(), you should
> get a 64-bit atomic swap.  In older versions of the library, I actually
> had inline assembly for this, but I realised in the end I could in fact
> get GCC to emit the correct code.
> 
> However, I don't understand how a double-word atomic store helps you. 
> If you have an arbitrarily-sized block of memory to update atomically,
> how can you use a 64-bit atomic store to do this?
> 

It would not help for arbitrary blocks of memory, but it /would/ help
for 64-bit blocks.  And that would cover a sizeable majority of
use-cases for me.

>> One thing we can all be sure about - this stuff is difficult, it needs a
>> /lot/ of thought, and the documentation is often poor on the critical
>> details.
> 
> Amen.
> 

* Re: Atomic accesses on ARM microcontrollers
From: David Brown @ 2020-10-12  7:17 UTC
  To: David Brown, Jonathan Wakely; +Cc: gcc-help

On 10/10/2020 21:43, David Brown wrote:
> 

> If I understand you correctly, you mean that I can simply implement my
> own version of __atomic_load_8 and other functions in libatomic?
> 
> I had a quick test (using the godbolt.org online compiler).
> 
> By adding this to my file:
> 
> extern inline
> uint64_t __atomic_load_8(const volatile void * p, int order) {
>     (void) order;
>     const volatile uint64_t * q = (const volatile uint64_t *) p;
>     return *q;
> }
> 
> then a straight load of a 64-bit atomic becomes a single "ldrd" load
> double register instruction, which is optimal for this processor.  (In a
> finished solution, I'd want to check that this is correct for different
> flags - possibly adding function attributes for optimisation or inline
> assembly to ensure that it is always correct.  But that's a detail for
> me to check.)
> 
> The same worked for __atomic_store_8.
> 

Just to be clear here, in case anyone else is reading these posts and
uses them for their own code, a single "strd" store double register
operation is not, in general, strong enough for the __atomic_store_8 on
the Cortex-M devices.  If the processor gets an interrupt while
executing the strd, it may have stored the first half of the new value
but not the second half.  If the interrupting code then reads the
object, it will get an inconsistent read.  (If that can't occur - maybe
only your interrupt routine changes the object - there's no problem.)
So the __atomic_store_8 should disable interrupts around the write.

(And again to be clear, this is for single-core microcontrollers running
bare metal, rather than for multi-core systems or code running under a host OS.)
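A minimal sketch of that approach for a bare-metal, single-core M-profile part, written as ordinary functions rather than as replacements for the libatomic entry points. The names `irq_save`, `irq_restore`, `store_u64_irqsafe` and `load_u64_irqsafe` are hypothetical; in a real replacement the store would be exported as `__atomic_store_8`, as in the quoted example, subject to the GCC bug 88456 caveat raised later in the thread. Saving and restoring PRIMASK, rather than unconditionally re-enabling interrupts, keeps the functions safe to call from code that already has interrupts masked. On any target other than M-profile Arm the interrupt helpers are stubs, so the sketch also compiles on a host for testing.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Save the current PRIMASK, then mask interrupts. The "memory" clobber
 * stops the compiler moving memory accesses across the masked region. */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i" : "=r"(primask) : : "memory");
    return primask;
}

/* Restore the saved PRIMASK (re-enables interrupts only if they were on). */
static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r"(primask) : "memory");
}
#else
/* Host stubs: no interrupts to mask, so these do nothing. */
static inline uint32_t irq_save(void) { return 0; }
static inline void irq_restore(uint32_t primask) { (void)primask; }
#endif

/* Store a 64-bit value so no interrupt handler can observe half of it. */
static inline void store_u64_irqsafe(volatile uint64_t *p, uint64_t v)
{
    uint32_t key = irq_save();
    *p = v;
    irq_restore(key);
}

/* Matching load, so the two halves always come from the same store. */
static inline uint64_t load_u64_irqsafe(const volatile uint64_t *p)
{
    uint32_t key = irq_save();
    uint64_t v = *p;
    irq_restore(key);
    return v;
}
```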

> (The general load/store functions are a bit more involved, as are the
> read-modify-write atomic functions.)
> 


* Re: Atomic accesses on ARM microcontrollers
  2020-10-10 12:39 ` Jonathan Wakely
  2020-10-10 19:43   ` David Brown
@ 2020-10-12 21:44   ` Patrick Oppenlander
  1 sibling, 0 replies; 13+ messages in thread
From: Patrick Oppenlander @ 2020-10-12 21:44 UTC
  To: Jonathan Wakely; +Cc: David Brown, gcc-help

On Sat, Oct 10, 2020 at 11:42 PM Jonathan Wakely via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> On Fri, 9 Oct 2020 at 19:29, David Brown <david.brown@hesbynett.no> wrote:
> >
> > I don't know if this can be answered here, or would be best on the
> > development mailing list.  But I'll start on the help list.
> >
> > I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
> > being the most common these days.  I've been trying out atomics in gcc,
> > and I find it badly lacking.  (I've tried C11 <stdatomic.h>, C++11
> > <atomic>, and the gcc builtins - they all generate the same results,
> > which is to be expected.)  I'm concentrating on plain loads and stores
> > at the moment, not other atomic operations.
> >
> > These microcontrollers are all single core, so memory ordering does not
> > matter.
> >
> > For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
> > loads and stores.  These are generated fine.
> >
> > But for 64-bit and above, there are library calls to a compiler-provided
> > library.  For the Cortex M4 and M7 cores (and several other Cortex M
> > cores), the "load double register" and "store double register"
> > instructions are atomic (but not suitable for use with volatile data,
> > since they are restarted if they are interrupted).  The compiler
> > generates these for normal 64-bit types, but not for atomics.
> >
> > For larger types, the situation is far, far worse.  Not only is the
> > library code inefficient on these devices (disabling and re-enabling
> > global interrupts is the optimal solution in most cases, with load/store
> > with reservation being a second option), but it is /wrong/.  The library
> > uses spin locks (AFAICS) - on a single core system, that generally means
> > deadlocking the processor.  That is worse than useless.
> >
> > Is there any way I can replace this library with my own code here, while
> > still using the language atomics?
>
> Yes. My understanding is that libatomic is designed to be replaceable
> by users who want to provide their own custom implementations of the
> API.
>
> You're using bare metal ARM, right? For Arm on Linux I think there are
> kernel helpers that make the atomics efficient even when the hardware
> doesn't support them.

Hi Jonathan,

AFAIK https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88456 has not been
resolved, which means that you can end up with a weird mix of gcc
builtins and your own provided functions.

Patrick


* Re: Atomic accesses on ARM microcontrollers
  2020-10-11 12:51           ` David Brown
@ 2020-10-13 11:46             ` Richard Earnshaw
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Earnshaw @ 2020-10-13 11:46 UTC
  To: David Brown, Toby Douglass; +Cc: GCC help

On 11/10/2020 13:51, David Brown wrote:
> 
> 
> On 11/10/2020 14:34, Toby Douglass wrote:
>> On 11/10/2020 14:16, David Brown wrote:
> 
>>
>>>> I could be wrong, but I think the only way you can do this with atomics
>>>> is copy-on-write.  Make a new copy of the data, and use an atomic to
>>>> flip a pointer, so the readers move atomically from the old version to
>>>> the new version.
>>>
>>> I've been thinking a bit more about this, inspired by your post here.
>>> And I believe you are correct - neither ldrex/strex nor load/store
>>> double register is sufficient for 64-bit atomic accesses on the 32-bit
>>> ARM, even for plain reads and writes.
>>
>> No - I think you can have 64-bit atomic stores on a 32-bit CPU.  There
>> *is* a double word atomic compare-and-swap.
> 
> Certainly it is possible - if the cpu has such an instruction.  The
> Cortex-M cores do not.
> 
>>   If you define a 64-bit
>> integer type, and use it with __atomic_compare_exchange_n(), you should
>> get a 64-bit atomic swap.  In older versions of the library, I actually
>> had inline assembly for this, but I realised in the end I could in fact
>> get GCC to emit the correct code.
>>
>> However, I don't understand how a double-word atomic store helps you. 
>> If you have an arbitrarily-sized block of memory to update atomically,
>> how can you use a 64-bit atomic store to do this?
>>
> 
> It would not help for arbitrary blocks of memory, but it /would/ help
> for 64-bit blocks.  And that would cover a sizeable majority of
> use-cases for me.
> 
>>> One thing we can all be sure about - this stuff is difficult, it needs a
>>> /lot/ of thought, and the documentation is often poor on the critical
>>> details.
>>
>> Amen.
>>

I think you've probably identified a real issue in the way we expand
64-bit volatile accesses on all AArch32 targets.  If you have not
already done so, please could you open a bugzilla ticket?

R.
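A minimal test case of the kind such a ticket might contain is sketched below. It is an illustration, not the report that was actually filed, and `vshared`, `read_vshared` and `write_vshared` are hypothetical names. Compiling it with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m4 -S` and inspecting the assembly should show whether the volatile 64-bit accesses are expanded to single `ldrd`/`strd` instructions, which, as discussed earlier in the thread, are restarted if interrupted.

```c
#include <stdint.h>

/* Hypothetical reproducer: a 64-bit object accessed only through volatile. */
volatile uint64_t vshared;

/* This load may be expanded as a single LDRD on Cortex-M4/M7; if the LDRD
 * is interrupted it is restarted, so the location can be read twice. */
uint64_t read_vshared(void)
{
    return vshared;
}

/* This store may be expanded as a single STRD; an interrupt taken part-way
 * through can see the first word updated but not the second. */
void write_vshared(uint64_t v)
{
    vshared = v;
}
```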


end of thread

Thread overview: 13+ messages
2020-10-09 18:28 Atomic accesses on ARM microcontrollers David Brown
2020-10-09 23:28 ` Segher Boessenkool
2020-10-10 12:39 ` Jonathan Wakely
2020-10-10 19:43   ` David Brown
2020-10-10 20:18     ` Jonathan Wakely
2020-10-11 10:54       ` David Brown
2020-10-12  7:17     ` David Brown
2020-10-12 21:44   ` Patrick Oppenlander
     [not found] ` <b29b1595-9441-68eb-f257-244a35082c82@winterflaw.net>
2020-10-10 19:43   ` David Brown
2020-10-10 20:09     ` Jonathan Wakely
     [not found]     ` <bdf0f96f-0377-bee7-c02e-9704f0bea6a5@winterflaw.net>
2020-10-11 12:16       ` David Brown
     [not found]         ` <24c49c76-43c3-9a0d-6b02-a4340b1fccba@winterflaw.net>
2020-10-11 12:51           ` David Brown
2020-10-13 11:46             ` Richard Earnshaw
