public inbox for gcc@gcc.gnu.org
* Fences/Barriers when mixing C++ atomics and non-atomics
@ 2022-10-13 19:31 Vineet Gupta
  2022-10-13 20:15 ` Jonathan Wakely
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vineet Gupta @ 2022-10-13 19:31 UTC (permalink / raw)
  To: tech-unprivileged, gcc; +Cc: Hans Boehm, Hongyu Wang, Uros Bizjak

Hi,

I have a testcase (from real workloads) involving C++ atomics, and I am 
trying to understand the codegen (gcc 12) for RVWMO and x86.
It mixes atomics with non-atomics, so it is not obvious what the intended 
behavior is, hence the explicit CC to some subject matter experts 
(apologies for that in advance).

The test has a non-atomic store followed by an atomic load(SEQ_CST). I 
assume that an unadorned direct access defaults to the safest/most 
conservative seq_cst.

    extern int g;
    std::atomic<int> a;

    int bar_noaccessor(int n, int *n2)
    {
         *n2 = g;
         return n + a;
    }

    int bar_seqcst(int n, int *n2)
    {
         *n2 = g;
         return n + a.load(std::memory_order_seq_cst);
    }

On RV (rvwmo), with current gcc 12 we get 2 full fences around the load 
as prescribed by the Privileged Spec, Chapter A, Table A.6 (Mappings from 
C/C++ to RISC-V primitives).

    _Z10bar_seqcstiPi:
    .LFB382:
         .cfi_startproc
         lui    a5,%hi(g)
         lw    a5,%lo(g)(a5)
         sw    a5,0(a1)
    *fence    iorw,iorw*
         lui    a5,%hi(a)
         lw    a5,%lo(a)(a5)
    *fence    iorw,iorw*
         addw    a0,a5,a0
         ret


OTOH, for x86 (same default toggles) there are no barriers at all.

    _Z10bar_seqcstiPi:
         endbr64
         movl    g(%rip), %eax
         movl    %eax, (%rsi)
         movl    a(%rip), %eax
         addl    %edi, %eax
         ret


My naive intuition was that x86 TSO would require a fence before a 
load(seq_cst) for a prior store, even if that store was non-atomic, to 
ensure the load didn't bubble up ahead of the store.

Perhaps this raises the general question of intermixing non-atomic 
accesses with atomics, and whether that is undefined behavior or some 
such. I skimmed through the C++14 specification chapter on the Atomic 
Operations library, but nothing jumped out on the topic.

Or is it something much deeper, related to the as-if rule?

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 19:31 Fences/Barriers when mixing C++ atomics and non-atomics Vineet Gupta
@ 2022-10-13 20:15 ` Jonathan Wakely
  2022-10-13 20:30 ` Uros Bizjak
  2022-10-13 20:54 ` Hans Boehm
  2 siblings, 0 replies; 8+ messages in thread
From: Jonathan Wakely @ 2022-10-13 20:15 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tech-unprivileged, gcc, Uros Bizjak, Hans Boehm, Hongyu Wang

On Thu, 13 Oct 2022 at 20:31, Vineet Gupta wrote:
>
> Hi,
>
> I have a testcase (from real workloads) involving C++ atomics and trying
> to understand the codegen (gcc 12) for RVWMO and x86.
> It does mix atomics with non-atomics so not obvious what the behavior is
> intended to be hence some explicit CC of subject matter experts
> (apologies for that in advance).
>
> Test has a non-atomic store

And a non-atomic load of 'g'

> followed by an atomic_load(SEQ_CST). I
> assume that unadorned direct access defaults to safest/conservative seq_cst.

Yes, the two functions below are identical.

>
>     extern int g;
>     std::atomic<int> a;
>
>     int bar_noaccessor(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a;
>     }
>
>     int bar_seqcst(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a.load(std::memory_order_seq_cst);
>     }
>


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 19:31 Fences/Barriers when mixing C++ atomics and non-atomics Vineet Gupta
  2022-10-13 20:15 ` Jonathan Wakely
@ 2022-10-13 20:30 ` Uros Bizjak
  2022-10-13 21:14   ` Vineet Gupta
  2022-10-13 20:54 ` Hans Boehm
  2 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2022-10-13 20:30 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tech-unprivileged, gcc, Hans Boehm, Hongyu Wang

On Thu, Oct 13, 2022 at 9:31 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> Hi,
>
> I have a testcase (from real workloads) involving C++ atomics and trying
> to understand the codegen (gcc 12) for RVWMO and x86.
> It does mix atomics with non-atomics so not obvious what the behavior is
> intended to be hence some explicit CC of subject matter experts
> (apologies for that in advance).
>
> Test has a non-atomic store followed by an atomic_load(SEQ_CST). I
> assume that unadorned direct access defaults to safest/conservative seq_cst.
>
>     extern int g;
>     std::atomic<int> a;
>
>     int bar_noaccessor(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a;
>     }
>
>     int bar_seqcst(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a.load(std::memory_order_seq_cst);
>     }
>
> On RV (rvwmo), with current gcc 12 we get 2 full fences around the load
> as prescribed by Privileged Spec, Chapter A, Table A.6 (Mappings from
> C/C++ to RISC-V primitives).
>
>     _Z10bar_seqcstiPi:
>     .LFB382:
>          .cfi_startproc
>          lui    a5,%hi(g)
>          lw    a5,%lo(g)(a5)
>          sw    a5,0(a1)
>     *fence    iorw,iorw*
>          lui    a5,%hi(a)
>          lw    a5,%lo(a)(a5)
>     *fence    iorw,iorw*
>          addw    a0,a5,a0
>          ret
>
>
> OTOH, for x86 (same default toggles) there's no barriers at all.
>
>     _Z10bar_seqcstiPi:
>          endbr64
>          movl    g(%rip), %eax
>          movl    %eax, (%rsi)
>          movl    a(%rip), %eax
>          addl    %edi, %eax
>          ret
>

Regarding x86 memory model, please see Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 3A, section 8.2 [1]

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

> My naive intuition was x86 TSO would require a fence before
> load(seq_cst) for a prior store, even if that store was non atomic, so
> ensure load didn't bubble up ahead of store.

As documented in the SDM above, the x86 memory model guarantees that

• Reads are not reordered with other reads.
• Writes are not reordered with older reads.
• Writes to memory are not reordered with other writes, with the
following exceptions:
...
• Reads may be reordered with older writes to different locations but
not with older writes to the same location.
...

Uros.

> Perhaps this begs the general question of intermixing non atomic
> accesses with atomics and if that is undefined behavior or some such. I
> skimmed through C++14 specification chapter Atomic Operations library
> but nothing's jumping out on the topic.
>
> Or is it much deeper, related to As-if rule or something.
>
> Thx,
> -Vineet


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 19:31 Fences/Barriers when mixing C++ atomics and non-atomics Vineet Gupta
  2022-10-13 20:15 ` Jonathan Wakely
  2022-10-13 20:30 ` Uros Bizjak
@ 2022-10-13 20:54 ` Hans Boehm
  2022-10-13 21:11   ` Vineet Gupta
  2 siblings, 1 reply; 8+ messages in thread
From: Hans Boehm @ 2022-10-13 20:54 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tech-unprivileged, gcc, Hongyu Wang, Uros Bizjak


The generated code here is correct in both cases. In the RISC-V case, I
believe it is conservative, at a minimum, in that atomics should not imply
IO ordering. We had an earlier discussion, which seemed to have consensus
in favor of that opinion. I believe clang does not enforce IO ordering.

You can think of a "sequentially consistent" load roughly as enforcing two
properties:

1) It behaves as an "acquire" load. Later (in program order) memory
operations do not advance past it. This is implicit for x86. It requires
the trailing fence on RISC-V, which could probably be weakened to r,rw.

2) It ensures that seq_cst operations are fully ordered. This means that,
in addition to (1), and the corresponding fence for stores, every seq_cst
store must be separated from a seq_cst load by at least a w,r fence, so a
seq_cst store followed by a seq_cst load is not reordered. w,r fences are
discouraged on RISC-V, and probably no better than rw,rw, so that's how the
leading fence got there. (Again the io ordering should disappear. It's the
responsibility of IO code to insert that explicitly, rather than paying for
it everywhere.)

x86 does (2) by associating that fence with stores instead of loads, either
by using explicit fences after stores, or by turning stores into xchg.
RISC-V could do the same. And I believe that if the current A extension
were the final word on the architecture, it should. But that convention is
not compatible with the later introduction of an "acquire load", which I
think is essential for performance, at least on larger cores. So I think
the two fence mapping for loads should be maintained for now, as I
suggested in the document I posted to the list.

Hans

On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta <vineetg@rivosinc.com> wrote:

> Hi,
>
> I have a testcase (from real workloads) involving C++ atomics and trying
> to understand the codegen (gcc 12) for RVWMO and x86.
> It does mix atomics with non-atomics so not obvious what the behavior is
> intended to be hence some explicit CC of subject matter experts
> (apologies for that in advance).
>
> Test has a non-atomic store followed by an atomic_load(SEQ_CST). I
> assume that unadorned direct access defaults to safest/conservative
> seq_cst.
>
>     extern int g;
>     std::atomic<int> a;
>
>     int bar_noaccessor(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a;
>     }
>
>     int bar_seqcst(int n, int *n2)
>     {
>          *n2 = g;
>          return n + a.load(std::memory_order_seq_cst);
>     }
>
> On RV (rvwmo), with current gcc 12 we get 2 full fences around the load
> as prescribed by Privileged Spec, Chapter A, Table A.6 (Mappings from
> C/C++ to RISC-V primitives).
>
>     _Z10bar_seqcstiPi:
>     .LFB382:
>          .cfi_startproc
>          lui    a5,%hi(g)
>          lw    a5,%lo(g)(a5)
>          sw    a5,0(a1)
>     *fence    iorw,iorw*
>          lui    a5,%hi(a)
>          lw    a5,%lo(a)(a5)
>     *fence    iorw,iorw*
>          addw    a0,a5,a0
>          ret
>
>
> OTOH, for x86 (same default toggles) there's no barriers at all.
>
>     _Z10bar_seqcstiPi:
>          endbr64
>          movl    g(%rip), %eax
>          movl    %eax, (%rsi)
>          movl    a(%rip), %eax
>          addl    %edi, %eax
>          ret
>
>
> My naive intuition was x86 TSO would require a fence before
> load(seq_cst) for a prior store, even if that store was non atomic, so
> ensure load didn't bubble up ahead of store.
>
> Perhaps this begs the general question of intermixing non atomic
> accesses with atomics and if that is undefined behavior or some such. I
> skimmed through C++14 specification chapter Atomic Operations library
> but nothing's jumping out on the topic.
>
> Or is it much deeper, related to As-if rule or something.
>
> Thx,
> -Vineet
>


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 20:54 ` Hans Boehm
@ 2022-10-13 21:11   ` Vineet Gupta
  2022-10-13 21:43     ` Hans Boehm
  0 siblings, 1 reply; 8+ messages in thread
From: Vineet Gupta @ 2022-10-13 21:11 UTC (permalink / raw)
  To: Hans Boehm; +Cc: tech-unprivileged, gcc, Hongyu Wang, Uros Bizjak


Hi Hans,

On 10/13/22 13:54, Hans Boehm wrote:
> The generated code here is correct in both cases. In the RISC-V case, 
> I believe it is conservative, at a minimum, in that atomics should not 
> imply IO ordering. We had an earlier discussion, which seemed to have 
> consensus in favor of that opinion. I believe clang does not enforce 
> IO ordering.
>
> You can think of a "sequentially consistent" load roughly as enforcing 
> two properties:
>
> 1) It behaves as an "acquire" load. Later (in program order) memory 
> operations do not advance past it. This is implicit for x86. It 
> requires the trailing fence on RISC-V, which could probably be 
> weakened to r,rw.

Acquire implies later things won't leak out, but prior things could still 
leak in, meaning the prior write could happen after the load, which 
contradicts what the user is asking for with load(seq_cst) on x86?

>
> 2) It ensures that seq_cst operations are fully ordered. This means 
> that, in addition to (1), and the corresponding fence for stores, 
> every seq_cst store must be separated from a seq_cst load by at least 
> a w,r fence, so a seq_cst store followed by a seq_cst load is not 
> reordered.

This makes sense when both the store and the load are seq_cst.
But the question is what happens when that store is non-atomic. IOW, if 
we had a store(relaxed) -> load(seq_cst), would the generated code still 
ensure that the load had a full barrier to prevent the reordering?


> w,r fences are discouraged on RISC-V, and probably no better than 
> rw,rw, so that's how the leading fence got there. (Again the io 
> ordering should disappear. It's the responsibility of IO code to 
> insert that explicitly, rather than paying for it everywhere.)

Thanks for explaining the RV semantics.

>
> x86 does (2) by associating that fence with stores instead of loads, 
> either by using explicit fences after stores, or by turning stores 
> into xchg.

That makes sense, as x86 has ld->ld and ld->st architecturally ordered, 
so any fences ought to be associated with st.

Thx,
-Vineet

> RISC-V could do the same. And I believe that if the current A 
> extension were the final word on the architecture, it should. But that 
> convention is not compatible with the later introduction of an 
> "acquire load", which I think is essential for performance, at least 
> on larger cores. So I think the two fence mapping for loads should be 
> maintained for now, as I suggested in the document I posted to the list.
>
> Hans
>
> On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta <vineetg@rivosinc.com> 
> wrote:
>
>     Hi,
>
>     I have a testcase (from real workloads) involving C++ atomics and
>     trying
>     to understand the codegen (gcc 12) for RVWMO and x86.
>     It does mix atomics with non-atomics so not obvious what the
>     behavior is
>     intended to be hence some explicit CC of subject matter experts
>     (apologies for that in advance).
>
>     Test has a non-atomic store followed by an atomic_load(SEQ_CST). I
>     assume that unadorned direct access defaults to
>     safest/conservative seq_cst.
>
>         extern int g;
>         std::atomic<int> a;
>
>         int bar_noaccessor(int n, int *n2)
>         {
>              *n2 = g;
>              return n + a;
>         }
>
>         int bar_seqcst(int n, int *n2)
>         {
>              *n2 = g;
>              return n + a.load(std::memory_order_seq_cst);
>         }
>
>     On RV (rvwmo), with current gcc 12 we get 2 full fences around the
>     load
>     as prescribed by Privileged Spec, Chapter A, Table A.6 (Mappings from
>     C/C++ to RISC-V primitives).
>
>         _Z10bar_seqcstiPi:
>         .LFB382:
>              .cfi_startproc
>              lui    a5,%hi(g)
>              lw    a5,%lo(g)(a5)
>              sw    a5,0(a1)
>         *fence    iorw,iorw*
>              lui    a5,%hi(a)
>              lw    a5,%lo(a)(a5)
>         *fence    iorw,iorw*
>              addw    a0,a5,a0
>              ret
>
>
>     OTOH, for x86 (same default toggles) there's no barriers at all.
>
>         _Z10bar_seqcstiPi:
>              endbr64
>              movl    g(%rip), %eax
>              movl    %eax, (%rsi)
>              movl    a(%rip), %eax
>              addl    %edi, %eax
>              ret
>
>
>     My naive intuition was x86 TSO would require a fence before
>     load(seq_cst) for a prior store, even if that store was non
>     atomic, so
>     ensure load didn't bubble up ahead of store.
>
>     Perhaps this begs the general question of intermixing non atomic
>     accesses with atomics and if that is undefined behavior or some
>     such. I
>     skimmed through C++14 specification chapter Atomic Operations library
>     but nothing's jumping out on the topic.
>
>     Or is it much deeper, related to As-if rule or something.
>
>     Thx,
>     -Vineet
>


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 20:30 ` Uros Bizjak
@ 2022-10-13 21:14   ` Vineet Gupta
  2022-10-13 21:29     ` Uros Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Vineet Gupta @ 2022-10-13 21:14 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: tech-unprivileged, gcc, Hans Boehm, Hongyu Wang




On 10/13/22 13:30, Uros Bizjak wrote:
>> OTOH, for x86 (same default toggles) there's no barriers at all.
>>
>>      _Z10bar_seqcstiPi:
>>           endbr64
>>           movl    g(%rip), %eax
>>           movl    %eax, (%rsi)
>>           movl    a(%rip), %eax
>>           addl    %edi, %eax
>>           ret
>>
> Regarding x86 memory model, please see Intel® 64 and IA-32 Architectures
> Software Developer’s Manual, Volume 3A, section 8.2 [1]
>
> [1]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
>
>> My naive intuition was x86 TSO would require a fence before
>> load(seq_cst) for a prior store, even if that store was non atomic, so
>> ensure load didn't bubble up ahead of store.
> As documented in the SDM above, the x86 memory model guarantees that
>
> • Reads are not reordered with other reads.
> • Writes are not reordered with older reads.
> • Writes to memory are not reordered with other writes, with the
> following exceptions:
> ...
> • Reads may be reordered with older writes to different locations but
> not with older writes to the same location.

So my example is the last case, where an older write is followed by a 
read from a different location, and thus they could potentially be reordered.


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 21:14   ` Vineet Gupta
@ 2022-10-13 21:29     ` Uros Bizjak
  0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2022-10-13 21:29 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tech-unprivileged, gcc, Hans Boehm, Hongyu Wang

On Thu, Oct 13, 2022 at 11:14 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 10/13/22 13:30, Uros Bizjak wrote:
>
> OTOH, for x86 (same default toggles) there's no barriers at all.
>
>     _Z10bar_seqcstiPi:
>          endbr64
>          movl    g(%rip), %eax
>          movl    %eax, (%rsi)
>          movl    a(%rip), %eax
>          addl    %edi, %eax
>          ret
>
> Regarding x86 memory model, please see Intel® 64 and IA-32 Architectures
> Software Developer’s Manual, Volume 3A, section 8.2 [1]
>
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
>
> My naive intuition was x86 TSO would require a fence before
> load(seq_cst) for a prior store, even if that store was non atomic, so
> ensure load didn't bubble up ahead of store.
>
> As documented in the SDM above, the x86 memory model guarantees that
>
> • Reads are not reordered with other reads.
> • Writes are not reordered with older reads.
> • Writes to memory are not reordered with other writes, with the
> following exceptions:
> ...
> • Reads may be reordered with older writes to different locations but
> not with older writes to the same location.
>
>
> So my example is the last case where older write is followed by read to different location and thus potentially could be reordered.

Yes, but can this reordering be observed under the above conditions?
There is an additional rule:

In the case of I/O operations, both reads and writes always appear in
programmed order.

Uros.


* Re: Fences/Barriers when mixing C++ atomics and non-atomics
  2022-10-13 21:11   ` Vineet Gupta
@ 2022-10-13 21:43     ` Hans Boehm
  0 siblings, 0 replies; 8+ messages in thread
From: Hans Boehm @ 2022-10-13 21:43 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: tech-unprivileged, gcc, Hongyu Wang, Uros Bizjak


On Thu, Oct 13, 2022 at 2:11 PM Vineet Gupta <vineetg@rivosinc.com> wrote:

> Hi Hans,
>
> On 10/13/22 13:54, Hans Boehm wrote:
>
> The generated code here is correct in both cases. In the RISC-V case, I
> believe it is conservative, at a minimum, in that atomics should not imply
> IO ordering. We had an earlier discussion, which seemed to have consensus
> in favor of that opinion. I believe clang does not enforce IO ordering.
>
> You can think of a "sequentially consistent" load roughly as enforcing two
> properties:
>
> 1) It behaves as an "acquire" load. Later (in program order) memory
> operations do not advance past it. This is implicit for x86. It requires
> the trailing fence on RISC-V, which could probably be weakened to r,rw.
>
>
> Acq implies later things won't leak out, but prior things could still
> leak-in, meaning prior write could happen after load which contradicts what
> user is asking by load(seq_cst) on x86 ?

Agreed.

>
> 2) It ensures that seq_cst operations are fully ordered. This means that,
> in addition to (1), and the corresponding fence for stores, every seq_cst
> store must be separated from a seq_cst load by at least a w,r fence, so a
> seq_cst store followed by a seq_cst load is not reordered.
>
>
> This makes sense when both store -> load are seq_cst.
> But the question is what happens when that store is non atomic. IOW if we
> had a store(relaxed) -> load(seq_cst) would the generated code still ensure
> that load had a full barrier to prevent

That reordering is not observable in conforming C or C++ code. To observe
that reordering, another thread would have to concurrently load from the
same location as the non-atomic store. That's a data race and undefined
behavior, at least in C and C++.

Perhaps more importantly here, if the earlier store is a relaxed store,
then the relaxed store is not ordered with respect to a subsequent seq_cst
load, just as it would not be ordered by a subsequent critical section.
You can think of C++ seq_cst as being roughly the minimal ordering to
guarantee that if you only use locks and seq_cst atomics (and avoid data
races as required), everything looks sequentially consistent.

I think the Linux kernel has made some different decisions here that give
atomics stronger ordering properties than lock-based critical sections.

>
> w,r fences are discouraged on RISC-V, and probably no better than rw,rw,
> so that's how the leading fence got there. (Again the io ordering should
> disappear. It's the responsibility of IO code to insert that explicitly,
> rather than paying for it everywhere.)
>
>
> Thanks for explaining the RV semantics.
>
>
> x86 does (2) by associating that fence with stores instead of loads,
> either by using explicit fences after stores, or by turning stores into
> xchg.
>
>
> That makes sense as x86 has ld->ld and ld -> st architecturally ordered,
> so any fences ought to be associated with st.
>
It also guarantees st->st and ld->st. The decision is arbitrary, except
that we believe that there will be fewer stores than loads that need those
fences.

>
> Thx,
> -Vineet
>
> RISC-V could do the same. And I believe that if the current A extension
> were the final word on the architecture, it should. But that convention is
> not compatible with the later introduction of an "acquire load", which I
> think is essential for performance, at least on larger cores. So I think
> the two fence mapping for loads should be maintained for now, as I
> suggested in the document I posted to the list.
>
> Hans
>
> On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta <vineetg@rivosinc.com>
> wrote:
>
>> Hi,
>>
>> I have a testcase (from real workloads) involving C++ atomics and trying
>> to understand the codegen (gcc 12) for RVWMO and x86.
>> It does mix atomics with non-atomics so not obvious what the behavior is
>> intended to be hence some explicit CC of subject matter experts
>> (apologies for that in advance).
>>
>> Test has a non-atomic store followed by an atomic_load(SEQ_CST). I
>> assume that unadorned direct access defaults to safest/conservative
>> seq_cst.
>>
>>     extern int g;
>>     std::atomic<int> a;
>>
>>     int bar_noaccessor(int n, int *n2)
>>     {
>>          *n2 = g;
>>          return n + a;
>>     }
>>
>>     int bar_seqcst(int n, int *n2)
>>     {
>>          *n2 = g;
>>          return n + a.load(std::memory_order_seq_cst);
>>     }
>>
>> On RV (rvwmo), with current gcc 12 we get 2 full fences around the load
>> as prescribed by Privileged Spec, Chapter A, Table A.6 (Mappings from
>> C/C++ to RISC-V primitives).
>>
>>     _Z10bar_seqcstiPi:
>>     .LFB382:
>>          .cfi_startproc
>>          lui    a5,%hi(g)
>>          lw    a5,%lo(g)(a5)
>>          sw    a5,0(a1)
>>     *fence    iorw,iorw*
>>          lui    a5,%hi(a)
>>          lw    a5,%lo(a)(a5)
>>     *fence    iorw,iorw*
>>          addw    a0,a5,a0
>>          ret
>>
>>
>> OTOH, for x86 (same default toggles) there's no barriers at all.
>>
>>     _Z10bar_seqcstiPi:
>>          endbr64
>>          movl    g(%rip), %eax
>>          movl    %eax, (%rsi)
>>          movl    a(%rip), %eax
>>          addl    %edi, %eax
>>          ret
>>
>>
>> My naive intuition was x86 TSO would require a fence before
>> load(seq_cst) for a prior store, even if that store was non atomic, so
>> ensure load didn't bubble up ahead of store.
>>
>> Perhaps this begs the general question of intermixing non atomic
>> accesses with atomics and if that is undefined behavior or some such. I
>> skimmed through C++14 specification chapter Atomic Operations library
>> but nothing's jumping out on the topic.
>>
>> Or is it much deeper, related to As-if rule or something.
>>
>> Thx,
>> -Vineet
>>
>
>


