From: Hans Boehm <hboehm@google.com>
To: Vineet Gupta <vineetg@rivosinc.com>
Cc: tech-unprivileged@lists.riscv.org, gcc@gcc.gnu.org,
Hongyu Wang <hongyu.wang@intel.com>,
Uros Bizjak <ubizjak@gmail.com>
Subject: Re: Fences/Barriers when mixing C++ atomics and non-atomics
Date: Thu, 13 Oct 2022 14:43:25 -0700 [thread overview]
Message-ID: <CAMOCf+jB=pfKBmnygd+tY0m4PzX8xyG_Uh0LNhWBf9wWip8X0Q@mail.gmail.com> (raw)
In-Reply-To: <9838fc0f-a519-bfd3-3eb2-88255742ae57@rivosinc.com>
On Thu, Oct 13, 2022 at 2:11 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
> Hi Hans,
>
> On 10/13/22 13:54, Hans Boehm wrote:
>
> The generated code here is correct in both cases. In the RISC-V case, I
> believe it is conservative, at a minimum, in that atomics should not imply
> IO ordering. We had an earlier discussion, which seemed to have consensus
> in favor of that opinion. I believe clang does not enforce IO ordering.
>
> You can think of a "sequentially consistent" load roughly as enforcing two
> properties:
>
> 1) It behaves as an "acquire" load. Later (in program order) memory
> operations do not advance past it. This is implicit for x86. It requires
> the trailing fence on RISC-V, which could probably be weakened to r,rw.
>
>
> Acquire implies later operations won't leak out, but prior operations could
> still leak in, meaning a prior write could happen after the load, which
> contradicts what the user is asking for with load(seq_cst) on x86?
>
> Agreed.
>
> 2) It ensures that seq_cst operations are fully ordered. This means that,
> in addition to (1), and the corresponding fence for stores, every seq_cst
> store must be separated from a seq_cst load by at least a w,r fence, so a
> seq_cst store followed by a seq_cst load is not reordered.
>
>
> This makes sense when both the store and the load are seq_cst.
> But the question is what happens when that store is non-atomic. IOW if we
> had a store(relaxed) -> load(seq_cst), would the generated code still ensure
> the load had a full barrier to prevent that reordering?
>
That reordering is not observable in conforming C or C++ code. To observe
that reordering, another thread would have to concurrently load from the
same location as the non-atomic store. That's a data race and undefined
behavior, at least in C and C++.
Perhaps more importantly here, if the earlier store is a relaxed store,
then the relaxed store is not ordered with respect to a subsequent seq_cst
load, just as it would not be ordered by a subsequent critical section.
You can think of C++ seq_cst as being roughly the minimal ordering to
guarantee that if you only use locks and seq_cst atomics (and avoid data
races as required), everything looks sequentially consistent.
I think the Linux kernel has made some different decisions here that give
atomics stronger ordering properties than lock-based critical sections.
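To make the full-ordering point concrete, here is a minimal store-buffering
(Dekker-style) litmus test in C++. The names and structure are my own
illustration, not from this thread. With seq_cst on all four accesses, the
outcome r1 == 0 && r2 == 0 is forbidden; that is exactly what the w,r
ordering between a seq_cst store and a later seq_cst load rules out:

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

// One trial of the store-buffering litmus test.
// Returns the pair of observed values (r1, r2).
std::pair<int, int> run_trial() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // seq_cst store...
        r1 = y.load(std::memory_order_seq_cst);  // ...must not be passed by this load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

A single run cannot demonstrate the guarantee exhaustively, of course; a
litmus-test tool such as herd7 is the right way to enumerate the allowed
outcomes. Note that if the stores were relaxed (or non-atomic), the
r1 == 0 && r2 == 0 outcome would be permitted, which is the point above.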
>
> w,r fences are discouraged on RISC-V, and probably no better than rw,rw,
> so that's how the leading fence got there. (Again the io ordering should
> disappear. It's the responsibility of IO code to insert that explicitly,
> rather than paying for it everywhere.)
>
>
> Thanks for explaining the RV semantics.
>
>
> x86 does (2) by associating that fence with stores instead of loads,
> either by using explicit fences after stores, or by turning stores into
> xchg.
>
>
> That makes sense as x86 has ld->ld and ld -> st architecturally ordered,
> so any fences ought to be associated with st.
>
It also guarantees st->st and ld->st. The decision is arbitrary, except
that we believe that there will be fewer stores than loads that need those
fences.
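As a sketch of that convention (my annotations, not actual compiler output):
with the w,r fence attached to stores, a seq_cst load on x86 compiles to a
plain mov, while a seq_cst store carries the fence, typically as an xchg or
as a mov followed by mfence depending on the compiler:

```cpp
#include <atomic>

std::atomic<int> a{0};

int load_seq_cst() {
    // x86: plain "mov" -- loads need no extra fence under TSO
    // when stores carry the w,r fence
    return a.load(std::memory_order_seq_cst);
}

void store_seq_cst(int v) {
    // x86: "xchg" (or "mov" + "mfence") -- the w,r fence lives here
    a.store(v, std::memory_order_seq_cst);
}
```

This is why bar_seqcst below contains no fence on x86: the only seq_cst
operation in it is a load, and under this convention loads are free.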
>
> Thx,
> -Vineet
>
> RISC-V could do the same. And I believe that if the current A extension
> were the final word on the architecture, it should. But that convention is
> not compatible with the later introduction of an "acquire load", which I
> think is essential for performance, at least on larger cores. So I think
> the two fence mapping for loads should be maintained for now, as I
> suggested in the document I posted to the list.
>
> Hans
>
> On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta <vineetg@rivosinc.com>
> wrote:
>
>> Hi,
>>
>> I have a testcase (from real workloads) involving C++ atomics and trying
>> to understand the codegen (gcc 12) for RVWMO and x86.
>> It does mix atomics with non-atomics, so it's not obvious what the
>> behavior is intended to be, hence the explicit CC of subject matter
>> experts (apologies for that in advance).
>>
>> Test has a non-atomic store followed by an atomic load(SEQ_CST). I
>> assume that an unadorned direct access defaults to the safest/most
>> conservative seq_cst.
>>
>> #include <atomic>
>>
>> extern int g;
>> std::atomic<int> a;
>>
>> int bar_noaccessor(int n, int *n2)
>> {
>>     *n2 = g;
>>     return n + a;
>> }
>>
>> int bar_seqcst(int n, int *n2)
>> {
>>     *n2 = g;
>>     return n + a.load(std::memory_order_seq_cst);
>> }
>>
>> On RV (rvwmo), with current gcc 12 we get 2 full fences around the load,
>> as prescribed by the Unprivileged Spec, Appendix A (RVWMO), Table A.6
>> (Mappings from C/C++ primitives to RISC-V primitives).
>>
>> _Z10bar_seqcstiPi:
>> .LFB382:
>>         .cfi_startproc
>>         lui     a5,%hi(g)
>>         lw      a5,%lo(g)(a5)
>>         sw      a5,0(a1)
>>         fence   iorw,iorw
>>         lui     a5,%hi(a)
>>         lw      a5,%lo(a)(a5)
>>         fence   iorw,iorw
>>         addw    a0,a5,a0
>>         ret
>>
>>
>> OTOH, for x86 (same default toggles) there's no barriers at all.
>>
>> _Z10bar_seqcstiPi:
>>         endbr64
>>         movl    g(%rip), %eax
>>         movl    %eax, (%rsi)
>>         movl    a(%rip), %eax
>>         addl    %edi, %eax
>>         ret
>>
>>
>> My naive intuition was that x86 TSO would require a fence before
>> load(seq_cst) for a prior store, even if that store was non-atomic, to
>> ensure the load didn't bubble up ahead of the store.
>>
>> Perhaps this raises the general question of intermixing non-atomic
>> accesses with atomics, and whether that is undefined behavior or some
>> such. I skimmed through the C++14 specification's Atomic operations
>> library chapter but nothing jumped out on the topic.
>>
>> Or is it much deeper, related to the as-if rule or something?
>>
>> Thx,
>> -Vineet
>>
>
>
Thread overview: 8+ messages (newest ~2022-10-13 21:43 UTC)
2022-10-13 19:31 Vineet Gupta
2022-10-13 20:15 ` Jonathan Wakely
2022-10-13 20:30 ` Uros Bizjak
2022-10-13 21:14 ` Vineet Gupta
2022-10-13 21:29 ` Uros Bizjak
2022-10-13 20:54 ` Hans Boehm
2022-10-13 21:11 ` Vineet Gupta
2022-10-13 21:43 ` Hans Boehm [this message]