From: Hans Boehm
Date: Thu, 13 Oct 2022 14:43:25 -0700
Subject: Re: Fences/Barriers when mixing C++ atomics and non-atomics
To: Vineet Gupta
Cc: tech-unprivileged@lists.riscv.org, gcc@gcc.gnu.org, Hongyu Wang, Uros Bizjak

On Thu, Oct 13, 2022 at 2:11 PM Vineet Gupta wrote:

> Hi Hans,
>
> On 10/13/22 13:54, Hans Boehm wrote:
>
>> The generated code here is correct in both cases. In the RISC-V case, I
>> believe it is conservative, at a minimum, in that atomics should not
>> imply IO ordering. We had an earlier discussion, which seemed to have
>> consensus in favor of that opinion. I believe clang does not enforce IO
>> ordering.
>>
>> You can think of a "sequentially consistent" load roughly as enforcing
>> two properties:
>>
>> 1) It behaves as an "acquire" load. Later (in program order) memory
>> operations do not advance past it. This is implicit for x86. It
>> requires the trailing fence on RISC-V, which could probably be weakened
>> to r,rw.
> Acq implies later things won't leak out, but prior things could still
> leak in, meaning a prior write could happen after the load, which
> contradicts what the user is asking for by load(seq_cst) on x86?

Agreed.

>> 2) It ensures that seq_cst operations are fully ordered. This means
>> that, in addition to (1), and the corresponding fence for stores, every
>> seq_cst store must be separated from a seq_cst load by at least a w,r
>> fence, so a seq_cst store followed by a seq_cst load is not reordered.

> This makes sense when both store -> load are seq_cst.
> But the question is what happens when that store is non-atomic. IOW, if
> we had a store(relaxed) -> load(seq_cst), would the generated code still
> ensure that the load had a full barrier to prevent that reordering?

That reordering is not observable in conforming C or C++ code. To observe
that reordering, another thread would have to concurrently load from the
same location as the non-atomic store. That's a data race and undefined
behavior, at least in C and C++.

Perhaps more importantly here, if the earlier store is a relaxed store,
then the relaxed store is not ordered with respect to a subsequent seq_cst
load, just as it would not be ordered by a subsequent critical section.
You can think of C++ seq_cst as being roughly the minimal ordering needed
to guarantee that if you only use locks and seq_cst atomics (and avoid
data races as required), everything looks sequentially consistent. I think
the Linux kernel has made some different decisions here that give atomics
stronger ordering properties than lock-based critical sections.

>> w,r fences are discouraged on RISC-V, and probably no better than
>> rw,rw, so that's how the leading fence got there. (Again, the IO
>> ordering should disappear. It's the responsibility of IO code to insert
>> that explicitly, rather than paying for it everywhere.)

> Thanks for explaining the RV semantics.
>> x86 does (2) by associating that fence with stores instead of loads,
>> either by using explicit fences after stores, or by turning stores into
>> xchg.

> That makes sense, as x86 has ld->ld and ld->st architecturally ordered,
> so any fences ought to be associated with st.

It also guarantees st->st and ld->st. The decision is arbitrary, except
that we believe that there will be fewer stores than loads that need those
fences.

> Thx,
> -Vineet

>> RISC-V could do the same. And I believe that if the current A extension
>> were the final word on the architecture, it should. But that convention
>> is not compatible with the later introduction of an "acquire load",
>> which I think is essential for performance, at least on larger cores.
>> So I think the two-fence mapping for loads should be maintained for
>> now, as I suggested in the document I posted to the list.
>>
>> Hans
>>
>> On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta wrote:
>>
>>> Hi,
>>>
>>> I have a testcase (from real workloads) involving C++ atomics, and I
>>> am trying to understand the codegen (gcc 12) for RVWMO and x86.
>>> It does mix atomics with non-atomics, so it's not obvious what the
>>> behavior is intended to be; hence some explicit CC of subject-matter
>>> experts (apologies for that in advance).
>>>
>>> The test has a non-atomic store followed by an atomic load(seq_cst). I
>>> assume that an unadorned direct access defaults to the safest, most
>>> conservative seq_cst.
>>>
>>> extern int g;
>>> std::atomic<int> a;
>>>
>>> int bar_noaccessor(int n, int *n2)
>>> {
>>>     *n2 = g;
>>>     return n + a;
>>> }
>>>
>>> int bar_seqcst(int n, int *n2)
>>> {
>>>     *n2 = g;
>>>     return n + a.load(std::memory_order_seq_cst);
>>> }
>>>
>>> On RV (RVWMO), with current gcc 12 we get 2 full fences around the
>>> load, as prescribed by the Privileged Spec, Chapter A, Table A.6
>>> (Mappings from C/C++ to RISC-V primitives).
>>> _Z10bar_seqcstiPi:
>>> .LFB382:
>>>     .cfi_startproc
>>>     lui     a5,%hi(g)
>>>     lw      a5,%lo(g)(a5)
>>>     sw      a5,0(a1)
>>>     fence   iorw,iorw
>>>     lui     a5,%hi(a)
>>>     lw      a5,%lo(a)(a5)
>>>     fence   iorw,iorw
>>>     addw    a0,a5,a0
>>>     ret
>>>
>>> OTOH, for x86 (same default toggles) there are no barriers at all.
>>>
>>> _Z10bar_seqcstiPi:
>>>     endbr64
>>>     movl    g(%rip), %eax
>>>     movl    %eax, (%rsi)
>>>     movl    a(%rip), %eax
>>>     addl    %edi, %eax
>>>     ret
>>>
>>> My naive intuition was that x86 TSO would require a fence before a
>>> load(seq_cst) for a prior store, even if that store was non-atomic, to
>>> ensure the load didn't bubble up ahead of the store.
>>>
>>> Perhaps this begs the general question of intermixing non-atomic
>>> accesses with atomics, and whether that is undefined behavior or some
>>> such. I skimmed through the C++14 specification's Atomic Operations
>>> library chapter, but nothing's jumping out on the topic.
>>>
>>> Or is it much deeper, related to the as-if rule or something.
>>>
>>> Thx,
>>> -Vineet