From: Hans Boehm
Date: Thu, 13 Oct 2022 13:54:25 -0700
Subject: Re: Fences/Barriers when mixing C++ atomics and non-atomics
To: Vineet Gupta
Cc: tech-unprivileged@lists.riscv.org, gcc@gcc.gnu.org, Hongyu Wang, Uros Bizjak

The generated code here is correct in both cases. In the RISC-V case, I believe it is conservative, at a minimum, in that atomics should not imply IO ordering. We had an earlier discussion, which seemed to have consensus in favor of that opinion. I believe clang does not enforce IO ordering.

You can think of a "sequentially consistent" load roughly as enforcing two properties:

1) It behaves as an "acquire" load: later (in program order) memory operations are not reordered before it. This is implicit for x86. It requires the trailing fence on RISC-V, which could probably be weakened to r,rw.

2) It ensures that seq_cst operations are fully ordered.
This means that, in addition to (1) and the corresponding fence for stores, every seq_cst store must be separated from a later seq_cst load by at least a w,r fence, so that a seq_cst store followed by a seq_cst load is not reordered. w,r fences are discouraged on RISC-V, and probably no better than rw,rw, so that's how the leading fence got there. (Again, the IO ordering should disappear. It's the responsibility of IO code to insert that explicitly, rather than paying for it everywhere.)

x86 does (2) by associating that fence with stores instead of loads, either by using explicit fences after stores, or by turning stores into xchg. RISC-V could do the same, and I believe that if the current A extension were the final word on the architecture, it should. But that convention is not compatible with the later introduction of an "acquire load", which I think is essential for performance, at least on larger cores. So I think the two-fence mapping for loads should be maintained for now, as I suggested in the document I posted to the list.

Hans

On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta wrote:

> Hi,
>
> I have a testcase (from real workloads) involving C++ atomics, and I am
> trying to understand the codegen (gcc 12) for RVWMO and x86. It mixes
> atomics with non-atomics, so it is not obvious what the intended behavior
> is; hence some explicit CC of subject matter experts (apologies for that
> in advance).
>
> The test has a non-atomic store followed by an atomic load(SEQ_CST). I
> assume that an unadorned direct access defaults to the safest/most
> conservative ordering, seq_cst.
>
> extern int g;
> std::atomic<int> a;
>
> int bar_noaccessor(int n, int *n2)
> {
>     *n2 = g;
>     return n + a;
> }
>
> int bar_seqcst(int n, int *n2)
> {
>     *n2 = g;
>     return n + a.load(std::memory_order_seq_cst);
> }
>
> On RV (RVWMO), with current gcc 12 we get two full fences around the
> load, as prescribed by the Privileged Spec, Chapter A, Table A.6
> (Mappings from C/C++ to RISC-V primitives).
>
> _Z10bar_seqcstiPi:
> .LFB382:
>         .cfi_startproc
>         lui     a5,%hi(g)
>         lw      a5,%lo(g)(a5)
>         sw      a5,0(a1)
>         *fence iorw,iorw*
>         lui     a5,%hi(a)
>         lw      a5,%lo(a)(a5)
>         *fence iorw,iorw*
>         addw    a0,a5,a0
>         ret
>
> OTOH, for x86 (same default toggles) there are no barriers at all.
>
> _Z10bar_seqcstiPi:
>         endbr64
>         movl    g(%rip), %eax
>         movl    %eax, (%rsi)
>         movl    a(%rip), %eax
>         addl    %edi, %eax
>         ret
>
> My naive intuition was that x86 TSO would require a fence between a
> prior store and a load(seq_cst), even if that store was non-atomic, to
> ensure the load didn't bubble up ahead of the store.
>
> Perhaps this raises the general question of intermixing non-atomic
> accesses with atomics, and whether that is undefined behavior or some
> such. I skimmed through the C++14 specification chapter on the Atomic
> Operations library, but nothing jumped out on the topic.
>
> Or is it much deeper, related to the as-if rule or something?
>
> Thx,
> -Vineet