Subject: Re: Fences/Barriers when mixing C++ atomics and non-atomics
Date: Thu, 13 Oct 2022 14:11:13 -0700
From: Vineet Gupta
To: Hans Boehm
Cc: tech-unprivileged@lists.riscv.org, gcc@gcc.gnu.org, Hongyu Wang, Uros Bizjak
Message-ID: <9838fc0f-a519-bfd3-3eb2-88255742ae57@rivosinc.com>

Hi Hans,

On 10/13/22 13:54, Hans Boehm wrote:
> The generated code here is correct in both cases. In the RISC-V case,
> I believe it is conservative, at a minimum, in that atomics should not
> imply IO ordering. We had an earlier discussion, which seemed to have
> consensus in favor of that opinion.
> I believe clang does not enforce IO ordering.
>
> You can think of a "sequentially consistent" load roughly as enforcing
> two properties:
>
> 1) It behaves as an "acquire" load. Later (in program order) memory
> operations do not advance past it. This is implicit for x86. It
> requires the trailing fence on RISC-V, which could probably be
> weakened to r,rw.

Acquire implies later operations won't be hoisted above the load, but
prior operations could still sink below it, meaning a prior write could
happen after the load. Doesn't that contradict what the user is asking
for with load(seq_cst) on x86?

> 2) It ensures that seq_cst operations are fully ordered. This means
> that, in addition to (1), and the corresponding fence for stores,
> every seq_cst store must be separated from a seq_cst load by at least
> a w,r fence, so a seq_cst store followed by a seq_cst load is not
> reordered.

This makes sense when both the store and the load are seq_cst. But the
question is what happens when that store is non-atomic. IOW, if we had a
store(relaxed) -> load(seq_cst), would the generated code still give the
load a full barrier to prevent the prior store from being reordered
after it?

> w,r fences are discouraged on RISC-V, and probably no better than
> rw,rw, so that's how the leading fence got there. (Again the io
> ordering should disappear. It's the responsibility of IO code to
> insert that explicitly, rather than paying for it everywhere.)

Thanks for explaining the RV semantics.

> x86 does (2) by associating that fence with stores instead of loads,
> either by using explicit fences after stores, or by turning stores
> into xchg.

That makes sense, as x86 has ld -> ld and ld -> st architecturally
ordered, so any fences ought to be associated with st.

Thx,
-Vineet

> RISC-V could do the same. And I believe that if the current A
> extension were the final word on the architecture, it should. But that
> convention is not compatible with the later introduction of an
> "acquire load", which I think is essential for performance, at least
> on larger cores.
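Hans's two-property description of a seq_cst load can be sketched in C++
with explicit fences. This is a rough illustration only, not an exact
equivalence (fence-based and access-based orderings differ in subtle
ways); the function name and variable are mine, and the comments refer
to the Table A.6 lowering discussed in this thread.

```cpp
#include <atomic>

std::atomic<int> a{0};

// Rough sketch of the RISC-V two-fence mapping for a seq_cst load,
// which per the mapping table lowers to: fence rw,rw ; lw ; fence r,rw
int seq_cst_load_sketch() {
    // Leading full fence: keeps all prior memory operations before the
    // load, giving the store->load ordering of property (2).
    std::atomic_thread_fence(std::memory_order_seq_cst);
    int v = a.load(std::memory_order_relaxed);
    // Trailing acquire fence: keeps later operations after the load,
    // giving property (1).
    std::atomic_thread_fence(std::memory_order_acquire);
    return v;
}
```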
> So I think the two fence mapping for loads should be maintained for
> now, as I suggested in the document I posted to the list.
>
> Hans
>
> On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta wrote:
>
>     Hi,
>
>     I have a testcase (from real workloads) involving C++ atomics, and
>     I am trying to understand the codegen (gcc 12) for RVWMO and x86.
>     It mixes atomics with non-atomics, so it's not obvious what the
>     intended behavior is, hence some explicit CC of subject matter
>     experts (apologies for that in advance).
>
>     The test has a non-atomic store followed by an atomic
>     load(SEQ_CST). I assume that an unadorned direct access defaults
>     to the safest/most conservative seq_cst.
>
>         extern int g;
>         std::atomic<int> a;
>
>         int bar_noaccessor(int n, int *n2)
>         {
>             *n2 = g;
>             return n + a;
>         }
>
>         int bar_seqcst(int n, int *n2)
>         {
>             *n2 = g;
>             return n + a.load(std::memory_order_seq_cst);
>         }
>
>     On RV (rvwmo), with current gcc 12 we get 2 full fences around the
>     load, as prescribed by the Unprivileged Spec, Appendix A, Table
>     A.6 (Mappings from C/C++ primitives to RISC-V primitives).
>
>         _Z10bar_seqcstiPi:
>         .LFB382:
>             .cfi_startproc
>             lui     a5,%hi(g)
>             lw      a5,%lo(g)(a5)
>             sw      a5,0(a1)
>             fence   iorw,iorw
>             lui     a5,%hi(a)
>             lw      a5,%lo(a)(a5)
>             fence   iorw,iorw
>             addw    a0,a5,a0
>             ret
>
>     OTOH, for x86 (same default toggles) there are no barriers at all.
>
>         _Z10bar_seqcstiPi:
>             endbr64
>             movl    g(%rip), %eax
>             movl    %eax, (%rsi)
>             movl    a(%rip), %eax
>             addl    %edi, %eax
>             ret
>
>     My naive intuition was that x86 TSO would require a fence before
>     load(seq_cst) for a prior store, even if that store was
>     non-atomic, to ensure the load didn't bubble up ahead of the
>     store.
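The store-side convention Hans describes for x86 can be made concrete
with a minimal sketch (function names are mine; the comments describe
the typical compiler lowering, which may vary by compiler and version):

```cpp
#include <atomic>

// On x86, the full-barrier cost of seq_cst is conventionally attached
// to stores: store(seq_cst) typically compiles to xchg (or to a plain
// mov followed by mfence), while load(seq_cst) is a plain mov.
// x86-TSO already forbids ld->ld and ld->st reordering, so only st->ld
// needs an explicit barrier, and the ABI convention puts it on the
// store side.
void seq_cst_store(std::atomic<int>& x, int v) {
    x.store(v, std::memory_order_seq_cst);    // typically xchg on x86
}

int seq_cst_load(const std::atomic<int>& x) {
    return x.load(std::memory_order_seq_cst); // typically plain mov on x86
}
```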
> Perhaps this begs the general question of intermixing non-atomic
> accesses with atomics, and whether that is undefined behavior or some
> such. I skimmed through the C++14 specification's Atomic operations
> library chapter, but nothing jumped out on the topic.
>
> Or is it much deeper, related to the as-if rule or something?
>
> Thx,
> -Vineet