From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x42c.google.com (mail-pf1-x42c.google.com [IPv6:2607:f8b0:4864:20::42c]) by sourceware.org (Postfix) with ESMTPS id 6D2B53858CDB for ; Fri, 28 Apr 2023 16:29:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6D2B53858CDB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-pf1-x42c.google.com with SMTP id d2e1a72fcca58-63b70f0b320so244143b3a.1 for ; Fri, 28 Apr 2023 09:29:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20221208.gappssmtp.com; s=20221208; t=1682699371; x=1685291371; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:from:to:cc:subject:date:message-id :reply-to; bh=hSeSrfnngz094F5ACEf/gn0YJ8nsREoYgZu0ylM9bOc=; b=PMe+CEQljsLB4154q4/JiuB9lz7N622u3dGJxJTMeU9LC4KFo1pOXDTi6Dl6edXELt 4uqagzSyfP+uXqjOhVp4hmppxh2WMxdVTGCsrRoxC6KO4hVJ2lku8wCTzi5Jddu2L4tz zT8nXQI4iWXUw/e7HrSjHmtzUGM5xFAw3wXBykI4+fPvyvNxIb6bltsF/b0GTsqT/kpG 05afRz6knre/9k47L5buT46LFx/A9gWE8HNiFiDvALWkFNCAJ+ok7yYpa8vUxfMBwm7b PuOOmcpLAuiXKmDz1+KuUZGkyA9qV3+h1pziy3D8hfCd0WN90PTZJ9BiKY3s1YZBAJMp Bh+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682699371; x=1685291371; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=hSeSrfnngz094F5ACEf/gn0YJ8nsREoYgZu0ylM9bOc=; b=AMN1FJoviHNtb/oNMoPcwxoXEva0ZCsymPO74OEMrOqjke5O3I14FL/EtFo2a7YF7I R6zm/KAR+Oe+NKKA3M5hnSCr7qgL6fIrBzoxnhi6oeFG8Wj/9VJmaCARp0Sz0AYtQE31 NRGL9GPX5t7JpwI9oqueB1frD9RjhYOQM0XlTd9g//4iWvm2pWDxKfixsavJ5+jo9KAD 6YKYpiDWo7ZC8v6yH9ozx4GyYo5dRZKmRlzMb4o52rs9eBclzBI/Xn0wUZhripwY+IYc QyUJZjVqG8gzUYOFvMI86dx9W/fC5ETmLVyUsDxJ2k9atNZfuXuPEgevR5cIsegWWfrc CRVA== X-Gm-Message-State: AC+VfDy9mT+zNsOydjndeqei6jsKdWgDSnI7/3nk8GrBCD27DgjdkuAG tnFAnke7THczllmZ32C/4+mFww== X-Google-Smtp-Source: ACHHUZ5aP4GiEEEs80Y7BvA+0q4dXJlwRkTrtozdBHLyvIshsqpAEuX90O2O+IF/Yf1Xrl43w0B7qw== X-Received: by 2002:a05:6a21:3945:b0:ee:d553:5cee with SMTP id ac5-20020a056a21394500b000eed5535ceemr6479086pzc.16.1682699371094; Fri, 28 Apr 2023 09:29:31 -0700 (PDT) Received: from localhost ([135.180.227.0]) by smtp.gmail.com with ESMTPSA id fc9-20020a056a002e0900b0062e63cdfcb6sm15949504pfb.94.2023.04.28.09.29.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 28 Apr 2023 09:29:30 -0700 (PDT) Date: Fri, 28 Apr 2023 09:29:30 -0700 (PDT) X-Google-Original-Date: Fri, 28 Apr 2023 09:29:28 PDT (-0700) Subject: Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings In-Reply-To: CC: Patrick O'Neill , gcc-patches@gcc.gnu.org, gnu-toolchain@rivosinc.com, Vineet Gupta , Andrew Waterman , kito.cheng@sifive.com, Daniel Lustig , cmuellner@gcc.gnu.org, Andrea Parri , hboehm@google.com From: Palmer Dabbelt To: jeffreyalaw@gmail.com Message-ID: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreyalaw@gmail.com wrote: > > > On 4/27/23 10:22, Patrick O'Neill wrote: >> This patchset aims to make the RISCV atomics implementation stronger >> than the recommended mapping present in table A.6 of the ISA manual. >> https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157 >> >> Context >> --------- >> GCC defined RISC-V mappings [1] before the Memory Model task group >> finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2]. >> >> For at least a year now, we've known that the mappings were different, >> but it wasn't clear if these unique mappings had correctness issues. >> >> Andrea Parri found an issue with the GCC mappings, showing that >> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do >> not enforce release ordering guarantees. (Meaning the GCC mappings have >> a correctness issue). >> https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/ > Right. I recall this discussion, but thanks for the back reference. Yep, and it's an important one: that's why we're calling the change a bug fix and dropping the current GCC mappings. If we didn't have the bug we'd be talking about an ABI break, and since the GCC mappings predate the ISA mappings we'd likely need an additional compatibility mode. So I guess we're lucky that we have a concurrency bug. I think it's the first time I've said that ;) >> Why not A.6? >> --------- >> We can update our mappings now, so the obvious choice would be to >> implement Table A.6 (what LLVM implements/ISA manual recommends). >> >> The reason why that isn't the best path forward for GCC is due to a >> proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl. >> >> For context, there is discussion about fast-tracking the addition of >> these instructions. The RISCV architectural review committee supports >> adopting a "new and common atomics ABI for gcc and LLVM toochains ... >> that assumes the addition of the preceding instructions”. That common >> ABI is likely to be A.7. >> https://lists.riscv.org/g/tech-privileged/message/1284 >> >> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge >> against that risk by emitting a conservative fence after SEQ_CST stores >> to make the mapping compatible with both A.6 and A.7. > So I like that we can have compatible sequences across A.6 and A.7. Of > course the concern is performance ;-) > > >> >> What does a mapping compatible with both A.6 & A.7 look like? >> --------- >> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing >> fence rw,rw. It's strictly stronger than Table A.6. > Right. So my worry here is silicon that is either already available or > coming online shortly. Those implementations simply aren't going to be > able to use the A.7 mapping, so they pay a penalty. Does it make sense > to have the compatibility fences conditional? IIRC this was discussed somewhere in some thread, but I think there's really three ABIs that could be implemented here (ignoring the current GCC mappings as they're broken): * ABI compatible with the current mappings in the ISA manual (A.6). This will presumably perform best on extant hardware, given that it's what the words in the PDF say to do. * ABI compatible with the proposed mappings for the ISA manual (A.7). This may perform better on new hardware. * ABI compatible with both A.6 and A.7. This is likely slow on both new and old hardware, but allows cross-linking. If there's no performance issues this would be the only mode we need, but that seems unlikely. IMO those should be encoded somewhere in the ELF. I'd just do it as two bits in the header, but last time I proposed header bits the psABI folks wanted to do something different. I don't think where we encode this matters all that much, but if we're doing to treat these as real long-term ABIs we should have some way to encode that. There's also the orthogonal axis of whether we use the new instructions. Those aren't in specs yet so I think we can hold off on them for a bit, but they're the whole point of doing the ABI break so we should at least think them over. I think we're OK because we've just split out the ABI from the ISA here, but I'm not sure if I'm missing something. Now that I wrote that, though, I remember talking to Patrick about it and we drew a bunch of stuff on the whiteboard and then got confused. So sorry if I'm just out of the loop here... > > > >> >> Benchmark Interpretation >> -------- >> As expected, out of order machines are significantly faster with the >> REL_STORE mappings. Unexpectedly, the in-order machines are >> significantly slower with REL_STORE rather than REL_STORE_FENCE. > Yea, that's a bit of a surprise. > >> >> Most machines in the wild are expected to use Table A.7 once the >> instructions are introduced. >> Incurring this added cost now will make it easier for compiled RISC-V >> binaries to transition to the A.7 memory model mapping. >> >> The performance benefits of moving to A.7 can be more clearly seen using >> an almost-all-load microbenchmark (included on page 3 of Hans’ >> proposal). The code for that microbenchmark is attached below [5]. >> https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf >> https://lists.riscv.org/g/tech-unprivileged/topic/92916241 > Yea. I'm not questioning the value of the new instructions that are on > the horizon, just the value of trying to make everything A.7 compatible. > > >> >> Conformance test cases notes >> -------- >> The conformance tests in this patch are a good sanity check but do not >> guarantee exactly following Table A.6. It checks that the right >> instructions are emitted (ex. fence rw,r) but not the order of those >> instructions. > Note there is a way to check ordering as well. You might look at the > check-function-bodies approach. I think there are some recent examples > in the gcc risc-v specific tests. > > >> >> LLVM mapping notes >> -------- >> LLVM emits corresponding fences for atomic_signal_fence instructions. >> This seems to be an oversight since AFAIK atomic_signal_fence acts as a >> compiler directive. GCC does not emit any fences for atomic_signal_fence >> instructions. > This starts to touch on a larger concern. Specifically I'd really like > the two compilers to be compatible in terms of the code they generate > for the various atomics. > > What I worry about is code being written (by design or accident) that is > dependent on the particular behavior of one compiler and then if that > code gets built with the other compiler, and we end up different > behavior. Worse yet, if/when this happens, it's likely to be tough to > expose, reproduce & debug. > > Do you have any sense of where Clang/LLVM is going to go WRT providing > an A.6 mapping that is compatible with A.7 by using the additional fences? > > > Jeff