From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 27 Apr 2023 19:20:14 +0200
From: Andrea Parri
To: Patrick O'Neill
Cc: gcc-patches@gcc.gnu.org, palmer@rivosinc.com, gnu-toolchain@rivosinc.com,
    vineetg@rivosinc.com, andrew@sifive.com, kito.cheng@sifive.com,
    dlustig@nvidia.com, cmuellner@gcc.gnu.org, hboehm@google.com,
    jeffreyalaw@gmail.com
Subject: Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings
References: <20230414170942.1695672-1-patrick@rivosinc.com>
    <20230427162301.1151333-1-patrick@rivosinc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20230427162301.1151333-1-patrick@rivosinc.com>

On Thu, Apr 27, 2023 at 09:22:50AM -0700, Patrick O'Neill wrote:
> This patchset aims to make the RISCV atomics implementation stronger
> than the recommended mapping present in table A.6 of the ISA manual.
> https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157
>
> Context
> ---------
> GCC defined RISC-V mappings [1] before the Memory Model task group
> finalized their work and provided the ISA Manual Table A.6/A.7 mappings [2].
>
> For at least a year now, we've known that the mappings were different,
> but it wasn't clear if these unique mappings had correctness issues.
>
> Andrea Parri found an issue with the GCC mappings, showing that
> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
> not enforce release ordering guarantees. (Meaning the GCC mappings have
> a correctness issue.)
> https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
>
> Why not A.6?
> ---------
> We can update our mappings now, so the obvious choice would be to
> implement Table A.6 (what LLVM implements/ISA manual recommends).
>
> That isn't the best path forward for GCC because of a proposal by
> Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.
>
> For context, there is discussion about fast-tracking the addition of
> these instructions. The RISCV architectural review committee supports
> adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
> that assumes the addition of the preceding instructions". That common
> ABI is likely to be A.7.
> https://lists.riscv.org/g/tech-privileged/message/1284
>
> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
> against that risk by emitting a conservative fence after SEQ_CST stores
> to make the mapping compatible with both A.6 and A.7.
>
> What does a mapping compatible with both A.6 & A.7 look like?
> ---------
> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
> fence rw,rw. It's strictly stronger than Table A.6.
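
For readers following along, here is a minimal sketch of what the
compatibility mapping described above means at the source level. The
instruction sequences in the comments are the ones given by Table A.6 and
by the cover letter, not output captured from a patched compiler. The names
(shared_flag, store_seq_cst, and so on) are made up for this illustration.

```cpp
#include <atomic>

// Hypothetical shared variable, used only for this illustration.
std::atomic<int> shared_flag{0};

// Relaxed store.
//   Table A.6 and compatibility mapping:  sw
void store_relaxed(int v) {
  shared_flag.store(v, std::memory_order_relaxed);
}

// Release store.
//   Table A.6 and compatibility mapping:  fence rw,w ; sw
void store_release(int v) {
  shared_flag.store(v, std::memory_order_release);
}

// Sequentially consistent store: the only case where the two differ.
//   Table A.6:              fence rw,w ; sw
//   Compatibility mapping:  fence rw,w ; sw ; fence rw,rw
void store_seq_cst(int v) {
  shared_flag.store(v, std::memory_order_seq_cst);
}

// Sequentially consistent load, same in both mappings.
//   fence rw,rw ; lw ; fence r,rw
int load_seq_cst() {
  return shared_flag.load(std::memory_order_seq_cst);
}
```

Only store_seq_cst() pays for the extra trailing fence. That is the cost
the STORE_TWO_FENCE column of the microbenchmark tries to approximate.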
>
> Microbenchmark
> ---------
> Hans Boehm helpfully wrote a microbenchmark [3] that uses ARM to give a
> rough estimate for the performance benefits/penalties of the different
> mappings. The microbenchmark is single-threaded and almost-write-only.
> This workload is unlikely in practice, but it gives a rough idea of the
> case that would be impacted the most.
>
> Testcases
> -------
> Control: A simple volatile store. This is most similar to a relaxed
> store.
> Release Store: This is most similar to Sw.rl (one of the instructions in
> Hans' proposal).
> Store with release fence: This is most similar to the mapping present in
> Table A.6.
> Store with two fences: This is most similar to the compatibility mapping
> present in this patchset.
>
> Machines
> -------
> Intel(R) Core(TM) i7-8650U (sanity check only): x86 TSO
> Cortex A53 (Raspberry Pi): ARM in-order core
> Cortex A55 (Pixel 6 Pro): ARM in-order core
> Cortex A76 (Pixel 6 Pro): ARM out-of-order core
> Cortex X1 (Pixel 6 Pro): ARM out-of-order core
>
> Microbenchmark Results [4]
> --------
> Units are nsecs per iteration.
>
> Sanity check
> Machine         CONTROL  REL_STORE  STORE_REL_FENCE  STORE_TWO_FENCE
> -------         -------  ---------  ---------------  ---------------
> Intel i7-8650U  1.34812  1.30038    1.2933           18.0474
>
>
> Machine         CONTROL  REL_STORE  STORE_REL_FENCE  STORE_TWO_FENCE
> -------         -------  ---------  ---------------  ---------------
> Cortex A53      7.15224  10.7282    7.15221          10.013
> Cortex A55      2.77965  8.89654    4.44787          7.78331
> Cortex A76      1.78021  1.86095    5.33088          8.88462
> Cortex X1       2.14252  2.14258    4.32982          7.05234
>
> Reordered tests (using -r flag on microbenchmark)
> Machine         CONTROL  REL_STORE  STORE_REL_FENCE  STORE_TWO_FENCE
> -------         -------  ---------  ---------------  ---------------
> Cortex A53      7.15227  10.7282    7.16133          10.034
> Cortex A55      2.78024  8.89574    4.44844          7.78428
> Cortex A76      1.77686  1.81081    5.3301           8.88346
> Cortex X1       2.14254  2.14251    4.3273           7.05239
>
> Benchmark Interpretation
> --------
> As expected, the out-of-order machines (A76, X1) are significantly
> faster with REL_STORE than with STORE_REL_FENCE. Unexpectedly, the
> in-order machines (A53, A55) are significantly slower with REL_STORE
> than with STORE_REL_FENCE.
>
> Most machines in the wild are expected to use Table A.7 once the
> instructions are introduced.
> Incurring this added cost now will make it easier for compiled RISC-V
> binaries to transition to the A.7 memory model mapping.
>
> The performance benefits of moving to A.7 can be more clearly seen using
> an almost-all-load microbenchmark (included on page 3 of Hans’
> proposal). The code for that microbenchmark is attached below [5].
> https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
> https://lists.riscv.org/g/tech-unprivileged/topic/92916241
>
> Caveats
> --------
> This is a very synthetic microbenchmark that represents what is expected
> to be a very unlikely workload. Nevertheless, it's helpful to see the
> worst-case price we are paying for compatibility.
>
> “All times include an entire loop iteration, indirect dispatch and all.
> The benchmark alternates tests, but does not lock CPU frequency. Since a
> single core was in use, I expect this was running at basically full
> speed. Any throttling affected everything more or less uniformly.”
> - Hans Boehm
>
> Patchset overview
> --------
> Patch 1 simplifies the memmodel to ignore MEMMODEL_SYNC_* cases (legacy
> cases that aren't handled differently for RISC-V).
> Patches 2-6 make the mappings strictly stronger.
> Patches 7-10 weaken the mappings to be in line with table A.6 of the ISA
> manual.
> Patch 11 adds some basic conformance tests to ensure the implemented
> mapping matches table A.6 with stronger SEQ_CST stores.
>
> Conformance test cases notes
> --------
> The conformance tests in this patch are a good sanity check but do not
> guarantee exactly following Table A.6. They check that the right
> instructions are emitted (e.g. fence rw,r) but not the order of those
> instructions.
>
> LLVM mapping notes
> --------
> LLVM emits corresponding fences for atomic_signal_fence instructions.
> This seems to be an oversight since AFAIK atomic_signal_fence acts as a
> compiler directive. GCC does not emit any fences for atomic_signal_fence
> instructions.
>
> Future work
> --------
> There still remains some work to be done in this space after this
> patchset fixes the correctness of the GCC mappings.
> * Look into explicitly handling subword loads/stores.
> * Look into using AMOSWAP.rl for store words/doubles.
> * L{b|h|w|d}.aq/rl & S{b|h|w|d}.aq/rl support once ratified.
> * Ztso mappings.
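
On the LLVM note above, a small sketch of why atomic_signal_fence should
need no hardware fence: it only orders accesses between a thread and a
signal handler running on that same thread, so restricting compiler
reordering is enough. Everything named below (payload, ready, observed,
on_signal, publish_and_raise) is hypothetical. SIGINT is used only because
it is one of the signals standard C++ guarantees to exist, and the atomics
are assumed to be lock-free on the target.

```cpp
#include <atomic>
#include <csignal>

int payload = 0;                 // plain data handed to the handler
std::atomic<bool> ready{false};  // assumed lock-free on the target
std::atomic<int> observed{-1};   // what the handler saw, -1 if nothing

extern "C" void on_signal(int) {
  if (ready.load(std::memory_order_relaxed)) {
    // Compiler-only barrier: keeps the payload read after the flag read.
    std::atomic_signal_fence(std::memory_order_acquire);
    observed.store(payload, std::memory_order_relaxed);
  }
}

int publish_and_raise(int value) {
  std::signal(SIGINT, on_signal);
  payload = value;
  // Compiler-only barrier: keeps the payload write before the flag write.
  std::atomic_signal_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
  std::raise(SIGINT);  // the handler runs on this thread before raise() returns
  return observed.load(std::memory_order_relaxed);
}
```

Going by the note above, compiling this for a RISC-V target with GCC should
produce no fence instruction for either signal fence. The same two lines
written with atomic_thread_fence would map to fence r,rw and fence rw,w
under Table A.6.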
>
> Prior Patchsets
> --------
> Patchset v1:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592950.html
>
> Patchset v2:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615264.html
>
> Patchset v3:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615431.html
>
> Patchset v4:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
>
> Changelogs
> --------
> Changes for v2:
> * Use memmodel_base rather than a custom simplify_memmodel function
>   (Inspired by Christoph Muellner's patch 1/9)
> * Move instruction styling change from [v1 5/7] to [v2 3/8] to reduce
>   [v2 6/8]'s complexity
> * Eliminated %K flag for atomic store introduced in v1 in favor of
>   if/else
> * Rebase/test
>
> Changes for v3:
> * Use a trailing fence for atomic stores to be compatible with table A.7
> * Emit an optimized fence r,rw following a SEQ_CST load
> * Consolidate tests in [PATCH v3 10/10]
> * Add tests for basic A.6 conformance
>
> Changes for v4:
> * Update cover letter to cover more of the reasoning behind moving to a
>   compatibility mapping
> * Improve conformance testcases patch assertions and add new
>   compare-exchange testcases
>
> Changes for v5:
> * Update cover letter to cover more context and reasoning behind moving
>   to a compatibility mapping
> * Rebase to include the subword-atomic cases introduced here:
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616080.html
> * Add basic amo-add subword atomic testcases
> * Reformat changelogs
> * Fix misc. whitespace issues
>
> [1] GCC port with mappings merged 06 Feb 2017
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=09cae7507d9e88f2b05cf3a9404bf181e65ccbac
>
> [2] A.6 mappings added to ISA manual 12 Dec 2017
> https://github.com/riscv/riscv-isa-manual/commit/9da1a115bcc4fe327f35acceb851d4850d12e9fa
>
> [3] Hans Boehm almost-all-store Microbenchmark:
> // Copyright 2023 Google LLC.
> // SPDX-License-Identifier: Apache-2.0
>
> #include <atomic>
> #include <cstdint>
> #include <cstdlib>
> #include <iostream>
> #include <utility>
> #include <time.h>
>
> static constexpr int INNER_ITERS = 10'000'000;
> static constexpr int OUTER_ITERS = 20;
> static constexpr int N_TESTS = 4;
>
> volatile int the_volatile(17);
> std::atomic<int> the_atomic(17);
>
> void test1(int i) {
>   the_volatile = i;
> }
>
> void test2(int i) {
>   the_atomic.store(i, std::memory_order_release);
> }
>
> void test3(int i) {
>   atomic_thread_fence(std::memory_order_release);
>   the_atomic.store(i, std::memory_order_relaxed);
> }
>
> void test4(int i) {
>   atomic_thread_fence(std::memory_order_release);
>   the_atomic.store(i, std::memory_order_relaxed);
>   atomic_thread_fence(std::memory_order_seq_cst);
> }
>
> typedef void (*int_func)(int);
>
> uint64_t getnanos() {
>   struct timespec result;
>   if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &result) != 0) {
>     std::cerr << "clock_gettime() failed\n";
>     exit(1);
>   }
>   return (uint64_t)result.tv_nsec + 1'000'000'000 * (uint64_t)result.tv_sec;
> }
>
> int_func tests[N_TESTS] = { test1, test2, test3, test4 };
> const char *test_names[N_TESTS] =
>     { "control", "release store", "store with release fence", "store with two fences" };
> uint64_t total_time[N_TESTS] = { 0 };
>
> int main(int argc, char **argv) {
>   struct timespec res;
>   if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0) {
>     std::cerr << "clock_getres() failed\n";
>     exit(1);
>   } else {
>     std::cout << "nsec resolution = " << res.tv_nsec << std::endl;
>   }
>   if (argc == 2 && argv[1][0] == 'r') {
>     // Run tests in reverse order.
>     for (int i = 0; i < N_TESTS / 2; ++i) {
>       std::swap(tests[i], tests[N_TESTS - 1 - i]);
>       std::swap(test_names[i], test_names[N_TESTS - 1 - i]);
>     }
>   }
>   for (int i = 0; i < OUTER_ITERS; ++i) {
>     // Alternate tests to minimize bias due to thermal throttling.
>     for (int j = 0; j < N_TESTS; ++j) {
>       uint64_t start_time = getnanos();
>       for (int k = 1; k <= INNER_ITERS; ++k) {
>         tests[j](k);  // Provides memory accesses between tests.
>       }
>       // Ignore first iteration for all tests. The first iteration of the first test is
>       // empirically slightly slower.
>       if (i != 0) {
>         total_time[j] += getnanos() - start_time;
>       }
>       if ((tests[j] == test1 ? the_volatile : the_atomic.load()) != INNER_ITERS) {
>         std::cerr << "result check failed, test = " << j << ", " << the_volatile << std::endl;
>         exit(1);
>       }
>     }
>   }
>   for (int i = 0; i < N_TESTS; ++i) {
>     double nsecs_per_iter = (double) total_time[i] / INNER_ITERS / (OUTER_ITERS - 1);
>     std::cout << test_names[i] << " took " << nsecs_per_iter << " nseconds per iteration\n";
>   }
>   exit(0);
> }
>
> [4] Hans Boehm Raw Microbenchmark Results
> Intel(R) Core(TM) i7-8650U (sanity check only):
>
> hboehm@hboehm-glaptop0:~/tests$ ./a.out
> nsec resolution = 1
> control took 1.34812 nseconds per iteration
> release store took 1.30038 nseconds per iteration
> store with release fence took 1.2933 nseconds per iteration
> store with two fences took 18.0474 nseconds per iteration
>
> Cortex A53 (Raspberry Pi)
> hboehm@rpi3-20210823:~/tests$ ./a.out
> nsec resolution = 1
> control took 7.15224 nseconds per iteration
> release store took 10.7282 nseconds per iteration
> store with release fence took 7.15221 nseconds per iteration
> store with two fences took 10.013 nseconds per iteration
> hboehm@rpi3-20210823:~/tests$ ./a.out -r
> nsec resolution = 1
> control took 7.15227 nseconds per iteration
> release store took 10.7282 nseconds per iteration
> store with release fence took 7.16133 nseconds per iteration
> store with two fences took 10.034 nseconds per iteration
>
> Cortex A55 (Pixel 6 Pro)
>
> raven:/data/tmp # taskset 0f ./release-timer
> nsec resolution = 1
> control took 2.77965 nseconds per iteration
> release store took 8.89654 nseconds per iteration
> store with release fence took 4.44787 nseconds per iteration
> store with two fences took 7.78331 nseconds per iteration
> raven:/data/tmp # taskset 0f ./release-timer -r
> nsec resolution = 1
> control took 2.78024 nseconds per iteration
> release store took 8.89574 nseconds per iteration
> store with release fence took 4.44844 nseconds per iteration
> store with two fences took 7.78428 nseconds per iteration
>
> Cortex A76 (Pixel 6 Pro)
> raven:/data/tmp # taskset 30 ./release-timer -r
> nsec resolution = 1
> control took 1.77686 nseconds per iteration
> release store took 1.81081 nseconds per iteration
> store with release fence took 5.3301 nseconds per iteration
> store with two fences took 8.88346 nseconds per iteration
> raven:/data/tmp # taskset 30 ./release-timer
> nsec resolution = 1
> control took 1.78021 nseconds per iteration
> release store took 1.86095 nseconds per iteration
> store with release fence took 5.33088 nseconds per iteration
> store with two fences took 8.88462 nseconds per iteration
>
> Cortex X1 (Pixel 6 Pro)
> raven:/data/tmp # taskset c0 ./release-timer
> nsec resolution = 1
> control took 2.14252 nseconds per iteration
> release store took 2.14258 nseconds per iteration
> store with release fence took 4.32982 nseconds per iteration
> store with two fences took 7.05234 nseconds per iteration
> raven:/data/tmp # taskset c0 ./release-timer -r
> nsec resolution = 1
> control took 2.14254 nseconds per iteration
> release store took 2.14251 nseconds per iteration
> store with release fence took 4.3273 nseconds per iteration
> store with two fences took 7.05239 nseconds per iteration
>
> [5] Hans Boehm almost-all-load Microbenchmark:
> // Copyright 2023 Google LLC.
> // SPDX-License-Identifier: Apache-2.0
>
> #include <atomic>
> #include <cstdint>
> #include <cstdlib>
> #include <iostream>
> #include <utility>
> #include <time.h>
>
> static constexpr int INNER_ITERS = 10'000'000;
> static constexpr int OUTER_ITERS = 20;
> static constexpr int N_TESTS = 4;
>
> volatile int the_volatile(17);
> std::atomic<int> the_atomic(17);
>
> int test1() {
>   return the_volatile;
> }
>
> int test2() {
>   return the_atomic.load(std::memory_order_acquire);
> }
>
> int test3() {
>   int result = the_atomic.load(std::memory_order_relaxed);
>   atomic_thread_fence(std::memory_order_acquire);
>   return result;
> }
>
> int test4() {
>   atomic_thread_fence(std::memory_order_seq_cst);
>   int result = the_atomic.load(std::memory_order_relaxed);
>   atomic_thread_fence(std::memory_order_acquire);
>   return result;
> }
>
> typedef int (*int_func)();
>
> uint64_t getnanos() {
>   struct timespec result;
>   if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &result) != 0) {
>     std::cerr << "clock_gettime() failed\n";
>     exit(1);
>   }
>   return (uint64_t)result.tv_nsec + 1'000'000'000 * (uint64_t)result.tv_sec;
> }
>
> int_func tests[N_TESTS] = { test1, test2, test3, test4 };
> const char *test_names[N_TESTS] =
>     { "control", "acquire load", "load with acquire fence", "load with two fences" };
> uint64_t total_time[N_TESTS] = { 0 };
>
> unsigned sum, last_sum = 0;
>
> int main(int argc, char **argv) {
>   struct timespec res;
>   if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0) {
>     std::cerr << "clock_getres() failed\n";
>     exit(1);
>   } else {
>     std::cout << "nsec resolution = " << res.tv_nsec << std::endl;
>   }
>   if (argc == 2 && argv[1][0] == 'r') {
>     // Run tests in reverse order.
>     for (int i = 0; i < N_TESTS / 2; ++i) {
>       std::swap(tests[i], tests[N_TESTS - 1 - i]);
>       std::swap(test_names[i], test_names[N_TESTS - 1 - i]);
>     }
>   }
>   for (int i = 0; i < OUTER_ITERS; ++i) {
>     // Alternate tests to minimize bias due to thermal throttling.
>     for (int j = 0; j < N_TESTS; ++j) {
>       sum = 0;
>       uint64_t start_time = getnanos();
>       for (int k = 0; k < INNER_ITERS; ++k) {
>         sum += tests[j]();  // Provides memory accesses between tests.
>       }
>       // Ignore first iteration for all tests. The first iteration of the first test is
>       // empirically slightly slower.
>       if (i != 0) {
>         total_time[j] += getnanos() - start_time;
>       }
>       if (sum == 0 || (last_sum != 0 && sum != last_sum)) {
>         std::cerr << "result check failed";
>         exit(1);
>       }
>       last_sum = sum;
>     }
>   }
>   for (int i = 0; i < N_TESTS; ++i) {
>     double nsecs_per_iter = (double) total_time[i] / INNER_ITERS / (OUTER_ITERS - 1);
>     std::cout << test_names[i] << " took " << nsecs_per_iter << " nseconds per iteration\n";
>   }
>   exit(0);
> }
>
> Patrick O'Neill (11):
>   RISC-V: Eliminate SYNC memory models
>   RISC-V: Enforce Libatomic LR/SC SEQ_CST
>   RISC-V: Enforce subword atomic LR/SC SEQ_CST
>   RISC-V: Enforce atomic compare_exchange SEQ_CST
>   RISC-V: Add AMO release bits
>   RISC-V: Strengthen atomic stores
>   RISC-V: Eliminate AMO op fences
>   RISC-V: Weaken LR/SC pairs
>   RISC-V: Weaken mem_thread_fence
>   RISC-V: Weaken atomic loads
>   RISC-V: Table A.6 conformance tests
>
>  gcc/config/riscv/riscv-protos.h               |   3 +
>  gcc/config/riscv/riscv.cc                     |  66 ++++--
>  gcc/config/riscv/sync.md                      | 194 ++++++++++++------
>  .../riscv/amo-table-a-6-amo-add-1.c           |   8 +
>  .../riscv/amo-table-a-6-amo-add-2.c           |   8 +
>  .../riscv/amo-table-a-6-amo-add-3.c           |   8 +
>  .../riscv/amo-table-a-6-amo-add-4.c           |   8 +
>  .../riscv/amo-table-a-6-amo-add-5.c           |   8 +
>  .../riscv/amo-table-a-6-compare-exchange-1.c  |  10 +
>  .../riscv/amo-table-a-6-compare-exchange-2.c  |  10 +
>  .../riscv/amo-table-a-6-compare-exchange-3.c  |  10 +
>  .../riscv/amo-table-a-6-compare-exchange-4.c  |  10 +
>  .../riscv/amo-table-a-6-compare-exchange-5.c  |  10 +
>  .../riscv/amo-table-a-6-compare-exchange-6.c  |  11 +
>  .../riscv/amo-table-a-6-compare-exchange-7.c  |  10 +
>  .../gcc.target/riscv/amo-table-a-6-fence-1.c  |   8 +
>  .../gcc.target/riscv/amo-table-a-6-fence-2.c  |  10 +
>  .../gcc.target/riscv/amo-table-a-6-fence-3.c  |  10 +
>  .../gcc.target/riscv/amo-table-a-6-fence-4.c  |  10 +
>  .../gcc.target/riscv/amo-table-a-6-fence-5.c  |  10 +
>  .../gcc.target/riscv/amo-table-a-6-load-1.c   |   9 +
>  .../gcc.target/riscv/amo-table-a-6-load-2.c   |  11 +
>  .../gcc.target/riscv/amo-table-a-6-load-3.c   |  11 +
>  .../gcc.target/riscv/amo-table-a-6-store-1.c  |   9 +
>  .../gcc.target/riscv/amo-table-a-6-store-2.c  |  11 +
>  .../riscv/amo-table-a-6-store-compat-3.c      |  11 +
>  .../riscv/amo-table-a-6-subword-amo-add-1.c   |   9 +
>  .../riscv/amo-table-a-6-subword-amo-add-2.c   |   9 +
>  .../riscv/amo-table-a-6-subword-amo-add-3.c   |   9 +
>  .../riscv/amo-table-a-6-subword-amo-add-4.c   |   9 +
>  .../riscv/amo-table-a-6-subword-amo-add-5.c   |   9 +
>  gcc/testsuite/gcc.target/riscv/pr89835.c      |   9 +
>  libgcc/config/riscv/atomic.c                  |   4 +-
>  33 files changed, 467 insertions(+), 75 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-compat-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr89835.c

These changes address and fix all the issues I reported/I'm aware of,
thank you!

Tested-by: Andrea Parri

  Andrea