From: Xi Ruoyao
To: gcc-patches@gcc.gnu.org
Cc: chenglulu, i@xen0n.name, xuchenghua@loongson.cn, hev, Xi Ruoyao
Subject: [PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops
Date: Mon, 6 Nov 2023 19:36:04 +0800
Message-ID: <20231106113809.1193236-1-xry111@xry111.site>

This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: <memory-barrier> + <load-exclusive>
- SC: <store-conditional> + <memory-barrier>

But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, so acquire semantics are not
guaranteed.  Therefore, an acquire barrier needs to be generated when
failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it is always sufficient for acquire semantics.  OTOH, if
acquire semantics are not needed, we still need "dbar 0x700" as the
load-load barrier, as in all LL-SC loops.

[1]: https://github.com/llvm/llvm-project/pull/67391
[2]: https://github.com/llvm/llvm-project/pull/69339

gcc/ChangeLog:

        * config/loongarch/loongarch.cc
        (loongarch_memmodel_needs_release_fence): Remove.
        (loongarch_cas_failure_memorder_needs_acquire): New static
        function.
        (loongarch_print_operand): Redefine 'G' for the barrier on CAS
        failure.
        * config/loongarch/sync.md (atomic_cas_value_strong): Remove the
        redundant barrier before the LL instruction, and emit an acquire
        barrier on failure if needed by failure_memorder.
        (atomic_cas_value_cmp_and_7_): Likewise.
        (atomic_cas_value_add_7_): Remove the unnecessary barrier before
        the LL instruction.
        (atomic_cas_value_sub_7_): Likewise.
        (atomic_cas_value_and_7_): Likewise.
        (atomic_cas_value_xor_7_): Likewise.
        (atomic_cas_value_or_7_): Likewise.
        (atomic_cas_value_nand_7_): Likewise.
        (atomic_cas_value_exchange_7_): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/loongarch/cas-acquire.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
and/or GCC 12/13 (for fixing the acquire semantics in
failure_memorder)?

 gcc/config/loongarch/loongarch.cc             | 27 +++---
 gcc/config/loongarch/sync.md                  | 49 +++++------
 .../gcc.target/loongarch/cas-acquire.c        | 84 +++++++++++++++++++
 3 files changed, 118 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cas-acquire.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 9b63f0dc322..d9b7a1076a2 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5833,25 +5833,22 @@ loongarch_memmodel_needs_rel_acq_fence (enum memmodel model)
     }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
+/* Return true if a FENCE should be emitted after a failed CAS to
+   implement the acquire semantic of failure_memorder.  */
 
 static bool
-loongarch_memmodel_needs_release_fence (enum memmodel model)
+loongarch_cas_failure_memorder_needs_acquire (enum memmodel model)
 {
-  switch (model)
+  switch (memmodel_base (model))
     {
+    case MEMMODEL_ACQUIRE:
     case MEMMODEL_ACQ_REL:
+    case MEMMODEL_CONSUME:
     case MEMMODEL_SEQ_CST:
-    case MEMMODEL_SYNC_SEQ_CST:
-    case MEMMODEL_RELEASE:
-    case MEMMODEL_SYNC_RELEASE:
       return true;
 
-    case MEMMODEL_ACQUIRE:
-    case MEMMODEL_CONSUME:
-    case MEMMODEL_SYNC_ACQUIRE:
     case MEMMODEL_RELAXED:
+    case MEMMODEL_RELEASE:
       return false;
 
     default:
@@ -5966,7 +5963,8 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
    'd'  Print CONST_INT OP in decimal.
    'E'  Print CONST_INT OP element 0 of a replicated CONST_VECTOR in decimal.
    'F'  Print the FPU branch condition for comparison OP.
-   'G'  Print a DBAR insn if the memory model requires a release.
+   'G'  Print a DBAR insn for CAS failure (with an acquire semantic if
+        needed, otherwise a simple load-load barrier).
    'H'  Print address 52-61bit relocation associated with OP.
    'h'  Print the high-part relocation associated with OP.
    'i'  Print i if the operand is not a register.
@@ -6057,8 +6055,11 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
       break;
 
     case 'G':
-      if (loongarch_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
-        fputs ("dbar\t0", file);
+      if (loongarch_cas_failure_memorder_needs_acquire (
+            memmodel_from_int (INTVAL (op))))
+        fputs ("dbar\t0b10100", file);
+      else
+        fputs ("dbar\t0x700", file);
       break;
 
     case 'h':
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 9924d522bcd..db3a21690b8 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -129,19 +129,18 @@ (define_insn "atomic_cas_value_strong"
    (clobber (match_scratch:GPR 6 "=&r"))]
   ""
 {
-  return "%G5\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "bne\\t%0,%z2,2f\\n\\t"
          "or%i3\\t%6,$zero,%3\\n\\t"
          "sc.\\t%6,%1\\n\\t"
-         "beq\\t$zero,%6,1b\\n\\t"
+         "beqz\\t%6,1b\\n\\t"
          "b\\t3f\\n\\t"
          "2:\\n\\t"
-         "dbar\\t0x700\\n\\t"
+         "%G5\\n\\t"
          "3:\\n\\t";
 }
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "register_operand" "")  ;; bool output
@@ -234,8 +233,7 @@ (define_insn "atomic_cas_value_cmp_and_7_"
    (clobber (match_scratch:GPR 7 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%2\\n\\t"
          "bne\\t%7,%z4,2f\\n\\t"
@@ -245,10 +243,10 @@ (define_insn "atomic_cas_value_cmp_and_7_"
          "beq\\t$zero,%7,1b\\n\\t"
          "b\\t3f\\n\\t"
          "2:\\n\\t"
-         "dbar\\t0x700\\n\\t"
+         "%G6\\n\\t"
          "3:\\n\\t";
 }
-  [(set (attr "length") (const_int 40))])
+  [(set (attr "length") (const_int 36))])
 
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "register_operand" "")  ;; bool output
@@ -303,8 +301,7 @@ (define_insn "atomic_cas_value_add_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "add.w\\t%8,%0,%z5\\n\\t"
@@ -314,7 +311,7 @@ (define_insn "atomic_cas_value_add_7_"
          "beq\\t$zero,%7,1b";
 }
 
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_sub_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")  ;; res
@@ -330,8 +327,7 @@ (define_insn "atomic_cas_value_sub_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "sub.w\\t%8,%0,%z5\\n\\t"
@@ -340,7 +336,7 @@ (define_insn "atomic_cas_value_sub_7_"
          "sc.\\t%7,%1\\n\\t"
          "beq\\t$zero,%7,1b";
 }
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_and_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")  ;; res
@@ -356,8 +352,7 @@ (define_insn "atomic_cas_value_and_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "and\\t%8,%0,%z5\\n\\t"
@@ -366,7 +361,7 @@ (define_insn "atomic_cas_value_and_7_"
          "sc.\\t%7,%1\\n\\t"
          "beq\\t$zero,%7,1b";
 }
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_xor_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")  ;; res
@@ -382,8 +377,7 @@ (define_insn "atomic_cas_value_xor_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "xor\\t%8,%0,%z5\\n\\t"
@@ -393,7 +387,7 @@ (define_insn "atomic_cas_value_xor_7_"
          "beq\\t$zero,%7,1b";
 }
 
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_or_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")  ;; res
@@ -409,8 +403,7 @@ (define_insn "atomic_cas_value_or_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "or\\t%8,%0,%z5\\n\\t"
@@ -420,7 +413,7 @@ (define_insn "atomic_cas_value_or_7_"
          "beq\\t$zero,%7,1b";
 }
 
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_nand_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")  ;; res
@@ -436,8 +429,7 @@ (define_insn "atomic_cas_value_nand_7_"
    (clobber (match_scratch:GPR 8 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%3\\n\\t"
          "and\\t%8,%0,%z5\\n\\t"
@@ -446,7 +438,7 @@ (define_insn "atomic_cas_value_nand_7_"
          "sc.\\t%7,%1\\n\\t"
          "beq\\t$zero,%7,1b";
 }
-  [(set (attr "length") (const_int 32))])
+  [(set (attr "length") (const_int 28))])
 
 (define_insn "atomic_cas_value_exchange_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -461,8 +453,7 @@ (define_insn "atomic_cas_value_exchange_7_"
    (clobber (match_scratch:GPR 7 "=&r"))]
   ""
 {
-  return "%G6\\n\\t"
-         "1:\\n\\t"
+  return "1:\\n\\t"
          "ll.\\t%0,%1\\n\\t"
          "and\\t%7,%0,%z3\\n\\t"
          "or%i5\\t%7,%7,%5\\n\\t"
diff --git a/gcc/testsuite/gcc.target/loongarch/cas-acquire.c b/gcc/testsuite/gcc.target/loongarch/cas-acquire.c
new file mode 100644
index 00000000000..cd3ba06e258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/cas-acquire.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-require-effective-target c99_runtime } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options "-std=c99 -pthread" } */
+
+/* https://github.com/llvm/llvm-project/pull/67391#issuecomment-1752403934
+   reported that this had failed with GCC and 3A6000.  */
+
+#include
+#include
+#include
+#include
+
+static unsigned int tags[32];
+static unsigned int vals[32];
+
+static void *
+writer_entry (void *data)
+{
+  atomic_uint *pt = (atomic_uint *)tags;
+  atomic_uint *pv = (atomic_uint *)vals;
+  unsigned int n;
+
+  for (n = 1; n < 10000000; n++)
+    {
+      atomic_store_explicit (&pv[n & 31], n, memory_order_release);
+      atomic_store_explicit (&pt[n & 31], n, memory_order_release);
+      n++;
+    }
+
+  return NULL;
+}
+
+static void *
+reader_entry (void *data)
+{
+  atomic_uint *pt = (atomic_uint *)tags;
+  atomic_uint *pv = (atomic_uint *)vals;
+  int i;
+
+  for (;;)
+    {
+      for (i = 0; i < 32; i++)
+        {
+          unsigned int tag = 0;
+          bool res;
+
+          res = atomic_compare_exchange_weak_explicit (
+            &pt[i], &tag, 0, memory_order_acquire, memory_order_acquire);
+          if (!res)
+            {
+              unsigned int val;
+
+              val = atomic_load_explicit (&pv[i], memory_order_relaxed);
+              if (val < tag)
+                abort ();
+            }
+        }
+    }
+
+  return NULL;
+}
+
+int
+main (int argc, char *argv[])
+{
+  pthread_t writer;
+  pthread_t reader;
+  int res;
+
+  res = pthread_create (&writer, NULL, writer_entry, NULL);
+  if (res < 0)
+    abort ();
+
+  res = pthread_create (&reader, NULL, reader_entry, NULL);
+  if (res < 0)
+    abort ();
+
+  res = pthread_join (writer, NULL);
+  if (res < 0)
+    abort ();
+
+  return 0;
+}
-- 
2.42.1
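
For readers who want the barrier selection in one place, here is a
minimal standalone C sketch of the decision the redefined 'G' operand
modifier makes on the CAS failure path.  It is not the GCC code: the
function names failure_order_needs_acquire and cas_failure_barrier are
hypothetical, and it switches on the C11 memory_order values rather
than GCC's internal enum memmodel.  The two barrier strings are the
ones printed by the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): true when the failure
   ordering of a compare-and-swap includes an acquire operation, so a
   failed CAS (where SC never executes) still needs an acquire
   barrier.  */
bool
failure_order_needs_acquire (memory_order failure_mo)
{
  switch (failure_mo)
    {
    case memory_order_acquire:
    case memory_order_acq_rel:
    case memory_order_consume:
    case memory_order_seq_cst:
      return true;

    case memory_order_relaxed:
    case memory_order_release:
    default:
      return false;
    }
}

/* Hypothetical helper (not part of GCC): the barrier text for the
   failure label of the LL-SC loop.  "dbar 0b10100" is the acquire
   barrier described in the commit message; "dbar 0x700" is the
   load-load barrier used when no acquire semantics are needed.  */
const char *
cas_failure_barrier (memory_order failure_mo)
{
  return failure_order_needs_acquire (failure_mo)
	 ? "dbar\t0b10100"
	 : "dbar\t0x700";
}
```

With this sketch, the reader_entry loop in the new test, which passes
memory_order_acquire as the failure ordering, would map to the
"dbar 0b10100" case.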