public inbox for gcc-patches@gcc.gnu.org
* [PATCH] LoongArch: Use finer-grained DBAR hints
@ 2023-11-13 23:18 Xi Ruoyao
  2023-11-14  2:26 ` chenglulu
  0 siblings, 1 reply; 4+ messages in thread
From: Xi Ruoyao @ 2023-11-13 23:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:

- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1: false)
- Bit 0: barrier for succeeding write (0: true, 1: false)

LLVM has already utilized them for different memory orders [3]:

- Bit 4 is always set to one because it's only intended to be zero for
  things like MMIO devices, which are out of the scope of memory orders.
- An acquire barrier is used to implement acquire loads like

    ld.d $a1, $t0, 0
    dbar acquire_hint

  where the load operation (ld.d) should not be reordered with any load
  or store operation after the acquire load.  To accomplish this
  constraint, we need to prevent the load operation from being reordered
  after the barrier, and also prevent any following load/store operation
  from being reordered before the barrier.  Thus bits 0, 1, and 3 must
  be zero, and bit 2 can be one, so acquire_hint should be 0b10100.
- A release barrier is used to implement release stores like

    dbar release_hint
    st.d $a1, $t0, 0

  where the store operation (st.d) should not be reordered with any load
  or store operation before the release store.  So we need to prevent
  the store operation from being reordered before the barrier, and also
  prevent any preceding load/store operation from being reordered after
  the barrier.  So bits 0, 2, 3 must be zero, and bit 1 can be one.  So
  release_hint should be 0b10010.
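
The bit reasoning above can be checked mechanically.  The following is
only a sketch: the enumerator names are hypothetical (they do not come
from GCC, LLVM, or the kernel), and it assumes the bit layout listed at
the top of this message.

```c
#include <assert.h>

/* Hypothetical names for illustration only.  In bits 3..0 a set bit
   means "this barrier is NOT required".  */
enum dbar_bits
{
  DBAR_ORDERING      = 1 << 4,  /* bit 4: ordering (1) vs. completion (0) */
  DBAR_NO_PREV_READ  = 1 << 3,  /* bit 3: no barrier for previous reads */
  DBAR_NO_PREV_WRITE = 1 << 2,  /* bit 2: no barrier for previous writes */
  DBAR_NO_SUCC_READ  = 1 << 1,  /* bit 1: no barrier for succeeding reads */
  DBAR_NO_SUCC_WRITE = 1 << 0   /* bit 0: no barrier for succeeding writes */
};

enum dbar_hints
{
  /* Acquire: only previous writes may be left unordered.  */
  ACQUIRE_HINT = DBAR_ORDERING | DBAR_NO_PREV_WRITE,
  /* Release: only succeeding reads may be left unordered.  */
  RELEASE_HINT = DBAR_ORDERING | DBAR_NO_SUCC_READ,
  /* Full ordering barrier: every direction is ordered.  */
  FULL_ORDERING_HINT = DBAR_ORDERING
};

/* Compile-time checks against the values derived in the text.  */
static_assert (ACQUIRE_HINT == 0x14, "acquire_hint is 0b10100");
static_assert (RELEASE_HINT == 0x12, "release_hint is 0b10010");
static_assert (FULL_ORDERING_HINT == 0x10, "full ordering hint is 0b10000");
```

Hex constants are used instead of 0b literals so the sketch does not
depend on a compiler extension.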

A similar mapping has been utilized for RISC-V GCC [4], LoongArch Linux
kernel [1], and LoongArch LLVM [3].  So the mapping should be correct.
And I've also bootstrapped & regtested GCC on a LA664 with this patch.

The LoongArch CPUs should treat "unknown" hints as dbar 0, so we can
unconditionally emit the new hints without a compiler switch.
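
For readers who do not want to read the .md syntax, the mapping this
patch implements in mem_thread_fence_1 can be sketched as a plain C
function.  dbar_hint_for_fence is a hypothetical helper, not part of
GCC; it uses the __ATOMIC_* macros predefined by GCC and Clang, and
treating consume as acquire here is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: return the DBAR hint a thread
   fence with the given memory model would use under this patch, or -1
   when no barrier instruction is needed.  */
static int
dbar_hint_for_fence (int model)
{
  switch (model)
    {
    case __ATOMIC_RELAXED:
      return -1;                /* no instruction is issued */
    case __ATOMIC_CONSUME:      /* assumption: handled like acquire */
    case __ATOMIC_ACQUIRE:
      return 0x14;              /* 0b10100 */
    case __ATOMIC_RELEASE:
      return 0x12;              /* 0b10010 */
    case __ATOMIC_ACQ_REL:
    case __ATOMIC_SEQ_CST:
      return 0x10;              /* 0b10000 */
    default:
      /* The patch uses gcc_unreachable () here.  The sketch asserts
         and then falls back to the full barrier (dbar 0).  */
      assert (0 && "unexpected memory model");
      return 0;
    }
}
```

Falling back to 0 in the default branch is a choice made for the sketch
only, so that a build with assertions disabled still returns the most
conservative barrier.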

[1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
[2]: https://github.com/loongson-community/docs/pull/12
[3]: https://github.com/llvm/llvm-project/pull/68787
[4]: https://gcc.gnu.org/r14-406

gcc/ChangeLog:

	* config/loongarch/sync.md (mem_thread_fence): Remove redundant
	check.
	(mem_thread_fence_1): Emit finer-grained DBAR hints for
	different memory models, instead of 0.
---

Bootstrapped and regtested on loongarch64-linux-gnu (running on a
LA664).  Ok for trunk?

 gcc/config/loongarch/sync.md | 49 +++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index db3a21690b8..511aba5ffa6 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -50,23 +50,54 @@ (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand" "")] ;; model
   ""
 {
-  if (INTVAL (operands[0]) != MEMMODEL_RELAXED)
-    {
-      rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-      MEM_VOLATILE_P (mem) = 1;
-      emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
-    }
+  rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (mem) = 1;
+  emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
+
   DONE;
 })
 
-;; Until the LoongArch memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
+;; DBAR hint encoding for LA664 and later micro-architectures, paraphrased from
+;; the Linux patch revealing it [1]:
+;;
+;; - Bit 4: kind of constraint (0: completion, 1: ordering)
+;; - Bit 3: barrier for previous read (0: true, 1: false)
+;; - Bit 2: barrier for previous write (0: true, 1: false)
+;; - Bit 1: barrier for succeeding read (0: true, 1: false)
+;; - Bit 0: barrier for succeeding write (0: true, 1: false)
+;;
+;; [1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
+;;
+;; Implementations without support for the finer-granularity hints simply treat
+;; all as the full barrier (DBAR 0), so we can unconditionally start emitting
+;; the more precise hints right away.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
    (match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "dbar\t0")
+  {
+    enum memmodel model = memmodel_base (INTVAL (operands[1]));
+
+    switch (model)
+      {
+      case MEMMODEL_ACQUIRE:
+      case MEMMODEL_CONSUME:
+	/* Consume is implemented using the stronger acquire memory order
+	   because of a deficiency in C++11's semantics.  */
+	return "dbar\t0b10100";
+      case MEMMODEL_RELEASE:
+	return "dbar\t0b10010";
+      case MEMMODEL_ACQ_REL:
+      case MEMMODEL_SEQ_CST:
+	return "dbar\t0b10000";
+      default:
+	/* GCC internal: "For the '__ATOMIC_RELAXED' model no instructions
+	   need to be issued and this expansion is not invoked."
+	   Other values should not be returned by memmodel_base.  */
+	gcc_unreachable ();
+      }
+  })
 
 ;; Atomic memory operations.
 
-- 
2.42.1



* Re: [PATCH] LoongArch: Use finer-grained DBAR hints
  2023-11-13 23:18 [PATCH] LoongArch: Use finer-grained DBAR hints Xi Ruoyao
@ 2023-11-14  2:26 ` chenglulu
  2023-11-14  8:34   ` Pushed: [PATCH v2] " Xi Ruoyao
  0 siblings, 1 reply; 4+ messages in thread
From: chenglulu @ 2023-11-14  2:26 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua

On 2023/11/14 at 7:18 AM, Xi Ruoyao wrote:
/* snip */
>   (define_insn "mem_thread_fence_1"
>     [(set (match_operand:BLK 0 "" "")
>   	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
>      (match_operand:SI 1 "const_int_operand" "")] ;; model
>     ""
> -  "dbar\t0")
> +  {
> +    enum memmodel model = memmodel_base (INTVAL (operands[1]));
> +
> +    switch (model)
> +      {
> +      case MEMMODEL_ACQUIRE:
> +      case MEMMODEL_CONSUME:

Hi,

Before calling this template, the function get_memmodel is called to
process memmodel, which has a piece of code:

    /* Workaround for Bugzilla 59448. GCC doesn't track consume
       properly, so be conservative and promote consume to acquire.  */
    if (val == MEMMODEL_CONSUME)
      val = MEMMODEL_ACQUIRE;

So I think MEMMODEL_CONSUME doesn't need to be processed here either.

Otherwise it is OK.

Thanks.

> +	/* Consume is implemented using the stronger acquire memory order
> +	   because of a deficiency in C++11's semantics.  */
> +	return "dbar\t0b10100";
> +      case MEMMODEL_RELEASE:
> +	return "dbar\t0b10010";
> +      case MEMMODEL_ACQ_REL:
> +      case MEMMODEL_SEQ_CST:
> +	return "dbar\t0b10000";
> +      default:
> +	/* GCC internal: "For the '__ATOMIC_RELAXED' model no instructions
> +	   need to be issued and this expansion is not invoked."
> +	   Other values should not be returned by memmodel_base.  */
> +	gcc_unreachable ();
> +      }
> +  })
>   
>   ;; Atomic memory operations.
>   


* Pushed: [PATCH v2] LoongArch: Use finer-grained DBAR hints
  2023-11-14  2:26 ` chenglulu
@ 2023-11-14  8:34   ` Xi Ruoyao
  2023-11-14  8:48     ` chenglulu
  0 siblings, 1 reply; 4+ messages in thread
From: Xi Ruoyao @ 2023-11-14  8:34 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Tue, 2023-11-14 at 10:26 +0800, chenglulu wrote:
> Hi,
>  
> 
>  * Before calling this template, the function get_memmodel is called to process memmodel, which has a piece of code:
>  
>        /* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
>         be conservative and promote consume to acquire.  */
>      if (val == MEMMODEL_CONSUME)
>        val = MEMMODEL_ACQUIRE;  
> 
>  * So I think MEMMODEL_CONSUME doesn't need to be processed here either.
> 
> Otherwise it is OK.

Thanks, I've removed case MEMMODEL_CONSUME and there seems to be no issue.
The RISC-V mem_thread_fence expansion also does not handle MEMMODEL_CONSUME.

Pushed r14-5432 with case MEMMODEL_CONSUME removed and comment adjusted,
as attached.
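
A small C file along the following lines could be used to look at the
generated fences (for example by compiling with -S for
loongarch64-linux-gnu).  This is a sketch, not a test from the GCC
testsuite; the function and variable names are made up, and the "dbar"
comments state what the output template in the patch would be expected
to print, not output I am quoting.

```c
#include <assert.h>

static int payload;   /* plain data published by the writer */
static int ready;     /* flag accessed with relaxed atomics */

/* Writer side: the release fence keeps the payload store ordered
   before the flag store.  Expected fence: dbar 0b10010.  */
void
publish (int value)
{
  payload = value;
  __atomic_thread_fence (__ATOMIC_RELEASE);
  __atomic_store_n (&ready, 1, __ATOMIC_RELAXED);
}

/* Reader side: the acquire fence keeps the flag load ordered before
   the payload load.  Expected fence: dbar 0b10100.
   Returns 1 and fills *out if a value has been published, else 0.  */
int
try_consume (int *out)
{
  assert (out);
  if (!__atomic_load_n (&ready, __ATOMIC_RELAXED))
    return 0;
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  *out = payload;
  return 1;
}

/* Sequentially consistent fence.  Expected fence: dbar 0b10000.  */
void
full_fence (void)
{
  __atomic_thread_fence (__ATOMIC_SEQ_CST);
}
```

A fence with __ATOMIC_RELAXED is left out on purpose, since per the
comment in the patch no instruction is issued for it.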

But curiously there are various references to MEMMODEL_CONSUME in
gcc/config:

$ grep -lr MEMMODEL_CONSUME gcc/config
gcc/config/aarch64/aarch64.cc
gcc/config/riscv/riscv.cc
gcc/config/ia64/ia64.cc
gcc/config/ia64/sync.md
gcc/config/gcn/gcn.md
gcc/config/loongarch/loongarch.cc
gcc/config/rs6000/rs6000.cc
gcc/config/rs6000/sync.md
gcc/config/nvptx/nvptx.cc

Maybe all of them are redundant?

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

[-- Attachment #2: v2-0001-LoongArch-Use-finer-grained-DBAR-hints.patch --]
[-- Type: text/x-patch, Size: 5150 bytes --]

From 4a70bfbf686c2b6a1ecd83fe851de826c612c3e0 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao <xry111@xry111.site>
Date: Tue, 14 Nov 2023 05:32:38 +0800
Subject: [PATCH v2] LoongArch: Use finer-grained DBAR hints

LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:

- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1: false)
- Bit 0: barrier for succeeding write (0: true, 1: false)

LLVM has already utilized them for different memory orders [3]:

- Bit 4 is always set to one because it's only intended to be zero for
  things like MMIO devices, which are out of the scope of memory orders.
- An acquire barrier is used to implement acquire loads like

    ld.d $a1, $t0, 0
    dbar acquire_hint

  where the load operation (ld.d) should not be reordered with any load
  or store operation after the acquire load.  To accomplish this
  constraint, we need to prevent the load operation from being reordered
  after the barrier, and also prevent any following load/store operation
  from being reordered before the barrier.  Thus bits 0, 1, and 3 must
  be zero, and bit 2 can be one, so acquire_hint should be 0b10100.
- A release barrier is used to implement release stores like

    dbar release_hint
    st.d $a1, $t0, 0

  where the store operation (st.d) should not be reordered with any load
  or store operation before the release store.  So we need to prevent
  the store operation from being reordered before the barrier, and also
  prevent any preceding load/store operation from being reordered after
  the barrier.  So bits 0, 2, 3 must be zero, and bit 1 can be one.  So
  release_hint should be 0b10010.

A similar mapping has been utilized for RISC-V GCC [4], LoongArch Linux
kernel [1], and LoongArch LLVM [3].  So the mapping should be correct.
And I've also bootstrapped & regtested GCC on a LA664 with this patch.

The LoongArch CPUs should treat "unknown" hints as dbar 0, so we can
unconditionally emit the new hints without a compiler switch.

[1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
[2]: https://github.com/loongson-community/docs/pull/12
[3]: https://github.com/llvm/llvm-project/pull/68787
[4]: https://gcc.gnu.org/r14-406

gcc/ChangeLog:

	* config/loongarch/sync.md (mem_thread_fence): Remove redundant
	check.
	(mem_thread_fence_1): Emit finer-grained DBAR hints for
	different memory models, instead of 0.
---
 gcc/config/loongarch/sync.md | 51 +++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 9924d522bcd..1ad0c63e0d9 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -50,23 +50,56 @@ (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand" "")] ;; model
   ""
 {
-  if (INTVAL (operands[0]) != MEMMODEL_RELAXED)
-    {
-      rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-      MEM_VOLATILE_P (mem) = 1;
-      emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
-    }
+  rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (mem) = 1;
+  emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
+
   DONE;
 })
 
-;; Until the LoongArch memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
+;; DBAR hint encoding for LA664 and later micro-architectures, paraphrased from
+;; the Linux patch revealing it [1]:
+;;
+;; - Bit 4: kind of constraint (0: completion, 1: ordering)
+;; - Bit 3: barrier for previous read (0: true, 1: false)
+;; - Bit 2: barrier for previous write (0: true, 1: false)
+;; - Bit 1: barrier for succeeding read (0: true, 1: false)
+;; - Bit 0: barrier for succeeding write (0: true, 1: false)
+;;
+;; [1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
+;;
+;; Implementations without support for the finer-granularity hints simply treat
+;; all as the full barrier (DBAR 0), so we can unconditionally start emitting
+;; the more precise hints right away.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
    (match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "dbar\t0")
+  {
+    enum memmodel model = memmodel_base (INTVAL (operands[1]));
+
+    switch (model)
+      {
+      case MEMMODEL_ACQUIRE:
+	return "dbar\t0b10100";
+      case MEMMODEL_RELEASE:
+	return "dbar\t0b10010";
+      case MEMMODEL_ACQ_REL:
+      case MEMMODEL_SEQ_CST:
+	return "dbar\t0b10000";
+      default:
+	/* GCC internal: "For the '__ATOMIC_RELAXED' model no instructions
+	   need to be issued and this expansion is not invoked."
+
+	   __atomic builtins doc: "Consume is implemented using the
+	   stronger acquire memory order because of a deficiency in C++11's
+	   semantics."  See PR 59448 and get_memmodel in builtins.cc.
+
+	   Other values should not be returned by memmodel_base.  */
+	gcc_unreachable ();
+      }
+  })
 
 ;; Atomic memory operations.
 
-- 
2.42.1



* Re: Pushed: [PATCH v2] LoongArch: Use finer-grained DBAR hints
  2023-11-14  8:34   ` Pushed: [PATCH v2] " Xi Ruoyao
@ 2023-11-14  8:48     ` chenglulu
  0 siblings, 0 replies; 4+ messages in thread
From: chenglulu @ 2023-11-14  8:48 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua

On 2023/11/14 at 4:34 PM, Xi Ruoyao wrote:
> On Tue, 2023-11-14 at 10:26 +0800, chenglulu wrote:
>> Hi,
>>   
>>
>>   * Before calling this template, the function get_memmodel is called to process memmodel, which has a piece of code:
>>   
>>         /* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
>>          be conservative and promote consume to acquire.  */
>>       if (val == MEMMODEL_CONSUME)
>>         val = MEMMODEL_ACQUIRE;
>>
>>   * So I think MEMMODEL_CONSUME doesn't need to be processed here either.
>>
>> Otherwise it is OK.
> Thanks, I've removed case MEMMODEL_CONSUME and there seems to be no issue.
> RISC-V mem_thread_fence expansion also does not handle MEMMODEL_CONSUME.
>
> Pushed r14-5432 with case MEMMODEL_CONSUME removed and comment adjusted,
> as attached.
>
> But curiously there are various references to MEMMODEL_CONSUME in
> gcc/config:
>
> $ grep -lr MEMMODEL_CONSUME gcc/config
> gcc/config/aarch64/aarch64.cc
> gcc/config/riscv/riscv.cc
> gcc/config/ia64/ia64.cc
> gcc/config/ia64/sync.md
> gcc/config/gcn/gcn.md
> gcc/config/loongarch/loongarch.cc
> gcc/config/rs6000/rs6000.cc
> gcc/config/rs6000/sync.md
> gcc/config/nvptx/nvptx.cc
>
> Maybe all of them are redundant?
>
I think so.:-)
