public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
@ 2021-05-05 19:36 Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265] Christoph Muellner
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are closely related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
  (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE improvement
(see PR100264):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).

Further, the new CAS code requires cbranch INSN helpers to be present:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

Changes for v2:
* Guard LL/SC sequence by compiler barriers ("blockage")
  (suggested by Andrew Waterman)
* Changed commit message for AMOSWAP->STORE change
  (suggested by Andrew Waterman)
* Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
* Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
* Fix small code style issue

Christoph Muellner (10):
  RISC-V: Simplify memory model code [PR 100265]
  RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
  RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
  RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
  RISC-V: Emit fences according to chosen memory model [PR 100265]
  RISC-V: Implement atomic_{load,store} [PR 100265]
  RISC-V: Model INSNs for LR and SC [PR 100266]
  RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
  RISC-V: Provide programmatic implementation of CAS [PR 100266]
  RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

 gcc/config/riscv/riscv-protos.h |   1 +
 gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
 gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
 3 files changed, 235 insertions(+), 118 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs " Christoph Muellner
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

We don't have any special treatment of MEMMODEL_SYNC_* values,
so let's hide them behind the memmodel_base() function.
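
For context, memmodel_base() (defined in gcc/memmodel.h) simply masks away
the MEMMODEL_SYNC flag, so e.g. MEMMODEL_SYNC_ACQUIRE folds back to
MEMMODEL_ACQUIRE; roughly:

    static inline enum memmodel
    memmodel_base (unsigned HOST_WIDE_INT val)
    {
      return (enum memmodel) (val & MEMMODEL_BASE_MASK);
    }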

    gcc/
	PR 100265
        * config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire):
	  Ignore MEMMODEL_SYNC_* values.
        * config/riscv/riscv.c (riscv_memmodel_needs_release_fence):
	  Likewise.
        * config/riscv/riscv.c (riscv_print_operand): Eliminate
	  MEMMODEL_SYNC_* values by calling memmodel_base().
---
 gcc/config/riscv/riscv.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 27665e5b58f9..545f3d0cb82c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3350,20 +3350,17 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool hi_reloc)
    acquire portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_amo_acquire (enum memmodel model)
+riscv_memmodel_needs_amo_acquire (const enum memmodel model)
 {
   switch (model)
     {
       case MEMMODEL_ACQ_REL:
       case MEMMODEL_SEQ_CST:
-      case MEMMODEL_SYNC_SEQ_CST:
       case MEMMODEL_ACQUIRE:
       case MEMMODEL_CONSUME:
-      case MEMMODEL_SYNC_ACQUIRE:
 	return true;
 
       case MEMMODEL_RELEASE:
-      case MEMMODEL_SYNC_RELEASE:
       case MEMMODEL_RELAXED:
 	return false;
 
@@ -3376,20 +3373,17 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
    implement the release portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_release_fence (enum memmodel model)
+riscv_memmodel_needs_release_fence (const enum memmodel model)
 {
   switch (model)
     {
       case MEMMODEL_ACQ_REL:
       case MEMMODEL_SEQ_CST:
-      case MEMMODEL_SYNC_SEQ_CST:
       case MEMMODEL_RELEASE:
-      case MEMMODEL_SYNC_RELEASE:
 	return true;
 
       case MEMMODEL_ACQUIRE:
       case MEMMODEL_CONSUME:
-      case MEMMODEL_SYNC_ACQUIRE:
       case MEMMODEL_RELAXED:
 	return false;
 
@@ -3414,6 +3408,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 {
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
+  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
     {
@@ -3433,12 +3428,12 @@ riscv_print_operand (FILE *file, rtx op, int letter)
       break;
 
     case 'A':
-      if (riscv_memmodel_needs_amo_acquire ((enum memmodel) INTVAL (op)))
+      if (riscv_memmodel_needs_amo_acquire (model))
 	fputs (".aq", file);
       break;
 
     case 'F':
-      if (riscv_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
+      if (riscv_memmodel_needs_release_fence (model))
 	fputs ("fence iorw,ow; ", file);
       break;
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265] Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() " Christoph Muellner
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

The ratified A extension supports '.aq', '.rl' and '.aqrl' as
memory ordering suffixes. Let's emit them when riscv_print_operand()
processes the '%A' conversion specifier.

As '%A' was already used for a similar, but restricted, purpose
(only '.aq' was emitted so far), this does not require any other
changes.
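
Illustrative effect on the generated AMOs (a minimal sketch; x is assumed
to be a global int, and the actual operands/registers depend on the
surrounding code):

    int x;

    void
    amo_examples (void)
    {
      __atomic_fetch_add (&x, 1, __ATOMIC_RELAXED);  /* amoadd.w      */
      __atomic_fetch_add (&x, 1, __ATOMIC_ACQUIRE);  /* amoadd.w.aq   */
      __atomic_fetch_add (&x, 1, __ATOMIC_RELEASE);  /* amoadd.w.rl   */
      __atomic_fetch_add (&x, 1, __ATOMIC_SEQ_CST);  /* amoadd.w.aqrl */
    }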

    gcc/
        PR 100265
        * config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire):
          Remove function.
        * config/riscv/riscv.c (riscv_print_amo_memory_ordering_suffix):
          Add function to emit AMO memory ordering suffixes.
        * config/riscv/riscv.c (riscv_print_operand): Call
          riscv_print_amo_memory_ordering_suffix() instead of
          riscv_memmodel_needs_amo_acquire().
---
 gcc/config/riscv/riscv.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 545f3d0cb82c..3edd5c239d7c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3346,24 +3346,26 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool hi_reloc)
   fputc (')', file);
 }
 
-/* Return true if the .AQ suffix should be added to an AMO to implement the
-   acquire portion of memory model MODEL.  */
+/* Print the memory ordering suffix for AMOs.  */
 
-static bool
-riscv_memmodel_needs_amo_acquire (const enum memmodel model)
+static void
+riscv_print_amo_memory_ordering_suffix (FILE *file, const enum memmodel model)
 {
   switch (model)
     {
-      case MEMMODEL_ACQ_REL:
-      case MEMMODEL_SEQ_CST:
-      case MEMMODEL_ACQUIRE:
+      case MEMMODEL_RELAXED:
+	break;
       case MEMMODEL_CONSUME:
-	return true;
-
+      case MEMMODEL_ACQUIRE:
+	fputs (".aq", file);
+	break;
       case MEMMODEL_RELEASE:
-      case MEMMODEL_RELAXED:
-	return false;
-
+	fputs (".rl", file);
+	break;
+      case MEMMODEL_ACQ_REL:
+      case MEMMODEL_SEQ_CST:
+	fputs (".aqrl", file);
+	break;
       default:
 	gcc_unreachable ();
     }
@@ -3428,8 +3430,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
       break;
 
     case 'A':
-      if (riscv_memmodel_needs_amo_acquire (model))
-	fputs (".aq", file);
+      riscv_print_amo_memory_ordering_suffix (file, model);
       break;
 
     case 'F':
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265] Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores " Christoph Muellner
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

A previous patch ensured that the proper memory ordering suffixes
for AMOs are emitted. Therefore, there is no reason to keep the fence
generation mechanism for release operations.
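
Illustrative example of the combined effect of patches 02 and 03 (a
sketch; registers are arbitrary):

    int x;

    int
    exchange_seq_cst (int v)
    {
      /* before this series:         fence iorw,ow; amoswap.w.aq a0,a1,0(a2)
         with patches 02+03 applied: amoswap.w.aqrl a0,a1,0(a2)            */
      return __atomic_exchange_n (&x, v, __ATOMIC_SEQ_CST);
    }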

    gcc/
        PR 100265
        * config/riscv/riscv.c (riscv_memmodel_needs_release_fence):
          Remove function.
        * config/riscv/riscv.c (riscv_print_operand): Remove
          %F format specifier.
        * config/riscv/sync.md: Remove %F format specifier uses.
---
 gcc/config/riscv/riscv.c | 29 -----------------------------
 gcc/config/riscv/sync.md | 16 ++++++++--------
 2 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 3edd5c239d7c..5fe65776e608 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3371,29 +3371,6 @@ riscv_print_amo_memory_ordering_suffix (FILE *file, const enum memmodel model)
     }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
-
-static bool
-riscv_memmodel_needs_release_fence (const enum memmodel model)
-{
-  switch (model)
-    {
-      case MEMMODEL_ACQ_REL:
-      case MEMMODEL_SEQ_CST:
-      case MEMMODEL_RELEASE:
-	return true;
-
-      case MEMMODEL_ACQUIRE:
-      case MEMMODEL_CONSUME:
-      case MEMMODEL_RELAXED:
-	return false;
-
-      default:
-	gcc_unreachable ();
-    }
-}
-
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
    'h'	Print the high-part relocation associated with OP, after stripping
@@ -3401,7 +3378,6 @@ riscv_memmodel_needs_release_fence (const enum memmodel model)
    'R'	Print the low-part relocation associated with OP.
    'C'	Print the integer branch condition for comparison OP.
    'A'	Print the atomic operation suffix for memory model OP.
-   'F'	Print a FENCE if the memory model requires a release.
    'z'	Print x0 if OP is zero, otherwise print OP normally.
    'i'	Print i if the operand is not a register.  */
 
@@ -3433,11 +3409,6 @@ riscv_print_operand (FILE *file, rtx op, int letter)
       riscv_print_amo_memory_ordering_suffix (file, model);
       break;
 
-    case 'F':
-      if (riscv_memmodel_needs_release_fence (model))
-	fputs ("fence iorw,ow; ", file);
-      break;
-
     case 'i':
       if (code != REG)
         fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 747a799e2377..aeeb2e854b68 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -65,7 +65,7 @@
        (match_operand:SI 2 "const_int_operand")]      ;; model
       UNSPEC_ATOMIC_STORE))]
   "TARGET_ATOMIC"
-  "%F2amoswap.<amo>%A2 zero,%z1,%0"
+  "amoswap.<amo>%A2 zero,%z1,%0"
   [(set (attr "length") (const_int 8))])
 
 (define_insn "atomic_<atomic_optab><mode>"
@@ -76,8 +76,8 @@
 	   (match_operand:SI 2 "const_int_operand")] ;; model
 	 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F2amo<insn>.<amo>%A2 zero,%z1,%0"
-  [(set (attr "length") (const_int 8))])
+  "amo<insn>.<amo>%A2 zero,%z1,%0"
+)
 
 (define_insn "atomic_fetch_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -89,8 +89,8 @@
 	   (match_operand:SI 3 "const_int_operand")] ;; model
 	 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
-  [(set (attr "length") (const_int 8))])
+  "amo<insn>.<amo>%A3 %0,%z2,%1"
+)
 
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -101,8 +101,8 @@
    (set (match_dup 1)
 	(match_operand:GPR 2 "register_operand" "0"))]
   "TARGET_ATOMIC"
-  "%F3amoswap.<amo>%A3 %0,%z2,%1"
-  [(set (attr "length") (const_int 8))])
+  "amoswap.<amo>%A3 %0,%z2,%1"
+)
 
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -115,7 +115,7 @@
 	 UNSPEC_COMPARE_AND_SWAP))
    (clobber (match_scratch:GPR 6 "=&r"))]
   "TARGET_ATOMIC"
-  "%F5 1: lr.<amo>%A5 %0,%1; bne %0,%z2,1f; sc.<amo>%A4 %6,%z3,%1; bnez %6,1b; 1:"
+  "1: lr.<amo>%A5 %0,%1; bne %0,%z2,1f; sc.<amo>%A4 %6,%z3,%1; bnez %6,1b; 1:"
   [(set (attr "length") (const_int 20))])
 
 (define_expand "atomic_compare_and_swap<mode>"
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (2 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model " Christoph Muellner
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

Using AMOSWAP as an atomic store does not allow us to do sub-word accesses.
Further, it is not consistent with our atomic_load () implementation.
The benefit of AMOSWAP is that the resulting code sequence is smaller
(compared to FENCE+STORE); however, this does not make up for the lack
of sub-word accesses.
Additionally, HW implementors have claimed that an optimal implementation
of AMOSWAP is slightly more expensive than FENCE+STORE.
So let's use STORE instead of AMOSWAP.
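
Illustrative comparison for a 32-bit release store (a sketch; registers
are arbitrary, and the FENCE+STORE form is the one emitted once the
atomic_store<mode> expander from patch 06 is in place):

    int x;

    void
    store_release (int v)
    {
      /* AMOSWAP-based (removed here):  amoswap.w.rl zero, a1, 0(a0)
         FENCE+STORE-based (patch 06):  fence rw, w
                                        sw    a1, 0(a0)               */
      __atomic_store_n (&x, v, __ATOMIC_RELEASE);
    }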

    gcc/
        PR 100265
        * config/riscv/sync.md (atomic_store<mode>):
          Remove.
---
 gcc/config/riscv/sync.md | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index aeeb2e854b68..efd49745a8e2 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -57,17 +57,6 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
-(define_insn "atomic_store<mode>"
-  [(set (match_operand:GPR 0 "memory_operand" "=A")
-    (unspec_volatile:GPR
-      [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
-       (match_operand:SI 2 "const_int_operand")]      ;; model
-      UNSPEC_ATOMIC_STORE))]
-  "TARGET_ATOMIC"
-  "amoswap.<amo>%A2 zero,%z1,%0"
-  [(set (attr "length") (const_int 8))])
-
 (define_insn "atomic_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
 	(unspec_volatile:GPR
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (3 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 06/10] RISC-V: Implement atomic_{load,store} " Christoph Muellner
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

mem_thread_fence gets the desired memory model as an operand.
Let's emit fences according to this value (as defined in the section
"Code Porting and Mapping Guidelines" of the unpriv spec).

    gcc/
        PR 100265
        * config/riscv/sync.md (mem_thread_fence):
          Emit fences according to given operand.
        * config/riscv/sync.md (mem_fence):
          Add INSNs for different fence flavours.
        * config/riscv/sync.md (mem_thread_fence_1):
          Remove.
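
The resulting mapping, matching the *mem_fence output template below, is:

    void
    fences (void)
    {
      __atomic_thread_fence (__ATOMIC_RELAXED);  /* no instruction */
      __atomic_thread_fence (__ATOMIC_ACQUIRE);  /* fence r, rw    */
      __atomic_thread_fence (__ATOMIC_RELEASE);  /* fence rw, w    */
      __atomic_thread_fence (__ATOMIC_ACQ_REL);  /* fence.tso      */
      __atomic_thread_fence (__ATOMIC_SEQ_CST);  /* fence rw, rw   */
    }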
---
 gcc/config/riscv/sync.md | 41 +++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index efd49745a8e2..406db1730b81 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -34,26 +34,41 @@
 ;; Memory barriers.
 
 (define_expand "mem_thread_fence"
-  [(match_operand:SI 0 "const_int_operand" "")] ;; model
+  [(match_operand:SI 0 "const_int_operand")] ;; model
   ""
 {
-  if (INTVAL (operands[0]) != MEMMODEL_RELAXED)
-    {
-      rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-      MEM_VOLATILE_P (mem) = 1;
-      emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
-    }
+  enum memmodel model = memmodel_from_int (INTVAL (operands[0]));
+  if (!(is_mm_relaxed (model)))
+      emit_insn (gen_mem_fence (operands[0]));
   DONE;
 })
 
-;; Until the RISC-V memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
-(define_insn "mem_thread_fence_1"
+(define_expand "mem_fence"
+  [(set (match_dup 1)
+	(unspec:BLK [(match_dup 1) (match_operand:SI 0 "const_int_operand")]
+		    UNSPEC_MEMORY_BARRIER))]
+  ""
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*mem_fence"
   [(set (match_operand:BLK 0 "" "")
-	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
-   (match_operand:SI 1 "const_int_operand" "")] ;; model
+	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
+		    UNSPEC_MEMORY_BARRIER))]
   ""
-  "fence\tiorw,iorw")
+{
+  enum memmodel model = memmodel_from_int (INTVAL (operands[1]));
+  if (is_mm_consume (model) || is_mm_acquire (model))
+    return "fence\tr, rw";
+  else if (is_mm_release (model))
+    return "fence\trw, w";
+  else if (is_mm_acq_rel (model))
+    return "fence.tso";
+  else
+    return "fence\trw, rw";
+})
 
 ;; Atomic memory operations.
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 06/10] RISC-V: Implement atomic_{load,store} [PR 100265]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (4 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266] Christoph Muellner
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

A recent commit introduced a mechanism to emit proper fences
for RISC-V. Additionally, we already have emit_move_insn ().
Let's reuse this code and provide atomic_load<mode> and
atomic_store<mode> for RISC-V (as defined in the section
"Code Porting and Mapping Guidelines" of the unpriv spec).
Note that this also works for sub-word atomics.
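
Expected sequences for word-sized accesses (a sketch; registers omitted
and x assumed to be a global int):

    int x;

    void
    example (int v)
    {
      int l1 = __atomic_load_n (&x, __ATOMIC_ACQUIRE);  /* lw; fence r, rw                */
      int l2 = __atomic_load_n (&x, __ATOMIC_SEQ_CST);  /* fence rw, rw; lw; fence r, rw  */
      __atomic_store_n (&x, v, __ATOMIC_RELEASE);       /* fence rw, w; sw                */
      __atomic_store_n (&x, v, __ATOMIC_SEQ_CST);       /* fence rw, w; sw                */
      (void) l1; (void) l2;
    }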

    gcc/
        PR 100265
        * config/riscv/sync.md (atomic_load<mode>): New.
        * config/riscv/sync.md (atomic_store<mode>): New.
---
 gcc/config/riscv/sync.md | 41 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 406db1730b81..ceec324dfa30 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -23,6 +23,7 @@
   UNSPEC_COMPARE_AND_SWAP
   UNSPEC_SYNC_OLD_OP
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -72,6 +73,46 @@
 
 ;; Atomic memory operations.
 
+(define_expand "atomic_load<mode>"
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+    (unspec_volatile:ANYI
+      [(match_operand:ANYI 1 "memory_operand" "A")
+       (match_operand:SI 2 "const_int_operand")]      ;; model
+      UNSPEC_ATOMIC_LOAD))]
+  ""
+  {
+    rtx target = operands[0];
+    rtx mem = operands[1];
+    enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+    if (is_mm_seq_cst (model))
+      emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+    emit_move_insn (target, mem);
+    if (is_mm_acquire (model) || is_mm_seq_cst (model))
+      emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_ACQUIRE)));
+
+    DONE;
+})
+
+(define_expand "atomic_store<mode>"
+  [(set (match_operand:ANYI 0 "memory_operand" "=A")
+    (unspec_volatile:ANYI
+      [(match_operand:ANYI 1 "reg_or_0_operand" "rJ")
+       (match_operand:SI 2 "const_int_operand")]      ;; model
+      UNSPEC_ATOMIC_STORE))]
+  ""
+  {
+    rtx mem = operands[0];
+    rtx val = operands[1];
+    enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+    if (is_mm_release (model) || is_mm_seq_cst (model))
+      emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_RELEASE)));
+    emit_move_insn (mem, val);
+
+    DONE;
+})
+
 (define_insn "atomic_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
 	(unspec_volatile:GPR
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (5 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 06/10] RISC-V: Implement atomic_{load,store} " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 08/10] RISC-V: Add s.ext-consuming " Christoph Muellner
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

In order to emit LR/SC sequences, let's provide INSNs that
take care of the memory ordering constraints.
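
These patterns are meant to be instantiated from expander code written in
C; the CAS expansion later in this series uses them roughly like this
(mode and the operand rtxes as in riscv_expand_compare_and_swap ()):

    emit_insn (gen_riscv_load_reserved (mode, oldval, mem, mod_f));
    /* ... compare oldval against the expected value, branch out ... */
    emit_insn (gen_riscv_store_conditional (mode, scratch, mem, newval, mod_s));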

    gcc/
        PR 100266
        * config/riscv/sync.md (UNSPEC_LOAD_RESERVED): New.
        * config/riscv/sync.md (UNSPEC_STORE_CONDITIONAL): New.
        * config/riscv/sync.md (riscv_load_reserved): New.
        * config/riscv/sync.md (riscv_store_conditional): New.
---
 gcc/config/riscv/sync.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index ceec324dfa30..edff6520b87e 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -26,6 +26,8 @@
   UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
+  UNSPEC_LOAD_RESERVED
+  UNSPEC_STORE_CONDITIONAL
 ])
 
 (define_code_iterator any_atomic [plus ior xor and])
@@ -113,6 +115,28 @@
     DONE;
 })
 
+(define_insn "@riscv_load_reserved<mode>"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+    (unspec_volatile:GPR
+      [(match_operand:GPR 1 "memory_operand" "A")
+       (match_operand:SI 2 "const_int_operand")]      ;; model
+      UNSPEC_LOAD_RESERVED))]
+  "TARGET_ATOMIC"
+  "lr.<amo>%A2 %0, %1"
+)
+
+(define_insn "@riscv_store_conditional<mode>"
+  [(set (match_operand:GPR 0 "register_operand" "=&r")
+    (unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
+   (set (match_operand:GPR 1 "memory_operand" "=A")
+    (unspec_volatile:GPR
+      [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+       (match_operand:SI 3 "const_int_operand")]      ;; model
+      UNSPEC_STORE_CONDITIONAL))]
+  "TARGET_ATOMIC"
+  "sc.<amo>%A3 %0, %z2, %1"
+)
+
 (define_insn "atomic_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
 	(unspec_volatile:GPR
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 08/10] RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (6 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266] Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-05 19:36 ` [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS " Christoph Muellner
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

The current model of the LR and SC INSNs requires a sign-extension
to use the generated SImode value for conditional branches, which
only operate on XLEN registers.
However, the sign-extension is actually not required in either case,
so this patch introduces additional INSNs that consume
the sign-extension.

Rationale:
The loaded value of an LR.W is specified to be sign-extended, so an
explicit sign-extension is not required.
The value written by an SC.W is zero on success and non-zero on
failure; as a sign-extended non-zero value remains non-zero,
the sign-extension is not required there either.
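
Illustrative effect on an RV64 LR/SC loop over a 32-bit object (a sketch;
registers are arbitrary):

    1:  lr.w    a4, 0(a0)
        bne     a4, a5, 2f
        sc.w    a3, a2, 0(a0)
        sext.w  a3, a3          # becomes redundant: the new pattern lets the
        bnez    a3, 1b          # branch consume the SC result directly
    2: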

    gcc/
        PR 100266
        * config/riscv/sync.md (riscv_load_reserved): New.
        * config/riscv/sync.md (riscv_store_conditional): New.
---
 gcc/config/riscv/sync.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index edff6520b87e..49b860da8ef0 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -125,6 +125,21 @@
   "lr.<amo>%A2 %0, %1"
 )
 
+;; This pattern allows to consume a sign-extension of the loaded value.
+;; This is legal, because the specification of LR.W defines the loaded
+;; value to be sign-extended.
+
+(define_insn "riscv_load_reserved"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (sign_extend:DI
+      (unspec_volatile:SI
+	[(match_operand:SI 1 "memory_operand" "A")
+	 (match_operand:SI 2 "const_int_operand")]      ;; model
+	UNSPEC_LOAD_RESERVED)))]
+  "TARGET_ATOMIC && TARGET_64BIT"
+  "lr.w%A2 %0, %1"
+)
+
 (define_insn "@riscv_store_conditional<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
     (unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
@@ -137,6 +152,25 @@
   "sc.<amo>%A3 %0, %z2, %1"
 )
 
+;; This pattern allows to consume a sign-extension of the success
+;; value of SC.W, which can then be used for instructions which
+;; require values of XLEN-size (e.g. conditional branches).
+;; This is legal, because any non-zero value remains non-zero
+;; after sign-extension.
+
+(define_insn "riscv_store_conditional"
+  [(set (match_operand:DI 0 "register_operand" "=&r")
+    (sign_extend:DI
+      (unspec_volatile:SI [(const_int 0)] UNSPEC_STORE_CONDITIONAL)))
+   (set (match_operand:SI 1 "memory_operand" "=A")
+    (unspec_volatile:SI
+      [(match_operand:SI 2 "reg_or_0_operand" "rJ")
+       (match_operand:SI 3 "const_int_operand")]      ;; model
+      UNSPEC_STORE_CONDITIONAL))]
+  "TARGET_ATOMIC && TARGET_64BIT"
+  "sc.w%A3 %0, %z2, %1"
+)
+
 (define_insn "atomic_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
 	(unspec_volatile:GPR
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (7 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 08/10] RISC-V: Add s.ext-consuming " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2021-05-06  0:27   ` Jim Wilson
  2021-05-05 19:36 ` [PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" " Christoph Muellner
  2022-10-11 19:06 ` [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Vineet Gupta
  10 siblings, 1 reply; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

The existing CAS implementation uses an INSN definition, which provides
the core LR/SC sequence. In addition, there is follow-up code
that evaluates the results and calculates the return values.
This has two drawbacks: a) an extension to sub-word CAS implementations
is not possible (and even if it were, it would be unmaintainable), and
b) the implementation is hard to maintain/improve.
This patch provides a programmatic implementation of CAS, similar
to what many other architectures have.

The implementation supports both RV32 and RV64.

Additionally, the implementation does not introduce data dependencies
for the computation of the return value. Instead, we set the return value
(the success state of the CAS operation) based on structural information.
This approach is also shown in the RISC-V unpriv spec (as part
of the sample code for a compare-and-swap function using LR/SC).
The cost of this implementation is a single LI instruction up front,
which is redundant in the success case (the value is overwritten
later on success).

The resulting sequence requires 9 instructions in the success case.
The previous implementation required 11 instructions in the success
case (including a taken branch) and had a "subw;seqz;beqz" sequence,
with direct dependencies.

Below is the generated code of a 32-bit CAS sequence with the old
implementation and the new implementation (ignore the ANDIs below).

Old:
     f00:       419c                    lw      a5,0(a1)
     f02:       1005272f                lr.w    a4,(a0)
     f06:       00f71563                bne     a4,a5,f10
     f0a:       18c526af                sc.w    a3,a2,(a0)
     f0e:       faf5                    bnez    a3,f02
     f10:       40f707bb                subw    a5,a4,a5
     f14:       0017b513                seqz    a0,a5
     f18:       c391                    beqz    a5,f1c
     f1a:       c198                    sw      a4,0(a1)
     f1c:       8905                    andi    a0,a0,1
     f1e:       8082                    ret

New:
     e28:       4194                    lw      a3,0(a1)
     e2a:       4701                    li      a4,0
     e2c:       1005282f                lr.w    a6,(a0)
     e30:       00d81963                bne     a6,a3,e42
     e34:       18c527af                sc.w    a5,a2,(a0)
     e38:       fbf5                    bnez    a5,e2c
     e3a:       4705                    li      a4,1
     e3c:       00177513                andi    a0,a4,1
     e40:       8082                    ret
     e42:       0105a023                sw      a6,0(a1)
     e46:       00177513                andi    a0,a4,1
     e4a:       8082                    ret
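
For reference, a source function along these lines (hypothetical; the
exact testcase is not part of this message) produces such sequences:

    #include <stdint.h>

    _Bool
    cas_u32 (uint32_t *p, uint32_t *expected, uint32_t desired)
    {
      /* Strong CAS; relaxed ordering matches the unsuffixed LR/SC above.  */
      return __atomic_compare_exchange_n (p, expected, desired, 0,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    }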

    gcc/
        PR 100266
        * config/riscv/riscv-protos.h (riscv_expand_compare_and_swap): New.
        * config/riscv/riscv.c (riscv_emit_unlikely_jump): New.
        * config/riscv/riscv.c (riscv_expand_compare_and_swap): New.
        * config/riscv/sync.md (atomic_cas_value_strong): Removed.
        * config/riscv/sync.md (atomic_compare_and_swap): Call
	  riscv_expand_compare_and_swap.
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.c        | 75 +++++++++++++++++++++++++++++++++
 gcc/config/riscv/sync.md        | 35 +--------------
 3 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43d7224d6941..eb7e67d3b95a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -59,6 +59,7 @@ extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_move (rtx, rtx, rtx, rtx_code, rtx, rtx);
+extern void riscv_expand_compare_and_swap (rtx[]);
 #endif
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 5fe65776e608..a7b18d650daa 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2496,6 +2496,81 @@ riscv_expand_conditional_move (rtx dest, rtx cons, rtx alt, rtx_code code,
 						      cons, alt)));
 }
 
+/* Mark the previous jump instruction as unlikely.  */
+
+static void
+riscv_emit_unlikely_jump (rtx insn)
+{
+  rtx_insn *jump = emit_jump_insn (insn);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+}
+
+/* Expand code to perform a compare-and-swap.  */
+
+extern void
+riscv_expand_compare_and_swap (rtx operands[])
+{
+  rtx bval, oldval, mem, expval, newval, mod_s, mod_f, scratch, cond1, cond2;
+  machine_mode mode;
+  rtx_code_label *begin_label, *end_label;
+
+  bval = operands[0];
+  oldval = operands[1];
+  mem = operands[2];
+  expval = operands[3];
+  newval = operands[4];
+  mod_s = operands[6];
+  mod_f = operands[7];
+  mode = GET_MODE (mem);
+  scratch = gen_reg_rtx (mode);
+  begin_label = gen_label_rtx ();
+  end_label = gen_label_rtx ();
+
+  /* No support for sub-word CAS.  */
+  if (mode == QImode || mode == HImode)
+    gcc_unreachable ();
+
+  /* We use mod_f for LR and mod_s for SC below, but
+     RV does not have any guarantees for LR.rl and SC.aq.  */
+  if (is_mm_acquire (memmodel_base (INTVAL (mod_s)))
+      && is_mm_relaxed (memmodel_base (INTVAL (mod_f))))
+    {
+      mod_f = GEN_INT (MEMMODEL_ACQUIRE);
+      mod_s = GEN_INT (MEMMODEL_RELAXED);
+    }
+
+  /* Since we want to maintain a branch-free good-case, but also want
+     to not have two branches in the bad-case, we set bval to FALSE
+     on top of the sequence.  In the bad case, we simply jump over
+     the assignment of bval to TRUE at the end of the sequence.  */
+
+  emit_insn (gen_rtx_SET (bval, gen_rtx_CONST_INT (SImode, FALSE)));
+
+  /* Make sure the scheduler does not move INSNs beyond here.  */
+  emit_insn (gen_blockage ());
+
+  emit_label (begin_label);
+
+  emit_insn (gen_riscv_load_reserved (mode, oldval, mem, mod_f));
+
+  cond1 = gen_rtx_NE (mode, oldval, expval);
+  riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond1, oldval, expval,
+			    end_label));
+
+  emit_insn (gen_riscv_store_conditional (mode, scratch, mem, newval, mod_s));
+
+  /* Make sure the scheduler does not move INSNs beyond here.  */
+  emit_insn (gen_blockage ());
+
+  cond2 = gen_rtx_NE (mode, scratch, const0_rtx);
+  riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond2, scratch, const0_rtx,
+			    begin_label));
+
+  emit_insn (gen_rtx_SET (bval, gen_rtx_CONST_INT (SImode, TRUE)));
+
+  emit_label (end_label);
+}
+
 /* Implement TARGET_FUNCTION_ARG_BOUNDARY.  Every parameter gets at
    least PARM_BOUNDARY bits of alignment, but will be given anything up
    to PREFERRED_STACK_BOUNDARY bits if the type requires it.  */
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 49b860da8ef0..da8dbf698163 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -207,20 +207,6 @@
   "amoswap.<amo>%A3 %0,%z2,%1"
 )
 
-(define_insn "atomic_cas_value_strong<mode>"
-  [(set (match_operand:GPR 0 "register_operand" "=&r")
-	(match_operand:GPR 1 "memory_operand" "+A"))
-   (set (match_dup 1)
-	(unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
-			      (match_operand:GPR 3 "reg_or_0_operand" "rJ")
-			      (match_operand:SI 4 "const_int_operand")  ;; mod_s
-			      (match_operand:SI 5 "const_int_operand")] ;; mod_f
-	 UNSPEC_COMPARE_AND_SWAP))
-   (clobber (match_scratch:GPR 6 "=&r"))]
-  "TARGET_ATOMIC"
-  "1: lr.<amo>%A5 %0,%1; bne %0,%z2,1f; sc.<amo>%A4 %6,%z3,%1; bnez %6,1b; 1:"
-  [(set (attr "length") (const_int 20))])
-
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand" "")   ;; bool output
    (match_operand:GPR 1 "register_operand" "")  ;; val output
@@ -232,26 +218,7 @@
    (match_operand:SI 7 "const_int_operand" "")] ;; mod_f
   "TARGET_ATOMIC"
 {
-  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
-						operands[3], operands[4],
-						operands[6], operands[7]));
-
-  rtx compare = operands[1];
-  if (operands[3] != const0_rtx)
-    {
-      rtx difference = gen_rtx_MINUS (<MODE>mode, operands[1], operands[3]);
-      compare = gen_reg_rtx (<MODE>mode);
-      emit_insn (gen_rtx_SET (compare, difference));
-    }
-
-  if (word_mode != <MODE>mode)
-    {
-      rtx reg = gen_reg_rtx (word_mode);
-      emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode, compare)));
-      compare = reg;
-    }
-
-  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx)));
+  riscv_expand_compare_and_swap (operands);
   DONE;
 })
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (8 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS " Christoph Muellner
@ 2021-05-05 19:36 ` Christoph Muellner
  2022-10-11 19:06 ` [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Vineet Gupta
  10 siblings, 0 replies; 31+ messages in thread
From: Christoph Muellner @ 2021-05-05 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jim Wilson, Kito Cheng, Christoph Muellner

Atomic instructions require zero-offset memory addresses.
If we allow all addresses, a nonzero-offset address will be
materialized in an extra register by an extra instruction before
the actual atomic instruction.

This patch introduces the predicate "riscv_sync_memory_operand",
which restricts the memory operand to be suitable for atomic
instructions.

    gcc/
        PR 100266
        * config/riscv/sync.md (riscv_sync_memory_operand): New.
        * config/riscv/sync.md (riscv_load_reserved): Use new predicate.
        * config/riscv/sync.md (riscv_store_conditional): Likewise.
        * config/riscv/sync.md (atomic_<atomic_optab>): Likewise.
        * config/riscv/sync.md (atomic_fetch_<atomic_optab>): Likewise.
        * config/riscv/sync.md (atomic_exchange): Likewise.
        * config/riscv/sync.md (atomic_compare_and_swap): Likewise.
---
 gcc/config/riscv/sync.md | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index da8dbf698163..cd9078a40248 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -30,6 +30,10 @@
   UNSPEC_STORE_CONDITIONAL
 ])
 
+(define_predicate "riscv_sync_memory_operand"
+  (and (match_operand 0 "memory_operand")
+       (match_code "reg" "0")))
+
 (define_code_iterator any_atomic [plus ior xor and])
 (define_code_attr atomic_optab
   [(plus "add") (ior "or") (xor "xor") (and "and")])
@@ -118,7 +122,7 @@
 (define_insn "@riscv_load_reserved<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=r")
     (unspec_volatile:GPR
-      [(match_operand:GPR 1 "memory_operand" "A")
+      [(match_operand:GPR 1 "riscv_sync_memory_operand" "A")
        (match_operand:SI 2 "const_int_operand")]      ;; model
       UNSPEC_LOAD_RESERVED))]
   "TARGET_ATOMIC"
@@ -133,7 +137,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r")
     (sign_extend:DI
       (unspec_volatile:SI
-	[(match_operand:SI 1 "memory_operand" "A")
+	[(match_operand:SI 1 "riscv_sync_memory_operand" "A")
 	 (match_operand:SI 2 "const_int_operand")]      ;; model
 	UNSPEC_LOAD_RESERVED)))]
   "TARGET_ATOMIC && TARGET_64BIT"
@@ -143,7 +147,7 @@
 (define_insn "@riscv_store_conditional<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
     (unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
-   (set (match_operand:GPR 1 "memory_operand" "=A")
+   (set (match_operand:GPR 1 "riscv_sync_memory_operand" "=A")
     (unspec_volatile:GPR
       [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
        (match_operand:SI 3 "const_int_operand")]      ;; model
@@ -162,7 +166,7 @@
   [(set (match_operand:DI 0 "register_operand" "=&r")
     (sign_extend:DI
       (unspec_volatile:SI [(const_int 0)] UNSPEC_STORE_CONDITIONAL)))
-   (set (match_operand:SI 1 "memory_operand" "=A")
+   (set (match_operand:SI 1 "riscv_sync_memory_operand" "=A")
     (unspec_volatile:SI
       [(match_operand:SI 2 "reg_or_0_operand" "rJ")
        (match_operand:SI 3 "const_int_operand")]      ;; model
@@ -172,7 +176,7 @@
 )
 
 (define_insn "atomic_<atomic_optab><mode>"
-  [(set (match_operand:GPR 0 "memory_operand" "+A")
+  [(set (match_operand:GPR 0 "riscv_sync_memory_operand" "+A")
 	(unspec_volatile:GPR
 	  [(any_atomic:GPR (match_dup 0)
 		     (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
@@ -184,7 +188,7 @@
 
 (define_insn "atomic_fetch_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
-	(match_operand:GPR 1 "memory_operand" "+A"))
+	(match_operand:GPR 1 "riscv_sync_memory_operand" "+A"))
    (set (match_dup 1)
 	(unspec_volatile:GPR
 	  [(any_atomic:GPR (match_dup 1)
@@ -198,7 +202,7 @@
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
-	  [(match_operand:GPR 1 "memory_operand" "+A")
+	  [(match_operand:GPR 1 "riscv_sync_memory_operand" "+A")
 	   (match_operand:SI 3 "const_int_operand")] ;; model
 	  UNSPEC_SYNC_EXCHANGE))
    (set (match_dup 1)
@@ -208,14 +212,14 @@
 )
 
 (define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "register_operand" "")   ;; bool output
-   (match_operand:GPR 1 "register_operand" "")  ;; val output
-   (match_operand:GPR 2 "memory_operand" "")    ;; memory
-   (match_operand:GPR 3 "reg_or_0_operand" "")  ;; expected value
-   (match_operand:GPR 4 "reg_or_0_operand" "")  ;; desired value
-   (match_operand:SI 5 "const_int_operand" "")  ;; is_weak
-   (match_operand:SI 6 "const_int_operand" "")  ;; mod_s
-   (match_operand:SI 7 "const_int_operand" "")] ;; mod_f
+  [(match_operand:SI 0 "register_operand" "")           ;; bool output
+   (match_operand:GPR 1 "register_operand" "")          ;; val output
+   (match_operand:GPR 2 "riscv_sync_memory_operand" "") ;; memory
+   (match_operand:GPR 3 "reg_or_0_operand" "")          ;; expected value
+   (match_operand:GPR 4 "reg_or_0_operand" "")          ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")          ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")          ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]         ;; mod_f
   "TARGET_ATOMIC"
 {
   riscv_expand_compare_and_swap (operands);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]
  2021-05-05 19:36 ` [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS " Christoph Muellner
@ 2021-05-06  0:27   ` Jim Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Jim Wilson @ 2021-05-06  0:27 UTC (permalink / raw)
  To: Christoph Muellner; +Cc: GCC Patches, Kito Cheng

On Wed, May 5, 2021 at 12:37 PM Christoph Muellner <cmuellner@gcc.gnu.org>
wrote:

> The existing CAS implementation uses an INSN definition, which provides
> the core LR/SC sequence. Additionally to that, there is a follow-up code,
> that evaluates the results and calculates the return values.
> This has two drawbacks: a) an extension to sub-word CAS implementations
> is not possible (even if, then it would be unmaintainable), and b) the
> implementation is hard to maintain/improve.
> This patch provides a programmatic implementation of CAS, similar
> like many other architectures are having one.
>

A comment that Andrew Waterman made to me today about the safety of this
under various circumstances got me thinking, and I realized that without
the special cas pattern we can get reloads in the middle of the sequence
which would be bad.  Experimenting a bit, I managed to prove it.  This is
using the old version of the patch which I already had handy, but I'm sure
the new version will behave roughly the same way.  Using the testsuite
testcase atomic-compare-exchange-3.c as before, and adding a lot of
-ffixed-X options to simulate high register pressure, with the compiler
command
./xgcc -B./ -O2 -S tmp.c -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19
-ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25
-ffixed-x26 -ffixed-x27 -ffixed-x28 -ffixed-x29 -ffixed-x30 -ffixed-x31
-ffixed-x15 -ffixed-x14 -ffixed-x13 -ffixed-x12 -ffixed-s0 -ffixed-s1
-ffixed-t2 -ffixed-t1 -ffixed-t0
I get for the first lr/sc loop
.L2:
        lui     a1,%hi(v)
        addi    a0,a1,%lo(v)
        lr.w a1, 0(a0)
        ld      a0,8(sp)
        sw      a1,24(sp)
        bne     a1,a0,.L39
        lui     a1,%hi(v)
        addi    a0,a1,%lo(v)
        lw      a1,16(sp)
        sd      ra,24(sp)
        sc.w ra, a1, 0(a0)
        sext.w  a1,ra
        ld      ra,24(sp)
        bne     a1,zero,.L2
and note all of the misc load/store instructions added by reload.  I don't
think this is safe or guaranteed to work.  With the cas pattern, any
reloads are guaranteed to be emitted before and/or after the lr/sc loop.
With the separate patterns, there is no way to ensure that we won't get
accidental reloads in the middle of the lr/sc loop.

I think we need to keep the cas pattern.  We can always put C code inside
the output template of the cas pattern if that is helpful.  It can do any
necessary tests and then return an appropriate string for the instructions
we want.

Jim

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
                   ` (9 preceding siblings ...)
  2021-05-05 19:36 ` [PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" " Christoph Muellner
@ 2022-10-11 19:06 ` Vineet Gupta
  2022-10-11 19:31   ` Palmer Dabbelt
  10 siblings, 1 reply; 31+ messages in thread
From: Vineet Gupta @ 2022-10-11 19:06 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches; +Cc: Kito Cheng, gnu-toolchain

Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> This series provides a cleanup of the current atomics implementation
> of RISC-V:
>
> * PR100265: Use proper fences for atomic load/store
> * PR100266: Provide programmatic implementation of CAS
>
> As both are very related, I merged the patches into one series.
>
> The first patch could be squashed into the following patches,
> but I found it easier to understand the chances with it in place.
>
> The series has been tested as follows:
> * Building and testing a multilib RV32/64 toolchain
>    (bootstrapped with riscv-gnu-toolchain repo)
> * Manual review of generated sequences for GCC's atomic builtins API
>
> The programmatic re-implementation of CAS benefits from a REE improvement
> (see PR100264):
>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> If this patch is not in place, then an additional extension instruction
> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>
> Further, the new CAS code requires cbranch INSN helpers to be present:
>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

I was wondering if this patchset is blocked on some technical grounds.

Thx,
-Vineet

> Changes for v2:
> * Guard LL/SC sequence by compiler barriers ("blockage")
>    (suggested by Andrew Waterman)
> * Changed commit message for AMOSWAP->STORE change
>    (suggested by Andrew Waterman)
> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> * Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
> * Fix small code style issue
>
> Christoph Muellner (10):
>    RISC-V: Simplify memory model code [PR 100265]
>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
>    RISC-V: Emit fences according to chosen memory model [PR 100265]
>    RISC-V: Implement atomic_{load,store} [PR 100265]
>    RISC-V: Model INSNs for LR and SC [PR 100266]
>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
>
>   gcc/config/riscv/riscv-protos.h |   1 +
>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
>   3 files changed, 235 insertions(+), 118 deletions(-)
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 19:06 ` [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Vineet Gupta
@ 2022-10-11 19:31   ` Palmer Dabbelt
  2022-10-11 20:46     ` Christoph Müllner
  2022-10-11 23:14     ` Jeff Law
  0 siblings, 2 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2022-10-11 19:31 UTC (permalink / raw)
  To: Vineet Gupta, andrea; +Cc: cmuellner, gcc-patches, kito.cheng, gnu-toolchain

On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the chances with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>>    (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering is this patchset is blocked on some technical grounds.

There's a v3 (though I can't find all of it, so not quite sure what 
happened), but IIUC that still has the same fundamental problems that 
all these have had: changing over to the new fence model may be an ABI 
break and the split CAS implementation doesn't ensure eventual success 
(see Jim's comments).  Not sure if there's other comments floating 
around, though, that's just what I remember.

+Andrea, in case he has time to look at the memory model / ABI issues.  

We'd still need to sort out the CAS issues, though, and it's not 
abundantly clear it's worth the work: we're essentially constrained to 
just emitting those fixed CAS sequences due to the eventual success 
rules, so it's not clear what the benefit of splitting those up is.  
With WRS there are some routines we might want to generate code for 
(cond_read_acquire() in Linux, for example) but we'd really need to dig 
into those to see if it's even sane/fast.

There's another patch set to fix the lack of inline atomic routines 
without breaking stuff; there were some minor comments from Kito, and 
IIRC I had some test failures that I needed to chase down as well.  
That's a much safer fix in the short term, we'll need to deal with this 
eventually but at least we can stop the libatomic issues for the distro 
folks.

>
> Thx,
> -Vineet
>
>> Changes for v2:
>> * Guard LL/SC sequence by compiler barriers ("blockage")
>>    (suggested by Andrew Waterman)
>> * Changed commit message for AMOSWAP->STORE change
>>    (suggested by Andrew Waterman)
>> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
>> * Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
>> * Fix small code style issue
>>
>> Christoph Muellner (10):
>>    RISC-V: Simplify memory model code [PR 100265]
>>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
>>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
>>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
>>    RISC-V: Emit fences according to chosen memory model [PR 100265]
>>    RISC-V: Implement atomic_{load,store} [PR 100265]
>>    RISC-V: Model INSNs for LR and SC [PR 100266]
>>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
>>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
>>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
>>
>>   gcc/config/riscv/riscv-protos.h |   1 +
>>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
>>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
>>   3 files changed, 235 insertions(+), 118 deletions(-)
>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 19:31   ` Palmer Dabbelt
@ 2022-10-11 20:46     ` Christoph Müllner
  2022-10-11 23:31       ` Vineet Gupta
  2022-10-11 23:14     ` Jeff Law
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Müllner @ 2022-10-11 20:46 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Vineet Gupta, andrea, gcc-patches, kito.cheng, gnu-toolchain


On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> > Hi Christoph, Kito,
> >
> > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> >> This series provides a cleanup of the current atomics implementation
> >> of RISC-V:
> >>
> >> * PR100265: Use proper fences for atomic load/store
> >> * PR100266: Provide programmatic implementation of CAS
> >>
> >> As both are very related, I merged the patches into one series.
> >>
> >> The first patch could be squashed into the following patches,
> >> but I found it easier to understand the chances with it in place.
> >>
> >> The series has been tested as follows:
> >> * Building and testing a multilib RV32/64 toolchain
> >>    (bootstrapped with riscv-gnu-toolchain repo)
> >> * Manual review of generated sequences for GCC's atomic builtins API
> >>
> >> The programmatic re-implementation of CAS benefits from a REE
> improvement
> >> (see PR100264):
> >>    https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> >> If this patch is not in place, then an additional extension instruction
> >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
> >>
> >> Further, the new CAS code requires cbranch INSN helpers to be present:
> >>    https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
> >
> > I was wondering is this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what
> happened), but IIUC that still has the same fundamental problems that
> all these have had: changing over to the new fence model may by an ABI
> break and the split CAS implementation doesn't ensure eventual success
> (see Jim's comments).  Not sure if there's other comments floating
> around, though, that's just what I remember.
>

v3 was sent on May 27, 2022, when I rebased this on an internal tree:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme register
pressure instead of erroring out) as I thought that this was the blocker
for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU Cauldron
about this series, that the ABI break is the blocker.

My initial understanding was that fixing something broken cannot be an ABI
break, and that the mismatch between the implementation in 2021 and the
recommended mappings in the ratified specification from 2019 is exactly such
a breakage. I still don't know the background here, but I guess this
assumption is incorrect from a historical point of view.

However, I'm sure I am not the only one who assumes that the mappings in
the specification are what compilers and tools implement. Therefore I still
consider the implementation of the RISC-V atomics in GCC broken (at least
w.r.t. the expectations of users who lack the historical background and
just read the RISC-V specification).
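
To make this concrete: the mismatch is directly visible from the code GCC
generates for the standard C11 atomics. A minimal, purely illustrative
example (the commented sequences are my from-memory reading of the "Code
Porting and Mapping Guidelines" appendix, so please double-check them
against the current text of the specification):

  #include <stdatomic.h>

  atomic_int v;

  int load_seq_cst (void)
  {
    /* Recommended mapping from the porting appendix (from memory):
         fence rw,rw ; lw ; fence r,rw  */
    return atomic_load_explicit (&v, memory_order_seq_cst);
  }

  void store_seq_cst (int x)
  {
    /* Recommended mapping (from memory): fence rw,w ; sw
       (the pre-patch GCC port emits an AMOSWAP-based sequence instead,
       which is what the "Use STORE instead of AMOSWAP for atomic stores"
       patch in this series addresses).  */
    atomic_store_explicit (&v, x, memory_order_seq_cst);
  }

Compiling this with -O2 -S for RISC-V and comparing the emitted fences
against the appendix is exactly the kind of manual review mentioned in the
cover letter.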



>
> +Andrea, in case he has time to look at the memory model / ABI issues.
>
> We'd still need to sort out the CAS issues, though, and it's not
> abundantly clear it's worth the work: we're essentailly constrained to
> just emitting those fixed CAS sequences due to the eventual success
> rules, so it's not clear what the benefit of splitting those up is.
> With WRS there are some routines we might want to generate code for
> (cond_read_acquire() in Linux, for example) but we'd really need to dig
> into those to see if it's even sane/fast.
>
> There's another patch set to fix the lack of inline atomic routines
> without breaking stuff, there were some minor comments from Kito and
> IIRC I had some test failures that I needed to chase down as well.
> That's a much safer fix in the short term, we'll need to deal with this
> eventually but at least we can stop the libatomic issues for the distro
> folks.
>

I expect that the pressure for a proper fix upstream (instead of a
backward-compatible compromise) will increase over time (once people start
building big iron based on RISC-V and start hunting performance bottlenecks
in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI by a
command-line switch and promote its use. And at some point in the future (if
there are enough fixes to justify a break) the new ABI can be enabled by
default, with a new flag to select the old ABI.


>
> >
> > Thx,
> > -Vineet
> >
> >> Changes for v2:
> >> * Guard LL/SC sequence by compiler barriers ("blockage")
> >>    (suggested by Andrew Waterman)
> >> * Changed commit message for AMOSWAP->STORE change
> >>    (suggested by Andrew Waterman)
> >> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> >> * Introduce predicate riscv_sync_memory_operand (suggested by Jim
> Wilson)
> >> * Fix small code style issue
> >>
> >> Christoph Muellner (10):
> >>    RISC-V: Simplify memory model code [PR 100265]
> >>    RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
> >>    RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
> >>    RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
> >>    RISC-V: Emit fences according to chosen memory model [PR 100265]
> >>    RISC-V: Implement atomic_{load,store} [PR 100265]
> >>    RISC-V: Model INSNs for LR and SC [PR 100266]
> >>    RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
> >>    RISC-V: Provide programmatic implementation of CAS [PR 100266]
> >>    RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]
> >>
> >>   gcc/config/riscv/riscv-protos.h |   1 +
> >>   gcc/config/riscv/riscv.c        | 136 +++++++++++++-------
> >>   gcc/config/riscv/sync.md        | 216 +++++++++++++++++++++-----------
> >>   3 files changed, 235 insertions(+), 118 deletions(-)
> >>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 19:31   ` Palmer Dabbelt
  2022-10-11 20:46     ` Christoph Müllner
@ 2022-10-11 23:14     ` Jeff Law
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Law @ 2022-10-11 23:14 UTC (permalink / raw)
  To: gcc-patches


On 10/11/22 13:31, Palmer Dabbelt wrote:
> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>> Hi Christoph, Kito,
>>
>> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>>> This series provides a cleanup of the current atomics implementation
>>> of RISC-V:
>>>
>>> * PR100265: Use proper fences for atomic load/store
>>> * PR100266: Provide programmatic implementation of CAS
>>>
>>> As both are very related, I merged the patches into one series.
>>>
>>> The first patch could be squashed into the following patches,
>>> but I found it easier to understand the chances with it in place.
>>>
>>> The series has been tested as follows:
>>> * Building and testing a multilib RV32/64 toolchain
>>>    (bootstrapped with riscv-gnu-toolchain repo)
>>> * Manual review of generated sequences for GCC's atomic builtins API
>>>
>>> The programmatic re-implementation of CAS benefits from a REE 
>>> improvement
>>> (see PR100264):
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>>> If this patch is not in place, then an additional extension instruction
>>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>>
>>> Further, the new CAS code requires cbranch INSN helpers to be present:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>>
>> I was wondering is this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what 
> happened), but IIUC that still has the same fundamental problems that 
> all these have had: changing over to the new fence model may by an ABI 
> break and the split CAS implementation doesn't ensure eventual success 
> (see Jim's comments).  Not sure if there's other comments floating 
> around, though, that's just what I remember.

Do we have a pointer to the ABI discussion.  I've been meaning to 
familiarize myself with the issues in this space and that seems like a 
good place to start given its blocking progress on the atomics.


jeff



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 20:46     ` Christoph Müllner
@ 2022-10-11 23:31       ` Vineet Gupta
  2022-10-12  0:15         ` Palmer Dabbelt
  2022-10-13 22:39         ` Jeff Law
  0 siblings, 2 replies; 31+ messages in thread
From: Vineet Gupta @ 2022-10-11 23:31 UTC (permalink / raw)
  To: Christoph Müllner, Palmer Dabbelt
  Cc: andrea, gcc-patches, kito.cheng, gnu-toolchain



On 10/11/22 13:46, Christoph Müllner wrote:
> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>     > Hi Christoph, Kito,
>     >
>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>     >> This series provides a cleanup of the current atomics
>     implementation
>     >> of RISC-V:
>     >>
>     >> * PR100265: Use proper fences for atomic load/store
>     >> * PR100266: Provide programmatic implementation of CAS
>     >>
>     >> As both are very related, I merged the patches into one series.
>     >>
>     >> The first patch could be squashed into the following patches,
>     >> but I found it easier to understand the chances with it in place.
>     >>
>     >> The series has been tested as follows:
>     >> * Building and testing a multilib RV32/64 toolchain
>     >>    (bootstrapped with riscv-gnu-toolchain repo)
>     >> * Manual review of generated sequences for GCC's atomic
>     builtins API
>     >>
>     >> The programmatic re-implementation of CAS benefits from a REE
>     improvement
>     >> (see PR100264):
>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>     >> If this patch is not in place, then an additional extension
>     instruction
>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>     >>
>     >> Further, the new CAS code requires cbranch INSN helpers to be
>     present:
>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>     >
>     > I was wondering is this patchset is blocked on some technical
>     grounds.
>
>     There's a v3 (though I can't find all of it, so not quite sure what
>     happened), but IIUC that still has the same fundamental problems that
>     all these have had: changing over to the new fence model may by an
>     ABI
>     break and the split CAS implementation doesn't ensure eventual
>     success
>     (see Jim's comments).  Not sure if there's other comments floating
>     around, though, that's just what I remember.
>
>
> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
> I dropped the CAS patch in v3 (issue: stack spilling under extreme 
> register pressure instead of erroring out) as I thought that this was 
> the blocker for the series.
> I just learned a few weeks ago, when I asked Palmer at the GNU 
> Cauldron about this series, that the ABI break is the blocker.

Yeah I was confused about the ABI aspect as I didn't see any mention of 
that in the public reviews of v1 and v2.

> My initial understanding was that fixing something broken cannot be an 
> ABI break.
> And that the mismatch of the implementation in 2021 and the 
> recommended mappings in the ratified specification from 2019 is 
> something that is broken. I still don't know the background here, but 
> I guess this assumption is incorrect from a historical point of view.
>
> However, I'm sure that I am not the only one that assumes the mappings 
> in the specification to be implemented in compilers and tools. 
> Therefore I still consider the implementation of the RISC-V atomics in 
> GCC as broken (at least w.r.t. user expectation from people that lack 
> the historical background and just read the RISC-V specification).
>
>
>     +Andrea, in case he has time to look at the memory model / ABI
>     issues.
>
>     We'd still need to sort out the CAS issues, though, and it's not
>     abundantly clear it's worth the work: we're essentailly
>     constrained to
>     just emitting those fixed CAS sequences due to the eventual success
>     rules, so it's not clear what the benefit of splitting those up is.
>     With WRS there are some routines we might want to generate code for
>     (cond_read_acquire() in Linux, for example) but we'd really need
>     to dig
>     into those to see if it's even sane/fast.
>
>     There's another patch set to fix the lack of inline atomic routines
>     without breaking stuff, there were some minor comments from Kito and
>     IIRC I had some test failures that I needed to chase down as well.
>     That's a much safer fix in the short term, we'll need to deal with
>     this
>     eventually but at least we can stop the libatomic issues for the
>     distro
>     folks.
>
>
> I expect that the pressure for a proper fix upstream (instead of a 
> backward compatible compromise) will increase over time (once people 
> start building big iron based on RISC-V and start hunting performance 
> bottlenecks in multithreaded workloads to be competitive).
> What could be done to get some relief is to enable the new atomics ABI 
> by a command line switch and promote its use. And at one point in the 
> future (if there are enough fixes to justify a break) the new ABI can 
> be enabled by default with a new flag to enable the old ABI.

Indeed we are stuck with inefficiencies with the status quo. The new ABI 
option sounds like a reasonable plan going forward.

Also my understanding is that while the considerations are ABI-centric, the 
option to facilitate this need not be tied to the canonical -mabi=ilp32, 
lp64d etc. It might just be a toggle such as -matomic=legacy,2019 etc. 
(this is not a suggestion, just indicative). Otherwise there's another 
level of blowup in multilib testing etc.

-Vineet

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 23:31       ` Vineet Gupta
@ 2022-10-12  0:15         ` Palmer Dabbelt
  2022-10-12  8:03           ` Christoph Müllner
                             ` (2 more replies)
  2022-10-13 22:39         ` Jeff Law
  1 sibling, 3 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2022-10-12  0:15 UTC (permalink / raw)
  To: Vineet Gupta, jeffreyalaw
  Cc: cmuellner, Andrea Parri, gcc-patches, kito.cheng, gnu-toolchain

On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
>
>
> On 10/11/22 13:46, Christoph Müllner wrote:
>> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
>>     > Hi Christoph, Kito,
>>     >
>>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>>     >> This series provides a cleanup of the current atomics
>>     implementation
>>     >> of RISC-V:
>>     >>
>>     >> * PR100265: Use proper fences for atomic load/store
>>     >> * PR100266: Provide programmatic implementation of CAS
>>     >>
>>     >> As both are very related, I merged the patches into one series.
>>     >>
>>     >> The first patch could be squashed into the following patches,
>>     >> but I found it easier to understand the chances with it in place.
>>     >>
>>     >> The series has been tested as follows:
>>     >> * Building and testing a multilib RV32/64 toolchain
>>     >>    (bootstrapped with riscv-gnu-toolchain repo)
>>     >> * Manual review of generated sequences for GCC's atomic
>>     builtins API
>>     >>
>>     >> The programmatic re-implementation of CAS benefits from a REE
>>     improvement
>>     >> (see PR100264):
>>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>>     >> If this patch is not in place, then an additional extension
>>     instruction
>>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>     >>
>>     >> Further, the new CAS code requires cbranch INSN helpers to be
>>     present:
>>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>>     >
>>     > I was wondering is this patchset is blocked on some technical
>>     grounds.
>>
>>     There's a v3 (though I can't find all of it, so not quite sure what
>>     happened), but IIUC that still has the same fundamental problems that
>>     all these have had: changing over to the new fence model may by an
>>     ABI
>>     break and the split CAS implementation doesn't ensure eventual
>>     success
>>     (see Jim's comments).  Not sure if there's other comments floating
>>     around, though, that's just what I remember.
>>
>>
>> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
>> I dropped the CAS patch in v3 (issue: stack spilling under extreme 
>> register pressure instead of erroring out) as I thought that this was 
>> the blocker for the series.
>> I just learned a few weeks ago, when I asked Palmer at the GNU 
>> Cauldron about this series, that the ABI break is the blocker.
>
> Yeah I was confused about the ABI aspect as I didn't see any mention of 
> that in the public reviews of v1 and v2.

Sorry, I thought we'd talked about it somewhere but it must have just 
been in meetings and such.  Patrick was writing a similar patch set 
around the same time, so it probably just got tied up in that; we ended 
up reducing it to just the strong CAS inline stuff because we couldn't 
sort out the correctness of the rest of it.
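
For anyone catching up on the terminology: the "strong CAS inline stuff"
is just the usual constrained LR/SC retry loop, something along these
lines (a sketch from memory; register names and labels are mine, not
necessarily what the compiler emits):

  #include <stdint.h>
  #include <stdbool.h>

  /* Strong 32-bit compare-and-swap.  The expected inline expansion is
     roughly (modulo ordering suffixes):

         1: lr.w  t0, (a0)
            bne   t0, a1, 2f
            sc.w  t1, a2, (a0)
            bnez  t1, 1b
         2: ...

     The whole loop has to stay within the ISA's constrained-LR/SC rules
     so that eventual success (forward progress) is guaranteed, which is
     why we're effectively stuck emitting it as one fixed sequence.  */
  bool
  cas_u32 (uint32_t *p, uint32_t expected, uint32_t desired)
  {
    return __atomic_compare_exchange_n (p, &expected, desired,
                                        false /* strong */,
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_SEQ_CST);
  }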

>> My initial understanding was that fixing something broken cannot be an 
>> ABI break.
>> And that the mismatch of the implementation in 2021 and the 
>> recommended mappings in the ratified specification from 2019 is 
>> something that is broken. I still don't know the background here, but 
>> I guess this assumption is incorrect from a historical point of view.

We agreed that we wouldn't break binaries back when we submitted the 
port.  The ISA has changed many times since then, including adding the 
recommended mappings, but those binaries exist and we can't just 
silently break things for users.

>> However, I'm sure that I am not the only one that assumes the mappings 
>> in the specification to be implemented in compilers and tools. 
>> Therefore I still consider the implementation of the RISC-V atomics in 
>> GCC as broken (at least w.r.t. user expectation from people that lack 
>> the historical background and just read the RISC-V specification).

You can't just read one of those RISC-V PDFs and assume that 
implementations that match those words will function correctly.  Those 
words regularly change in ways where reasonable readers would end up 
with incompatible implementations due to those differences.  That's why 
we're so explicit about versions and such these days, we're just getting 
burned by these old mappings because they're from back when we thought 
the RISC-V definition of compatibility was going to match the more 
common one and we didn't build in fallbacks.

>>     +Andrea, in case he has time to look at the memory model / ABI
>>     issues.
>>
>>     We'd still need to sort out the CAS issues, though, and it's not
>>     abundantly clear it's worth the work: we're essentailly
>>     constrained to
>>     just emitting those fixed CAS sequences due to the eventual success
>>     rules, so it's not clear what the benefit of splitting those up is.
>>     With WRS there are some routines we might want to generate code for
>>     (cond_read_acquire() in Linux, for example) but we'd really need
>>     to dig
>>     into those to see if it's even sane/fast.
>>
>>     There's another patch set to fix the lack of inline atomic routines
>>     without breaking stuff, there were some minor comments from Kito and
>>     IIRC I had some test failures that I needed to chase down as well.
>>     That's a much safer fix in the short term, we'll need to deal with
>>     this
>>     eventually but at least we can stop the libatomic issues for the
>>     distro
>>     folks.
>>
>>
>> I expect that the pressure for a proper fix upstream (instead of a 
>> backward compatible compromise) will increase over time (once people 
>> start building big iron based on RISC-V and start hunting performance 
>> bottlenecks in multithreaded workloads to be competitive).
>> What could be done to get some relief is to enable the new atomics ABI 
>> by a command line switch and promote its use. And at one point in the 
>> future (if there are enough fixes to justify a break) the new ABI can 
>> be enabled by default with a new flag to enable the old ABI.
>
> Indeed we are stuck with inefficiencies with status quo. The new abi 
> option sounds like a reasonable plan going fwd.

I don't think we're just stuck with the status quo; we really just need 
to go through the mappings and figure out which can be made both fast 
and ABI-compatible.  Then we can fix those and see where we stand, maybe 
it's good enough or maybe we need to introduce some sort of 
compatibility break to make things faster (and/or compatible with LLVM, 
where I suspect we're broken right now).

If we do need a break then I think it's probably possible to do it in 
phases, where we have a middle-ground compatibility mode that works for 
both the old and new mappings so distros can gradually move over as they 
rebuild packages.

Issues like the libstdc++ shared_ptr/mutex fallback don't map well to 
that, though.  There's also some stuff like the IO fence bits that we 
can probably just add an argument for now, those were likely just a bad 
idea at the time and should be safe to turn off for the vast majority of 
users (though those are more of an API break).

> Also my understand is that while the considerations are ABI centric, the 
> option to faciliate this need not be tied to canonical -mabi=lp32, lp64d 
> etc. It might just be a toggle as -matomic=legacy,2019 etc (this is not 
> suggestive just indicative). Otherwise there's another level of blowup 
> in multilib testing etc.

The psABI doesn't mention memory ordering at all.  IIUC that's a pretty 
standard hole in psABI documents, but it means we're in a grey area 
here.

+Jeff, who was offering to help when the threads got crossed.  I'd 
punted on a lot of this in the hope Andrea could help out, as I'm not 
really a memory model guy and this is pretty far down the rabbit hole.  
Happy to have the help if you're offering, though, as what's there is 
likely a pretty big performance issue for anyone with a reasonable 
memory system.

> -Vineet

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-12  0:15         ` Palmer Dabbelt
@ 2022-10-12  8:03           ` Christoph Müllner
  2022-10-13 23:11             ` Jeff Law
  2022-10-12 17:16           ` Andrea Parri
  2022-10-13 23:04           ` Jeff Law
  2 siblings, 1 reply; 31+ messages in thread
From: Christoph Müllner @ 2022-10-12  8:03 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Vineet Gupta, jeffreyalaw, Andrea Parri, gcc-patches, kito.cheng,
	gnu-toolchain

[-- Attachment #1: Type: text/plain, Size: 9908 bytes --]

On Wed, Oct 12, 2022 at 2:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
> >
> >
> > On 10/11/22 13:46, Christoph Müllner wrote:
> >> On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <palmer@dabbelt.com>
> wrote:
> >>
> >>     On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> >>     > Hi Christoph, Kito,
> >>     >
> >>     > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> >>     >> This series provides a cleanup of the current atomics
> >>     implementation
> >>     >> of RISC-V:
> >>     >>
> >>     >> * PR100265: Use proper fences for atomic load/store
> >>     >> * PR100266: Provide programmatic implementation of CAS
> >>     >>
> >>     >> As both are very related, I merged the patches into one series.
> >>     >>
> >>     >> The first patch could be squashed into the following patches,
> >>     >> but I found it easier to understand the chances with it in place.
> >>     >>
> >>     >> The series has been tested as follows:
> >>     >> * Building and testing a multilib RV32/64 toolchain
> >>     >>    (bootstrapped with riscv-gnu-toolchain repo)
> >>     >> * Manual review of generated sequences for GCC's atomic
> >>     builtins API
> >>     >>
> >>     >> The programmatic re-implementation of CAS benefits from a REE
> >>     improvement
> >>     >> (see PR100264):
> >>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> >>     >> If this patch is not in place, then an additional extension
> >>     instruction
> >>     >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
> >>     >>
> >>     >> Further, the new CAS code requires cbranch INSN helpers to be
> >>     present:
> >>     >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
> >>     >
> >>     > I was wondering is this patchset is blocked on some technical
> >>     grounds.
> >>
> >>     There's a v3 (though I can't find all of it, so not quite sure what
> >>     happened), but IIUC that still has the same fundamental problems
> that
> >>     all these have had: changing over to the new fence model may by an
> >>     ABI
> >>     break and the split CAS implementation doesn't ensure eventual
> >>     success
> >>     (see Jim's comments).  Not sure if there's other comments floating
> >>     around, though, that's just what I remember.
> >>
> >>
> >> v3 was sent on May 27, 2022, when I rebased this on an internal tree:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
> >> I dropped the CAS patch in v3 (issue: stack spilling under extreme
> >> register pressure instead of erroring out) as I thought that this was
> >> the blocker for the series.
> >> I just learned a few weeks ago, when I asked Palmer at the GNU
> >> Cauldron about this series, that the ABI break is the blocker.
> >
> > Yeah I was confused about the ABI aspect as I didn't see any mention of
> > that in the public reviews of v1 and v2.
>
> Sorry, I thought we'd talked about it somewhere but it must have just
> been in meetings and such.  Patrick was writing a similar patch set
> around the same time so it probably just got tied up in that, we ended
> up reducing it to just the strong CAS inline stuff because we couldn't
> sort out the correctness of the rest of it.
>
> >> My initial understanding was that fixing something broken cannot be an
> >> ABI break.
> >> And that the mismatch of the implementation in 2021 and the
> >> recommended mappings in the ratified specification from 2019 is
> >> something that is broken. I still don't know the background here, but
> >> I guess this assumption is incorrect from a historical point of view.
>
> We agreed that we wouldn't break binaries back when we submitted the
> port.  The ISA has changed many times since then, including adding the
> recommended mappings, but those binaries exist and we can't just
> silently break things for users.
>
> >> However, I'm sure that I am not the only one that assumes the mappings
> >> in the specification to be implemented in compilers and tools.
> >> Therefore I still consider the implementation of the RISC-V atomics in
> >> GCC as broken (at least w.r.t. user expectation from people that lack
> >> the historical background and just read the RISC-V specification).
>
> You can't just read one of those RISC-V PDFs and assume that
> implementations that match those words will function correctly.  Those
> words regularly change in ways where reasonable readers would end up
> with incompatible implementations due to those differences.  That's why
> we're so explicit about versions and such these days, we're just getting
> burned by these old mappings because they're from back when we though
> the RISC-V definition of compatibility was going to match the more
> common one and we didn't build in fallbacks.
>

Indeed, read-and-assume might not be the best idea.
But read-and-ignore (in the hope everyone else does as well) won't help
either.

I think it is reasonable to expect the detailed mapping table from the
specification to actually be implemented.
Why else would it have been written, and why would one ignore it?

But let's look at the current situation and the proposed solutions so far
(below).


>
> >>     +Andrea, in case he has time to look at the memory model / ABI
> >>     issues.
> >>
> >>     We'd still need to sort out the CAS issues, though, and it's not
> >>     abundantly clear it's worth the work: we're essentailly
> >>     constrained to
> >>     just emitting those fixed CAS sequences due to the eventual success
> >>     rules, so it's not clear what the benefit of splitting those up is.
> >>     With WRS there are some routines we might want to generate code for
> >>     (cond_read_acquire() in Linux, for example) but we'd really need
> >>     to dig
> >>     into those to see if it's even sane/fast.
> >>
> >>     There's another patch set to fix the lack of inline atomic routines
> >>     without breaking stuff, there were some minor comments from Kito and
> >>     IIRC I had some test failures that I needed to chase down as well.
> >>     That's a much safer fix in the short term, we'll need to deal with
> >>     this
> >>     eventually but at least we can stop the libatomic issues for the
> >>     distro
> >>     folks.
> >>
> >>
> >> I expect that the pressure for a proper fix upstream (instead of a
> >> backward compatible compromise) will increase over time (once people
> >> start building big iron based on RISC-V and start hunting performance
> >> bottlenecks in multithreaded workloads to be competitive).
> >> What could be done to get some relief is to enable the new atomics ABI
> >> by a command line switch and promote its use. And at one point in the
> >> future (if there are enough fixes to justify a break) the new ABI can
> >> be enabled by default with a new flag to enable the old ABI.
> >
> > Indeed we are stuck with inefficiencies with status quo. The new abi
> > option sounds like a reasonable plan going fwd.
>
> I don't think we're just stuck with the status quo, we really just need
> to go through the mappings and figure out which can be made both fast
> and ABI-compatible.  Then we can fix those and see where we stand, maybe
> it's good enough or maybe we need to introduce some sort of
> compatibility break to make things faster (and/or compatible with LLVM,
> where I suspect we're broken right now).
>

So we have the following atomics ABIs:
 I) GCC implementation
 II) LLVM implementation
 III) Specified ABI in the "Code Porting and Mapping Guidelines" appendix
of the RISC-V specification

And there are two proposed solutions:
 a) Finding a new ABI that is compatible with I) and II) is of course a
solution, but we don't know if and when such a solution exists.
 b) Introducing III) causes a break and therefore needs special care
(e.g. let the user decide via a command-line flag or provide a
compatibility mode).

I don't see that a) and b) contradict each other.
Why not go for both:
 -) Continue to work on a backward compatible solution
 -) Enable the "new" ABI from the specification appendix via command
line flag
 -) Reevaluate the situation in 12 months to decide the next steps



>
> If we do need a break then I think it's probably possible to do it in
> phases, where we have a middle-ground compatibility mode that works for
> both the old and new mappings so distros can gradually move over as they
> rebuild packages.
>
> Issues like the libstdc++ shared_ptr/mutex fallback don't map well to
> that, though.  There's also some stuff like the IO fence bits that we
> can probably just add an argument for now, those were likely just a bad
> idea at the time and should be safe to turn off for the vast majority of
> users (though those are more of an API break).
>
> > Also my understand is that while the considerations are ABI centric, the
> > option to faciliate this need not be tied to canonical -mabi=lp32, lp64d
> > etc. It might just be a toggle as -matomic=legacy,2019 etc (this is not
> > suggestive just indicative). Otherwise there's another level of blowup
> > in multilib testing etc.
>
> The psABI doesn't mention memory ordering at all.  IIUC that's a pretty
> standard hole in psABI documents, but it means we're in a grey area
> here.
>
> +Jeff, who was offering to help when the threads got crossed.  I'd
> punted on a lot of this in the hope Andrea could help out, as I'm not
> really a memory model guy and this is pretty far down the rabbit hole.
> Happy to have the help if you're offering, though, as what's there is
> likely a pretty big performance issue for anyone with a reasonable
> memory system.
>
> > -Vineet
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-12  0:15         ` Palmer Dabbelt
  2022-10-12  8:03           ` Christoph Müllner
@ 2022-10-12 17:16           ` Andrea Parri
  2022-10-20 19:01             ` Andrea Parri
  2022-10-13 23:04           ` Jeff Law
  2 siblings, 1 reply; 31+ messages in thread
From: Andrea Parri @ 2022-10-12 17:16 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Vineet Gupta, jeffreyalaw, cmuellner, gcc-patches, kito.cheng,
	gnu-toolchain

> > >     +Andrea, in case he has time to look at the memory model / ABI
> > >     issues.

> +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> a lot of this in the hope Andrea could help out, as I'm not really a memory
> model guy and this is pretty far down the rabbit hole.  Happy to have the
> help if you're offering, though, as what's there is likely a pretty big
> performance issue for anyone with a reasonable memory system.

Thanks for linking me to the discussion and the remarks, Palmer.  I'm
happy to help (and synchronized with Jeff/the community) as possible,
building a better understanding of the 'issues' at stake.

  Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-11 23:31       ` Vineet Gupta
  2022-10-12  0:15         ` Palmer Dabbelt
@ 2022-10-13 22:39         ` Jeff Law
  2022-10-13 23:14           ` Palmer Dabbelt
  2022-10-14  0:14           ` Vineet Gupta
  1 sibling, 2 replies; 31+ messages in thread
From: Jeff Law @ 2022-10-13 22:39 UTC (permalink / raw)
  To: gcc-patches


On 10/11/22 17:31, Vineet Gupta wrote:
>
>>
>> I expect that the pressure for a proper fix upstream (instead of a 
>> backward compatible compromise) will increase over time (once people 
>> start building big iron based on RISC-V and start hunting performance 
>> bottlenecks in multithreaded workloads to be competitive).
>> What could be done to get some relief is to enable the new atomics 
>> ABI by a command line switch and promote its use. And at one point in 
>> the future (if there are enough fixes to justify a break) the new ABI 
>> can be enabled by default with a new flag to enable the old ABI.
>
> Indeed we are stuck with inefficiencies with status quo. The new abi 
> option sounds like a reasonable plan going fwd.
>
> Also my understand is that while the considerations are ABI centric, 
> the option to faciliate this need not be tied to canonical -mabi=lp32, 
> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
> is not suggestive just indicative). Otherwise there's another level of 
> blowup in multilib testing etc.

If I understand the history here, we're essentially catering to code 
that is potentially relying on behavior that was never really 
guaranteed.   That's not really ABI -- it's more depending on specifics 
of an implementation or undefined/underdefined behavior.    Holding back 
progress for that case seems short-sighted, particularly given how early 
I hope we are in the RISC-V journey.


But I'm also sympathetic to the desire not to break existing code.  
Could we keep the old behavior under a flag and fix the default behavior 
here, presumably with a bit in the ELF header indicating code that wants 
the old behavior?


Jeff



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-12  0:15         ` Palmer Dabbelt
  2022-10-12  8:03           ` Christoph Müllner
  2022-10-12 17:16           ` Andrea Parri
@ 2022-10-13 23:04           ` Jeff Law
  2 siblings, 0 replies; 31+ messages in thread
From: Jeff Law @ 2022-10-13 23:04 UTC (permalink / raw)
  To: Palmer Dabbelt, Vineet Gupta
  Cc: cmuellner, Andrea Parri, gcc-patches, kito.cheng, gnu-toolchain


On 10/11/22 18:15, Palmer Dabbelt wrote:
>
> Sorry, I thought we'd talked about it somewhere but it must have just 
> been in meetings and such.  Patrick was writing a similar patch set 
> around the same time so it probably just got tied up in that, we ended 
> up reducing it to just the strong CAS inline stuff because we couldn't 
> sort out the correctness of the rest of it.

Now that you mention it, I vaguely recall a discussion about inline 
atomics vs libatomic and the discussion on this issue might have been 
there rather than in Christoph's patchset.


>
>>> My initial understanding was that fixing something broken cannot be 
>>> an ABI break.
>>> And that the mismatch of the implementation in 2021 and the 
>>> recommended mappings in the ratified specification from 2019 is 
>>> something that is broken. I still don't know the background here, 
>>> but I guess this assumption is incorrect from a historical point of 
>>> view.
>
> We agreed that we wouldn't break binaries back when we submitted the 
> port.  The ISA has changed many times since then, including adding the 
> recommended mappings, but those binaries exist and we can't just 
> silently break things for users.

That may be too strong of a policy -- just because something worked in 
the past doesn't mean it must always work.  It really depends on the 
contracts specified by the ABI, processor reference documentation, etc 
and whether or not the code relies on something that it outside those 
contracts.  If it does rely on behavior outside the contracts, then we 
shouldn't be constrained by such an agreement.

Another way to think about this problem is do we want more code making 
incorrect assumptions about the behavior of atomics getting out in the 
wild?  My take would be that we nip this in the bud, get it right now in 
the default configuration, but leave enough bits in place that existing 
code continues to work.

>
>>> However, I'm sure that I am not the only one that assumes the 
>>> mappings in the specification to be implemented in compilers and 
>>> tools. Therefore I still consider the implementation of the RISC-V 
>>> atomics in GCC as broken (at least w.r.t. user expectation from 
>>> people that lack the historical background and just read the RISC-V 
>>> specification).
>
> You can't just read one of those RISC-V PDFs and assume that 
> implementations that match those words will function correctly. Those 
> words regularly change in ways where reasonable readers would end up 
> with incompatible implementations due to those differences.  That's 
> why we're so explicit about versions and such these days, we're just 
> getting burned by these old mappings because they're from back when we 
> though the RISC-V definition of compatibility was going to match the 
> more common one and we didn't build in fallbacks.

Fair point, but in my mind that argues that the platform must mature 
further so that the contracts can be relied upon.  That obviously needs 
to get fixed, and until it does, any agreements or guarantees about 
behavior of existing code can't be reasonably made.  If we're going to 
be taken seriously, then those fundamentals have to be rock solid.


>
> I don't think we're just stuck with the status quo, we really just 
> need to go through the mappings and figure out which can be made both 
> fast and ABI-compatible.  Then we can fix those and see where we 
> stand, maybe it's good enough or maybe we need to introduce some sort 
> of compatibility break to make things faster (and/or compatible with 
> LLVM, where I suspect we're broken right now).

Certainly seems like a good first step.  What we can fix without 
breaking things we do while we sort out the tougher problems.


>
> If we do need a break then I think it's probably possible to do it in 
> phases, where we have a middle-ground compatibility mode that works 
> for both the old and new mappings so distros can gradually move over 
> as they rebuild packages.

As someone that lived in the distro space for a long time, I would argue 
that now is the time to fix this stuff -- before there is a large uptake 
in distro consumption.


>
> +Jeff, who was offering to help when the threads got crossed.  I'd 
> punted on a lot of this in the hope Andrea could help out, as I'm not 
> really a memory model guy and this is pretty far down the rabbit 
> hole.  Happy to have the help if you're offering, though, as what's 
> there is likely a pretty big performance issue for anyone with a 
> reasonable memory system.

Hmm, there's a case I'm pondering whether I can discuss or not.  Probably not 
since I can't recall it ever being discussed in public.  So I'll just 
say this space can be critically important for performance and the 
longer we wait, the tougher it gets to fix without causing significant 
disruption.

Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-12  8:03           ` Christoph Müllner
@ 2022-10-13 23:11             ` Jeff Law
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Law @ 2022-10-13 23:11 UTC (permalink / raw)
  To: Christoph Müllner, Palmer Dabbelt
  Cc: Vineet Gupta, Andrea Parri, gcc-patches, kito.cheng, gnu-toolchain

[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]


On 10/12/22 02:03, Christoph Müllner wrote:
>
>
> So we have the following atomics ABIs:
>  I) GCC implementation
>  II) LLVM implementation
>  III) Specified ABI in the "Code Porting and Mapping Guidelines" 
> appendix of the RISC-V specification

And presumably we don't have any way to distinguish between I and II at 
the DSO or object level.  That implies that if we're going to get to 
III, then we have to mark new code.  We obviously can't mark 
pre-existing bits (and I may have implied we should do that in my 
earlier message, my goof).



>
> And there are two proposed solutions:
>  a) Finding a new ABI that is compatible with I) and II) is of course 
> a solution, but we don't know if and when such a solution exists.
>  b) Going to introduce III) causes a break and therefore needs special 
> care (e.g. let the user decide via command line flag or provide a 
> compatibility mode).
>
> I don't see that a) and b) contradict each other.
> Why not going for both:
>  -) Continue to work on a backward compatible solution
>  -) Enable the "new" ABI from the specification appendix via command 
> line flag
>  -) Reevaluate the situation in 12 months to decide the next steps
I would lean towards making the new, more correct, behavior the default 
and having the old behavior enabled by a command line flag. But 
otherwise what you're suggesting seems reasonable.


Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-13 22:39         ` Jeff Law
@ 2022-10-13 23:14           ` Palmer Dabbelt
  2022-10-14 11:03             ` Christoph Müllner
  2022-10-14  0:14           ` Vineet Gupta
  1 sibling, 1 reply; 31+ messages in thread
From: Palmer Dabbelt @ 2022-10-13 23:14 UTC (permalink / raw)
  To: gcc-patches

On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a
>>> backward compatible compromise) will increase over time (once people
>>> start building big iron based on RISC-V and start hunting performance
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics
>>> ABI by a command line switch and promote its use. And at one point in
>>> the future (if there are enough fixes to justify a break) the new ABI
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understand is that while the considerations are ABI centric,
>> the option to faciliate this need not be tied to canonical -mabi=lp32,
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
>> is not suggestive just indicative). Otherwise there's another level of
>> blowup in multilib testing etc.
>
> If I understand the history here, we're essentially catering to code
> that is potentially relying on behavior that was never really
> guaranteed.   That's not really ABI -- it's more depending on specifics
> of an implementation or undefined/underdefined behavior.    Holding back
> progress for that case seems short-sighted, particularly given how early
> I hope we are in the RISC-V journey.
>
>
> But I'm also sympathetic to the desire not to break existing code. 

I suppose ABI means a ton of things, but in this case that's really what 
I was trying to get at: just changing to the mappings suggested in the 
ISA manual risks producing binaries that don't work when mixed with old 
binaries.  My rationale for calling that ABI was that there's a defacto 
memory model ABI defined as "anything that works with the old binaries", 
but ABI means so many things maybe we just shouldn't say it at all here?

I'm going to call it "old binary compatibility" here, rather than "ABI 
compatibility", just so it's a different term.

> Could we keep the old behavior under a flag and fix the default behavior
> here, presumably with a bit in the ELF header indicating code that wants
> the old behavior?

The thread got forked a bit, but that's essentially what I was trying to 
suggest.  I talked with Andrea a bit and here's how I'd describe it:

We have a mapping from the C{,++}11 memory model to RVWMO that's 
currently emitted by GCC, and there are years of binaries with that 
mapping.  As far as we know that mapping is correct, but I don't think 
anyone's gone through and formally analyzed it.  Let's call that the 
"GCC mapping".

There's also a mapping listed in the ISA manual.  That's not the same as 
the GCC mapping, so let's call it the "ISA mapping".  We need to 
double-check the specifics, but IIUC this ISA mapping is broadly 
followed by LLVM.  It's also very likely to be correct, as it's been 
analyzed by lots of formal memory model people as part of the RVWMO 
standardization process.
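
A concrete way to see the difference is to feed the same trivial TU to
both compilers and diff the fences.  Something like this is enough (the
commented sequences are my from-memory reading of the ISA manual's mapping
table, so don't treat them as authoritative):

  #include <stdatomic.h>

  atomic_int flag;

  int load_acquire (void)
  {
    /* ISA mapping (porting appendix, from memory): lw ; fence r,rw  */
    return atomic_load_explicit (&flag, memory_order_acquire);
  }

  void store_release (int x)
  {
    /* ISA mapping (from memory): fence rw,w ; sw  */
    atomic_store_explicit (&flag, x, memory_order_release);
  }

Building that with riscv64 gcc and clang at -O2 and comparing the output
is a quick sanity check of how far apart the GCC mapping and the ISA
mapping actually are for a given operation.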

As far as I know, everyone likes the ISA mapping better than the GCC 
mapping.  It's hard to describe why concretely because there's no 
hardware that implements RVWMO sufficiently aggressively that we can 
talk performance, but at least matching what's in the ISA manual is the 
way to go.  Also, just kind of a personal opinion, the GCC mapping does 
some weird ugly stuff.

So the question is really: how do we get rid of the GCC mapping while 
causing as little headache as possible?

My proposal is as follows:

* Let's properly analyze the GCC mapping, just on its own.  Maybe it's 
  broken; if it is, then I think we've got a pretty decent argument to 
  just ignore old binary compatibility -- if it was always broken then 
  we can't break it, so who cares.
* Assuming the GCC mapping is internally consistent, let's analyze 
  arbitrary mixes of the GCC and ISA mappings.  It's not generally true 
  that two correct mappings can be freely mixed, but maybe we've got 
  some cases that work fine.  Maybe we even get lucky and everything's 
  compatible, who knows (though I'm worried about the same .aq vs fence 
  stuff we've had in LKMM a few times).
* Assuming there's any issue mixing the GCC and ISA mappings, let's add 
  a flag to GCC.  Something like -mrvwmo-compat={legacy,both,isa}, where:
    - =legacy does what we have now.  We can eventually deprecate this, 
      and assuming =both is fast enough maybe we don't even need it.
    - =both implements a mapping that's compatible with both the GCC and 
      ISA mappings.  This might be slow or something, but it'll be 
      correct with both.  We can set this to the default now, as it's 
      safe.
    - =isa implements the ISA mappings.

Then we can go stick some marker in the ELF saying "this binary is 
compatible with the ISA mappings" to triage binaries in systems.  That 
way distros can decide when they want to move to -mrvwmo-compat=isa by 
default, maybe when they naturally do a world rebuild.  Eventually we 
can make that the default in GCC upstream and later deprecate the 
compatibility modes.
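
Just to sketch the shape of the knob (purely illustrative: the option name
is the one proposed above, but the enum and function below are hypothetical
and not actual GCC code):

  /* Hypothetical selector for the atomics mapping, set from a
     -mrvwmo-compat={legacy,both,isa} style option.  The concrete
     instruction sequences are left symbolic on purpose; they'd be the
     output of the formal analysis described above.  */

  enum rvwmo_compat { RVWMO_COMPAT_LEGACY, RVWMO_COMPAT_BOTH, RVWMO_COMPAT_ISA };

  /* In a real implementation this would be a target option variable;
     a plain global is enough for the sketch.  */
  enum rvwmo_compat rvwmo_compat_mode = RVWMO_COMPAT_BOTH;

  const char *
  seq_cst_load_sequence (void)
  {
    switch (rvwmo_compat_mode)
      {
      case RVWMO_COMPAT_LEGACY:
        return "<today's GCC sequence>";
      case RVWMO_COMPAT_ISA:
        return "<sequence from the ISA manual's porting appendix>";
      case RVWMO_COMPAT_BOTH:
      default:
        /* A possibly slower sequence that is correct under both
           mappings -- still to be determined.  */
        return "<compatibility sequence>";
      }
  }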

That's going to take a bit of work on the formal memory model side of 
things, but I think we've at least got enough of Andrea's bandwidth to 
make it happen in a reasonable timeframe.  I don't think there's a ton 
of implementation work to do once we sort out the desired mappings, and 
a chunk of that can probably start in parallel as we'll likely need 
stuff like the ELF tagging eventually.

Landing this before stage4 might be tricky, though.  I tell myself we're 
not going to take late backend features every release... ;)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-13 22:39         ` Jeff Law
  2022-10-13 23:14           ` Palmer Dabbelt
@ 2022-10-14  0:14           ` Vineet Gupta
  1 sibling, 0 replies; 31+ messages in thread
From: Vineet Gupta @ 2022-10-14  0:14 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: Palmer Dabbelt, Christoph Muellner

On 10/13/22 15:39, Jeff Law via Gcc-patches wrote:
> 
> On 10/11/22 17:31, Vineet Gupta wrote:
>>
>>>
>>> I expect that the pressure for a proper fix upstream (instead of a 
>>> backward compatible compromise) will increase over time (once people 
>>> start building big iron based on RISC-V and start hunting performance 
>>> bottlenecks in multithreaded workloads to be competitive).
>>> What could be done to get some relief is to enable the new atomics 
>>> ABI by a command line switch and promote its use. And at one point in 
>>> the future (if there are enough fixes to justify a break) the new ABI 
>>> can be enabled by default with a new flag to enable the old ABI.
>>
>> Indeed we are stuck with inefficiencies with status quo. The new abi 
>> option sounds like a reasonable plan going fwd.
>>
>> Also my understand is that while the considerations are ABI centric, 
>> the option to faciliate this need not be tied to canonical -mabi=lp32, 
>> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
>> is not suggestive just indicative). Otherwise there's another level of 
>> blowup in multilib testing etc.
> 
> If I understand the history here, we're essentially catering to code 
> that is potentially relying on behavior that was never really 
> guaranteed.   That's not really ABI -- it's more depending on specifics 
> of an implementation or undefined/underdefined behavior.    Holding back 
> progress for that case seems short-sighted, particularly given how early 
> I hope we are in the RISC-V journey.

Exactly. I keep hearing about the potential ABI breakage, but see no real 
discussion (publicly at least) which describes in detail what exactly 
that ABI / old-behavior breakage is with this patch series [1]. So 
perhaps we can start with reviewing the patches, calling out exactly 
what change causes the divergence and if that is acceptable or not. And 
while we're at it, perhaps also make updates to [2]


[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100265

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-13 23:14           ` Palmer Dabbelt
@ 2022-10-14 11:03             ` Christoph Müllner
  2022-10-14 20:39               ` Jeff Law
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Müllner @ 2022-10-14 11:03 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: gcc-patches, Vineet Gupta, Jeff Law, Christoph Muellner,
	Kito Cheng, gnu-toolchain, Philipp Tomsich,
	Christoph Müllner

[-- Attachment #1: Type: text/plain, Size: 6963 bytes --]

On Fri, Oct 14, 2022 at 1:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >
> > On 10/11/22 17:31, Vineet Gupta wrote:
> >>
> >>>
> >>> I expect that the pressure for a proper fix upstream (instead of a
> >>> backward compatible compromise) will increase over time (once people
> >>> start building big iron based on RISC-V and start hunting performance
> >>> bottlenecks in multithreaded workloads to be competitive).
> >>> What could be done to get some relief is to enable the new atomics
> >>> ABI by a command line switch and promote its use. And at one point in
> >>> the future (if there are enough fixes to justify a break) the new ABI
> >>> can be enabled by default with a new flag to enable the old ABI.
> >>
> >> Indeed we are stuck with inefficiencies with status quo. The new abi
> >> option sounds like a reasonable plan going fwd.
> >>
> >> Also my understand is that while the considerations are ABI centric,
> >> the option to faciliate this need not be tied to canonical -mabi=lp32,
> >> lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
> >> is not suggestive just indicative). Otherwise there's another level of
> >> blowup in multilib testing etc.
> >
> > If I understand the history here, we're essentially catering to code
> > that is potentially relying on behavior that was never really
> > guaranteed.   That's not really ABI -- it's more depending on specifics
> > of an implementation or undefined/underdefined behavior.    Holding back
> > progress for that case seems short-sighted, particularly given how early
> > I hope we are in the RISC-V journey.
> >
> >
> > But I'm also sympathetic to the desire not to break existing code.
>
> I suppose ABI means a ton of things, but in this case that's really want
> I was trying to get at: just changing to the mappings suggested in the
> ISA manual risks producing binaries that don't work when mixed with old
> binaries.  My rationale for calling that ABI was that there's a defacto
> memory model ABI defined as "anything that works with the old binaries",
> but ABI means so many things maybe we just shouldn't say it at all here?
>
> I'm going to call it "old binary compatibility" here, rather than "ABI
> compatibility", just so it's a different term.
>
> > Could we keep the old behavior under a flag and fix the default behavior
> > here, presumably with a bit in the ELF header indicating code that wants
> > the old behavior?
>
> The thread got forked a bit, but that's essentially what I was trying to
> suggest.  I talked with Andrea a bit and here's how I'd describe it:
>
> We have a mapping from the C{,++}11 memory model to RVWMO that's
> currently emitted by GCC, and there are years of binaries with that
> mapping.  As far as we know that mapping is correct, but I don't think
> anyone's gone through and formally analyzed it.  Let's call those the
> "GCC mapping".
>
> There's also a mapping listed in the ISA manual.  That's not the same as
> the GCC mapping, so let's call it the "ISA mapping".  We need to
> double-check the specifics, but IIUC this ISA mapping is broadly
> followed by LLVM.  It's also very likely to be correct, as it's been
> analyzed by lots of formal memory model people as part of the RVWMO
> standardization process.
>
> As far as I know, everyone likes the ISA mapping better than the GCC
> mapping.  It's hard to describe why concretely because there's no
> hardware that implements RVWMO sufficiently aggressively that we can
> talk performance, but at least matching what's in the ISA manual is the
> way to go.  Also, just kind of a personal opinion, the GCC mapping does
> some weird ugly stuff.
>

My guess is people like the ISA mapping (more) because it has been
documented and reviewed.
And it is the product of a working group that worked out the
RVWMO specification.
This gives some confidence that we don't need to rework it massively
because of correctness issues in the future.

The GCC mapping has (as far as I understand) never been documented and
reviewed (especially not by the RISC-V arch review group).
It might be faster and more robust, but it also might be incomplete and
slower.
Finding this out is a research topic with an unpredictable outcome.
And more or less nobody wants to take this risk or has the time and the
required skills to work on the research.


>
> So the question is really: how do we get rid of the GCC mapping while
> causing as little headache as possible?
>
> My proposal is as follows:
>
> * Let's properly analyze the GCC mapping, just on its own.  Maybe it's
>   broken, if it is then I think we've got a pretty decent argument to
>   just ignore old binary compatibility -- if it was always broken then
>   we can't break it, so who cares.
> * Assuming the GCC mapping is internally consistent, let's analyze
>   arbitrary mixes of the GCC and ISA mappings.  It's not generally true
>   that two correct mappings can be freely mixed, but maybe we've got
>   some cases that work fine.  Maybe we even get lucky and everything's
>   compatible, who knows (though I'm worried about the same .aq vs fence
>   stuff we've had in LKMM a few times).
> * Assuming there's any issue mixing the GCC and ISA mappings, let's add
>   a flag to GCC.  Something like -mrvwmo-compat={legacy,both,isa}, where:
>     - =legacy does what we have now.  We can eventually deprecate this,
>       and assuming =both is fast enough maybe we don't even need it.
>     - =both implements a mapping that's compatible with both the GCC and
>       ISA mappings.  This might be slow or something, but it'll be
>       correct with both.  We can set this to the default now, as it's
>       safe.
>     - =isa implements the ISA mappings.
>

I think a patch for legacy/isa can be done in time for the GCC release
window.
I can rework my series accordingly.
However, I see some risk for the "both" option on the timeline, so this
would have to wait until such a compatible mapping exists.
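
For concreteness, usage of the proposed option would look something like
this (hypothetical, of course: the -mrvwmo-compat flag does not exist in
any GCC release yet, and the toolchain prefix is just an example):

    $ riscv64-unknown-linux-gnu-gcc -O2 -mrvwmo-compat=legacy foo.c  # current GCC mapping
    $ riscv64-unknown-linux-gnu-gcc -O2 -mrvwmo-compat=both   foo.c  # safe with old and new binaries
    $ riscv64-unknown-linux-gnu-gcc -O2 -mrvwmo-compat=isa    foo.c  # ISA-manual mapping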



>
> Then we can go stick some marker in the ELF saying "this binary is
> compatible with the ISA mappings" to triage binaries in systems.  That
> way distros can decide when they want to move to -mrvwmo-compat=isa by
> default, maybe when they naturally do a world rebuild.  Eventually we
> can make that the default in GCC upstream and later deprecate the
> compatibility modes.
>
> That's going to take a bit of work on the formal memory model side of
> things, but I think we've at least got enough of Andrea's bandwidth to
> make it happen in a reasonable timeframe.  I don't think there's a ton
> of implementation work to do once we sort out the desired mappings, and
> a chunk of that can probably start in parallel as we'll likely need
> stuff like the ELF tagging eventually.
>
> Landing this before stage4 might be tricky, though.  I tell myself we're
> not going to take late backend features every release... ;)
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-14 11:03             ` Christoph Müllner
@ 2022-10-14 20:39               ` Jeff Law
  2022-10-14 21:57                 ` Palmer Dabbelt
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Law @ 2022-10-14 20:39 UTC (permalink / raw)
  To: Christoph Müllner, Palmer Dabbelt
  Cc: gcc-patches, Vineet Gupta, Kito Cheng, gnu-toolchain,
	Philipp Tomsich, Christoph Müllner



On 10/14/22 05:03, Christoph Müllner wrote:
>
> My guess is people like the ISA mapping (more) because it has been 
> documented and reviewed.
> And it is the product of a working group that worked out the 
> RVWMO specification.
> This gives some confidence that we don't need to rework it massively 
> because of correctness issues in the future.

This stuff can be hard and if someone with deep experience in memory 
models has reviewed the ISA mapping, then I'd prefer it over the GCC 
mapping.  It's just that the experts in the memory model space
are more likely to get it right than a bunch of compiler junkies, no 
matter how smart we think we are :-)

Maybe I'm being too optimistic, but it's not hard for me to see a path 
where GCC and LLVM both implement the ISA mapping by default.  Anything 
else is just a path of long term pain.


Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-14 20:39               ` Jeff Law
@ 2022-10-14 21:57                 ` Palmer Dabbelt
  2022-10-15  0:31                   ` Palmer Dabbelt
  0 siblings, 1 reply; 31+ messages in thread
From: Palmer Dabbelt @ 2022-10-14 21:57 UTC (permalink / raw)
  To: jeffreyalaw
  Cc: cmuellner, gcc-patches, Vineet Gupta, kito.cheng, gnu-toolchain,
	philipp.tomsich, christoph.muellner

On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
> On 10/14/22 05:03, Christoph Müllner wrote:
>>
>> My guess is people like the ISA mapping (more) because it has been 
>> documented and reviewed.
>> And it is the product of a working group that worked out the 
>> RVWMO specification.
>> This gives some confidence that we don't need to rework it massively 
>> because of correctness issues in the future.
>
> This stuff can be hard and if someone with deep experience in memory 
> models has reviewed the ISA mapping, then I'd prefer it over the GCC 
> mapping.  It's just that the experts in the memory model space
> are more likely to get it right than a bunch of compiler junkies, no 
> matter how smart we think we are :-)

That hasn't really proven to be the case for the ISA-suggested Linux mappings.
We've been through a bunch of rounds of review upstream and that's 
resulted in some differences.  Some of that is likely related to the ISA 
mappings for Linux being incomplete (they punt on the tricky bits like 
interactions with spinlocks, which filter all over the place), and Linux 
doesn't have the same old binary compatibility issues (aka the in-kernel 
ABI is not stable between releases) so mapping churn isn't nearly as 
scary over there.

Still not much of an argument in favor of the GCC mappings, though.  I'm 
pretty sure nobody with a formal memory model background has looked at
those, which is pretty much the worst spot to be in.  That said, I don't 
think we can just say the ISA mappings are the way to go, they seem to 
suffer from some similar incompleteness issues (for example, there are no
explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).  
So we'll still need to put in the work to make sure whatever mappings 
get implemented are correct.

[
As an aside, I think LLVM is doing the wrong thing for some of the more 
complex forms of compare_exchange_weak.  For example

    #include <atomic>
    
    bool f(std::atomic<long>& p, long& o, long n) {
        return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);
    }
    
   
    $ clang-15.0.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):                   # @f(std::atomic<long>&, long&, long)
            ld      a4, 0(a1)
    .LBB0_3:                                # =>This Inner Loop Header: Depth=1
            lr.d.aq a3, (a0)
            bne     a3, a4, .LBB0_5
            sc.d.rl a5, a2, (a0)
            bnez    a5, .LBB0_3
    .LBB0_5:
            xor     a0, a3, a4
            seqz    a0, a0
            beq     a3, a4, .LBB0_2
            sd      a3, 0(a1)
    .LBB0_2:
            ret

doesn't look to me like it provides release ordering on failure, but I'm 
not really a memory model person so maybe I'm missing something here.  
The GCC mapping is pretty ugly, but I think we do happen to have correct 
behavior in this case:

    # gcc-12.2.0 -std=c++17 -O3
    f(std::atomic<long>&, long&, long):
            ld      a5,0(a1)
            fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
            sub     a5,a4,a5
            seqz    a0,a5
            beq     a5,zero,.L2
            sd      a4,0(a1)
    .L2:
            andi    a0,a0,1
            ret
]

> Maybe I'm being too optimistic, but it's not hard for me to see a path 
> where GCC and LLVM both implement the ISA mapping by default.  Anything 
> else is just a path of long term pain.

I'd bet that most people want that, but in practice building any real 
systems in RISC-V land requires some degree of implementation-defined 
behavior as the specifications don't cover everything (even ignoring the 
whole PDF vs specification word games).  That's not to say we should 
just ignore what's written down, just that there's more work to do even 
if we ignore compatibility with old binaries.

I think the question here is whether it's worth putting in the extra 
work to provide a path for systems with old binaries to gradually 
upgrade to the ISA mappings, or if we just toss out those old binaries.  
I think we really need to see how much of a headache that compatibility
mode is going to be, and the only way to do that is put in the time to 
analyze the GCC mappings.

That said, I don't really personally care that much about old binaries.  
Really my only argument here is that we broke binary compatibility once 
(right before we upstreamed the port), that made a handful of people 
mad, and I told them we'd never do it again.  I think we were all 
operating under the assumption that RISC-V would move an order of 
magnitude faster than it has, though, so maybe we're in a spot where
nobody actually cares about extant binaries and we can just throw them 
all out.

If we're going to do that, though, there's a bunch of other cruft that 
we should probably toss along with the GCC mappings...

>
>
> Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-14 21:57                 ` Palmer Dabbelt
@ 2022-10-15  0:31                   ` Palmer Dabbelt
  0 siblings, 0 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2022-10-15  0:31 UTC (permalink / raw)
  To: gcc-patches
  Cc: jeffreyalaw, cmuellner, Vineet Gupta, kito.cheng, gnu-toolchain,
	philipp.tomsich, christoph.muellner

On Fri, 14 Oct 2022 14:57:22 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 14 Oct 2022 13:39:33 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>> On 10/14/22 05:03, Christoph Müllner wrote:
>>>
>>> My guess is people like the ISA mapping (more) because it has been 
>>> documented and reviewed.
>>> And it is the product of a working group that worked out the 
>>> RVWMO specification.
>>> This gives some confidence that we don't need to rework it massively 
>>> because of correctness issues in the future.
>>
>> This stuff can be hard and if someone with deep experience in memory 
>> models has reviewed the ISA mapping, then I'd prefer it over the GCC 
>> mapping.  It's just that the experts in the memory model space
>> are more likely to get it right than a bunch of compiler junkies, no 
>> matter how smart we think we are :-)
>
> That hasn't really proven to be the case for the ISA-suggested Linux mappings.
> We've been through a bunch of rounds of review upstream and that's 
> resulted in some differences.  Some of that is likely related to the ISA 
> mappings for Linux being incomplete (they punt on the tricky bits like 
> interactions with spinlocks, which filter all over the place), and Linux 
> doesn't have the same old binary compatibility issues (aka the in-kernel 
> ABI is not stable between releases) so mapping churn isn't nearly as 
> scary over there.
>
> Still not much of an argument in favor of the GCC mappings, though.  I'm 
> pretty sure nobody with a formal memory model background has looked at
> those, which is pretty much the worst spot to be in.  That said, I don't 
> think we can just say the ISA mappings are the way to go, they seem to 
> suffer from some similar incompleteness issues (for example, there are no
> explicit mappings for std::atomic<T>::compare_exchange_{weak,strong}).  
> So we'll still need to put in the work to make sure whatever mappings 
> get implemented are correct.
>
> [
> As an aside, I think LLVM is doing the wrong thing for some of the more 
> complex forms of compare_exchange_weak.  For example
>
>     #include <atomic>
>     
>     bool f(std::atomic<long>& p, long& o, long n) {
>         return p.compare_exchange_weak(o, n, std::memory_order_acq_rel, std::memory_order_release);

Eric points out this is bogus code; the spec forbids release as the fail 
ordering (which makes sense).  Just kind of randomly permuting these 
ordering arguments ends up with some generated code that seems off, for 
example release/acq_rel produces lr/sc.rl (and no fences).

I don't think any of that is very relevant for the GCC discussion, 
though, as there's a bunch of other mappings specified by the ISA and we 
should just sort this one out.

>     }
>     
>    
>     $ clang-15.0.0 -std=c++17 -O3
>     f(std::atomic<long>&, long&, long):                   # @f(std::atomic<long>&, long&, long)
>             ld      a4, 0(a1)
>     .LBB0_3:                                # =>This Inner Loop Header: Depth=1
>             lr.d.aq a3, (a0)
>             bne     a3, a4, .LBB0_5
>             sc.d.rl a5, a2, (a0)
>             bnez    a5, .LBB0_3
>     .LBB0_5:
>             xor     a0, a3, a4
>             seqz    a0, a0
>             beq     a3, a4, .LBB0_2
>             sd      a3, 0(a1)
>     .LBB0_2:
>             ret
>
> doesn't look to me like it provides release ordering on failure, but I'm 
> not really a memory model person so maybe I'm missing something here.  
> The GCC mapping is pretty ugly, but I think we do happen to have correct 
> behavior in this case:
>
>     # gcc-12.2.0 -std=c++17 -O3
>     f(std::atomic<long>&, long&, long):
>             ld      a5,0(a1)
>             fence iorw,ow;  1: lr.d.aq a4,0(a0); bne a4,a5,1f; sc.d.aq a3,a2,0(a0); bnez a3,1b; 1:
>             sub     a5,a4,a5
>             seqz    a0,a5
>             beq     a5,zero,.L2
>             sd      a4,0(a1)
>     .L2:
>             andi    a0,a0,1
>             ret
> ]
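
For the record, a well-formed variant of that example (just a sketch; the
failure ordering may not be memory_order_release or memory_order_acq_rel)
would look something like

    #include <atomic>

    bool f(std::atomic<long>& p, long& o, long n) {
        // acquire is a permitted failure ordering for an acq_rel success
        // ordering, unlike the release failure ordering used above
        return p.compare_exchange_weak(o, n, std::memory_order_acq_rel,
                                       std::memory_order_acquire);
    }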
>
>> Maybe I'm being too optimistic, but it's not hard for me to see a path 
>> where GCC and LLVM both implement the ISA mapping by default.  Anything 
>> else is just a path of long term pain.
>
> I'd bet that most people want that, but in practice building any real 
> systems in RISC-V land requires some degree of implementation-defined 
> behavior as the specifications don't cover everything (even ignoring the 
> whole PDF vs specification word games).  That's not to say we should 
> just ignore what's written down, just that there's more work to do even 
> if we ignore compatibility with old binaries.
>
> I think the question here is whether it's worth putting in the extra 
> work to provide a path for systems with old binaries to gradually 
> upgrade to the ISA mappings, or if we just toss out those old binaries.  
> I think we really need to see how much of a headache that compatibility 
> mode is going to be, and the only way to do that is put in the time to 
> analyze the GCC mappings.
>
> That said, I don't really personally care that much about old binaries.  
> Really my only argument here is that we broke binary compatibility once 
> (right before we upstreamed the port), that made a handful of people 
> mad, and I told them we'd never do it again.  I think we were all 
> operating under the assumption that RISC-V would move an order of 
> magnitude faster than it has, though, so maybe we're in a spot where 
> nobody actually cares about extant binaries and we can just throw them 
> all out.
>
> If we're going to do that, though, there's a bunch of other cruft that 
> we should probably toss along with the GCC mappings...
>
>>
>>
>> Jeff
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-12 17:16           ` Andrea Parri
@ 2022-10-20 19:01             ` Andrea Parri
  2022-10-29  5:02               ` Jeff Law
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Parri @ 2022-10-20 19:01 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Vineet Gupta, jeffreyalaw, cmuellner, gcc-patches, kito.cheng,
	gnu-toolchain

On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
> > > >     +Andrea, in case he has time to look at the memory model / ABI
> > > >     issues.
> 
> > +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> > a lot of this in the hope Andrea could help out, as I'm not really a memory
> > model guy and this is pretty far down the rabbit hole.  Happy to have the
> > help if you're offering, though, as what's there is likely a pretty big
> > performance issue for anyone with a reasonable memory system.
> 
> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
> happy to help (and stay synchronized with Jeff/the community) as much as possible,
> building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

  - Current mapping is missing synchronization, notably

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_release,
					      memory_order_relaxed);

    is unable to provide the (required) release ordering guarantees; for
    reference, I've reported a litmus test illustrating it at the bottom
    of this email, cf. c-cmpxchg.

  - [1] addressed the "memory_order_release" problem/bug mentioned above
    (as well as other quirks of the current mapping I won't detail here),
    but it doesn't address other problems present in the current mapping;
    in particular, both mappings translate the following

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_acquire,
					      memory_order_relaxed);

    to a sequence 

	lr.w
	bne
	sc.w.aq

    (without any other synchronization/fences), which contrasts with
    the Unprivileged Spec, Section 10.2 "Load-Reserve / Store-Conditional
    Instructions":

      "Software should not set the 'rl' bit on an LR instruction unless
      the 'aq' bit is also set, nor should software set the 'aq' bit on
      an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
      instructions are not guaranteed to provide any stronger ordering
      than those with both bits clear [...]"
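
    For reference, the mapping table in the RVWMO appendix of the ISA manual
    suggests, for an acquire CAS, a sequence along the lines of

        lr.w.aq
        bne
        sc.w
        bnez

    i.e. with the 'aq' bit on the LR rather than on the SC, which stays
    within the guidance quoted above (this is a sketch from memory, not a
    verified quote of the table).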

Thanks,
  Andrea

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html


C c-cmpxchg

{}

P0(atomic_int *x, atomic_int *y, int *z)
{
	int r0;

	atomic_store_explicit(x, 1, memory_order_relaxed);
	r0 = atomic_compare_exchange_weak_explicit(y, z, 1, memory_order_release, memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y)
{
	int r1;
	int r2;

	r1 = atomic_load_explicit(y, memory_order_acquire);
	r2 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (1:r1=1 /\ 1:r2=0)
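(* With release ordering on the successful CAS, observing 1:r1=1 (P1's
   acquire load reads the CAS's store to y) forbids 1:r2=0; a mapping that
   allows this outcome is missing the required release ordering. *)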

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]
  2022-10-20 19:01             ` Andrea Parri
@ 2022-10-29  5:02               ` Jeff Law
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Law @ 2022-10-29  5:02 UTC (permalink / raw)
  To: Andrea Parri, Palmer Dabbelt
  Cc: Vineet Gupta, cmuellner, gcc-patches, kito.cheng, gnu-toolchain


On 10/20/22 13:01, Andrea Parri wrote:
> On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
>>>>>      +Andrea, in case he has time to look at the memory model / ABI
>>>>>      issues.
>>> +Jeff, who was offering to help when the threads got crossed.  I'd punted on
>>> a lot of this in the hope Andrea could help out, as I'm not really a memory
>>> model guy and this is pretty far down the rabbit hole.  Happy to have the
>>> help if you're offering, though, as what's there is likely a pretty big
>>> performance issue for anyone with a reasonable memory system.
>> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
>> happy to help (and stay synchronized with Jeff/the community) as much as possible,
>> building a better understanding of the 'issues' at stake.
> Summarizing here some findings from looking at the currently-implemented
> and the proposed [1] mappings:
>
>    - Current mapping is missing synchronization, notably
>
> 	atomic_compare_exchange_weak_explicit(-, -, -,
> 					      memory_order_release,
> 					      memory_order_relaxed);
>
>      is unable to provide the (required) release ordering guarantees; for
>      reference, I've reported a litmus test illustrating it at the bottom
>      of this email, cf. c-cmpxchg.
>
>    - [1] addressed the "memory_order_release" problem/bug mentioned above
>      (as well as other quirks of the current mapping I won't detail here),
>      but it doesn't address other problems present in the current mapping;
>      in particular, both mappings translate the following
>
> 	atomic_compare_exchange_weak_explicit(-, -, -,
> 					      memory_order_acquire,
> 					      memory_order_relaxed);
>
>      to a sequence
>
> 	lr.w
> 	bne
> 	sc.w.aq
>
>      (without any other synchronization/fences), which contrasts with
>      the Unprivileged Spec, Section 10.2 "Load-Reserve / Store-Conditional
>      Instructions":
>
>        "Software should not set the 'rl' bit on an LR instruction unless
>        the 'aq' bit is also set, nor should software set the 'aq' bit on
>        an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
>        instructions are not guaranteed to provide any stronger ordering
>        than those with both bits clear [...]"

So it sounds like Christoph's patch is an improvement, but isn't 
complete.  Given the pain in this space, I'd be hesitant to put in an 
incomplete fix, even if it moves things in the right direction, as it 
creates another compatibility headache if we don't get the complete 
solution in place for gcc-13.


Christoph, thoughts on the case Andrea pointed out?


Jeff



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-10-29  5:02 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 19:36 [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265] Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 06/10] RISC-V: Implement atomic_{load,store} " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266] Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 08/10] RISC-V: Add s.ext-consuming " Christoph Muellner
2021-05-05 19:36 ` [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS " Christoph Muellner
2021-05-06  0:27   ` Jim Wilson
2021-05-05 19:36 ` [PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" " Christoph Muellner
2022-10-11 19:06 ` [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266] Vineet Gupta
2022-10-11 19:31   ` Palmer Dabbelt
2022-10-11 20:46     ` Christoph Müllner
2022-10-11 23:31       ` Vineet Gupta
2022-10-12  0:15         ` Palmer Dabbelt
2022-10-12  8:03           ` Christoph Müllner
2022-10-13 23:11             ` Jeff Law
2022-10-12 17:16           ` Andrea Parri
2022-10-20 19:01             ` Andrea Parri
2022-10-29  5:02               ` Jeff Law
2022-10-13 23:04           ` Jeff Law
2022-10-13 22:39         ` Jeff Law
2022-10-13 23:14           ` Palmer Dabbelt
2022-10-14 11:03             ` Christoph Müllner
2022-10-14 20:39               ` Jeff Law
2022-10-14 21:57                 ` Palmer Dabbelt
2022-10-15  0:31                   ` Palmer Dabbelt
2022-10-14  0:14           ` Vineet Gupta
2022-10-11 23:14     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).