public inbox for gcc-patches@gcc.gnu.org
* [PATCH v1] RISC-V: Add support for inlining subword atomic operations
@ 2022-02-08  0:48 Patrick O'Neill
  2022-02-23 21:49 ` Palmer Dabbelt
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick O'Neill @ 2022-02-08  0:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: palmer, Patrick O'Neill

RISC-V has no support for subword atomic operations, so GCC currently
generates calls into libatomic for them.

This patch changes the default behavior to inline subword atomic
operations, using the same logic as the existing library calls.
Behavior can be selected with the -minline-atomics and
-mno-inline-atomics command-line flags.

libgcc/config/riscv/atomic.c implements the same logic in asm.  That
implementation must stay for backwards compatibility and for the
-mno-inline-atomics flag.

2022-02-07  Patrick O'Neill  <patrick@rivosinc.com>

	PR target/104338
	* riscv.opt: Add the -minline-atomics command-line flag.
	* sync.md (atomic_fetch_<atomic_optab><mode>): Add logic for
	expanding subword atomic operations inline.
	(subword_atomic_fetch_strong_<atomic_optab>): New insn; LR/SC
	block performing the atomic operation.
	* atomic.c: Add a reference to the duplicated logic.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* inline-atomics-9.c: Likewise.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
---
There may be further concerns about the memory consistency of these
operations, but this patch focuses on moving the existing logic inline;
those concerns can be addressed in a future patch.
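For reference, the word-aligned mask-and-retry scheme the expander emits
can be sketched in plain C roughly as follows.  This is an illustration
only, not code from the patch: the helper name is made up, a C11
compare-and-swap loop stands in for the LR/SC sequence, and little-endian
byte order (as on RISC-V) is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of an inlined subword fetch-and-add: perform the operation on
   the aligned 32-bit word containing the byte, masking in only the
   target lane.  Assumes little-endian byte order.  */
static uint8_t
subword_fetch_add_u8 (uint8_t *p, uint8_t val)
{
  uintptr_t addr = (uintptr_t) p;
  /* Round the address down to the containing 32-bit word.  */
  _Atomic uint32_t *word = (_Atomic uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (addr & 3) * 8;	/* Byte offset -> bit shift.  */
  uint32_t mask = 0xFFu << shift;

  uint32_t old = atomic_load_explicit (word, memory_order_relaxed);
  uint32_t desired;
  do
    {
      /* Add within the byte lane, then splice the result back into the
	 untouched bytes of the word.  */
      uint32_t sum = ((old >> shift) + val) & 0xFFu;
      desired = (old & ~mask) | (sum << shift);
    }
  while (!atomic_compare_exchange_weak (word, &old, desired));

  return (uint8_t) ((old >> shift) & 0xFFu);	/* Old subword value.  */
}
```

Calling the helper on any byte of an aligned word updates only that byte
and returns its previous value; the surrounding bytes are preserved by
the mask, which is the same invariant the LR/SC block maintains.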
---
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      |  96 +++
 .../gcc.target/riscv/inline-atomics-1.c       |  11 +
 .../gcc.target/riscv/inline-atomics-2.c       |  12 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  13 +
 .../gcc.target/riscv/inline-atomics-6.c       |  12 +
 .../gcc.target/riscv/inline-atomics-7.c       |  12 +
 .../gcc.target/riscv/inline-atomics-8.c       |  17 +
 .../gcc.target/riscv/inline-atomics-9.c       |  17 +
 libgcc/config/riscv/atomic.c                  |   2 +
 12 files changed, 1331 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index e294e223151..fb702317233 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -211,3 +211,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Bool Var(ALWAYS_INLINE_SUBWORD_ATOMIC) Init(-1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 747a799e237..e19b4157d3c 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -92,6 +92,102 @@
   "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP))]
+  "TARGET_ATOMIC && ALWAYS_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<atomic_optab> to implement an LR/SC
+     version of the operation.  */
+
+  /* This logic is duplicated in libgcc/config/riscv/atomic.c for use
+     when inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx mask = gen_reg_rtx (SImode);
+  rtx notmask = gen_reg_rtx (SImode);
+
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+					     gen_int_mode (-4, Pmode)));
+
+  rtx aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  rtx shift = gen_reg_rtx (SImode);
+  emit_move_insn (shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				      gen_int_mode (3, SImode)));
+  emit_move_insn (shift, gen_rtx_ASHIFT (SImode, shift,
+					 gen_int_mode (3, SImode)));
+
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value, <MODE>mode, 0));
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  emit_move_insn (shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						 gen_lowpart (QImode, shift)));
+
+  int unshifted_mask;
+  if (<MODE>mode == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  rtx mask_reg = gen_reg_rtx (SImode);
+  emit_move_insn (mask_reg, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (mask, gen_rtx_ASHIFT (SImode, mask_reg,
+					gen_lowpart (QImode, shift)));
+
+  emit_move_insn (notmask, gen_rtx_NOT (SImode, mask));
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, notmask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && ALWAYS_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "<insn>\t%5, %0, %2\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b\;";}
+  )
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..110fdabd313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..8d5c31d8b79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..19b382d45b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..619cf1f86ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..c2751235dbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* Verify that constant propagation is functioning.  */
+/* The -4 should be propagated into an ANDI instruction. */
+/* { dg-options "-minline-atomics" } */
+/* { dg-final { scan-assembler-not "\tli\t[at]\d,-4" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..18249bae7d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that masks are generated to the correct size.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,255" } } */
+
+int
+main ()
+{
+  char bar __attribute__((aligned (32)));
+  __sync_fetch_and_add(&bar, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..81bbf4badce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that masks are generated to the correct size.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,65535" } } */
+
+int
+main ()
+{
+  short bar __attribute__((aligned (32)));
+  __sync_fetch_and_add(&bar, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..d27562ed981
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Verify that masks are aligned properly.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,16711680" } } */
+
+int
+main ()
+{
+  struct A {
+    char a;
+    char b;
+    char c;
+    char d;
+  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
+  __sync_fetch_and_add(&A.c, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
new file mode 100644
index 00000000000..382849702ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Verify that masks are aligned properly.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,-16777216" } } */
+
+int
+main ()
+{
+  struct A {
+    char a;
+    char b;
+    char c;
+    char d;
+  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
+  __sync_fetch_and_add(&A.d, 0);
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 904d8c59cf0..9583027b757 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] RISC-V: Add support for inlining subword atomic operations
  2022-02-08  0:48 [PATCH v1] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
@ 2022-02-23 21:49 ` Palmer Dabbelt
  2022-03-10 22:36   ` [PATCH v2] RISCV: Add support for inlining subword atomics Patrick O'Neill
  0 siblings, 1 reply; 10+ messages in thread
From: Palmer Dabbelt @ 2022-02-23 21:49 UTC (permalink / raw)
  To: patrick; +Cc: gcc-patches, patrick

On Mon, 07 Feb 2022 16:48:41 PST (-0800), patrick@rivosinc.com wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
>
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
>
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
>
> 2022-02-07 Patrick O'Neill <patrick@rivosinc.com>
>
> 	PR target/104338
> 	* riscv.opt: Add command-line flag.
> 	* sync.md (atomic_fetch_<atomic_optab><mode>): logic for
> 	expanding subword atomic operations.
> 	* sync.md (subword_atomic_fetch_strong_<atomic_optab>): LR/SC
> 	block for performing atomic operation
> 	* atomic.c: Add reference to duplicate logic.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* inline-atomics-9.c: Likewise.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> ---
> There may be further concerns about the memory consistency of these
> operations, but this patch focuses on simply moving the logic inline.
> Those concerns can be addressed in a future patch.
> ---
>  gcc/config/riscv/riscv.opt                    |   4 +
>  gcc/config/riscv/sync.md                      |  96 +++
>  .../gcc.target/riscv/inline-atomics-1.c       |  11 +
>  .../gcc.target/riscv/inline-atomics-2.c       |  12 +
>  .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
>  .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
>  .../gcc.target/riscv/inline-atomics-5.c       |  13 +
>  .../gcc.target/riscv/inline-atomics-6.c       |  12 +
>  .../gcc.target/riscv/inline-atomics-7.c       |  12 +
>  .../gcc.target/riscv/inline-atomics-8.c       |  17 +
>  .../gcc.target/riscv/inline-atomics-9.c       |  17 +
>  libgcc/config/riscv/atomic.c                  |   2 +
>  12 files changed, 1331 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
>
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index e294e223151..fb702317233 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -211,3 +211,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
>  misa-spec=
>  Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
>  Set the version of RISC-V ISA spec.
> +
> +minline-atomics
> +Target Bool Var(ALWAYS_INLINE_SUBWORD_ATOMIC) Init(-1)
> +Always inline subword atomic operations.

We usually have lower-case names for variables, but I think you can get 
away with a target flag here (which makes things slightly easier).  The 
-1 initializer is also a bit odd, but that'd go away with a target flag.

At a bare minimum this needs a doc/invoke.texi blurb, but IMO this 
should really be called out as a news entry as well -- we're already 
finding some ABI-related fallout in libstdc++, so we should make this as 
visible as possible to users.  I think it's OK to default to enabling 
the inline atomics, as we're not directly breaking the ABI from GCC.

> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index 747a799e237..e19b4157d3c 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -92,6 +92,102 @@
>    "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
>    [(set (attr "length") (const_int 8))])
>
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
> +	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(any_atomic:SHORT (match_dup 1)
> +		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
> +	   (match_operand:SI 3 "const_int_operand")]		      ;; model
> +	 UNSPEC_SYNC_OLD_OP))]
> +  "TARGET_ATOMIC && ALWAYS_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx notmask = gen_reg_rtx (SImode);
> +
> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> +					      gen_int_mode (-4, Pmode)));
> +
> +  rtx aligned_mem = change_address (mem, SImode, aligned_addr);
> +
> +  rtx shift = gen_reg_rtx (SImode);
> +  emit_move_insn (shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +				      gen_int_mode (3, SImode)));
> +  emit_move_insn (shift, gen_rtx_ASHIFT (SImode, shift,
> +					 gen_int_mode(3, SImode)));
> +
> +  rtx value_reg = gen_reg_rtx (SImode);
> +  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value, <MODE>mode, 0));
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  emit_move_insn(shifted_value, gen_rtx_ASHIFT(SImode, value_reg,
> +					       gen_lowpart (QImode, shift)));
> +
> +  int unshifted_mask;
> +  if (<MODE>mode == QImode)
> +    unshifted_mask = 0xFF;
> +  else
> +    unshifted_mask = 0xFFFF;
> +
> +  rtx mask_reg = gen_reg_rtx (SImode);
> +  emit_move_insn (mask_reg, gen_int_mode(unshifted_mask, SImode));
> +
> +  emit_move_insn (mask, gen_rtx_ASHIFT(SImode, mask_reg,
> +				       gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (notmask, gen_rtx_NOT(SImode, mask));
> +
> +  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
> +							     shifted_value,
> +							     mask, notmask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT(SImode, old,
> +					gen_lowpart(QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart(<MODE>mode, old));
> +
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(any_atomic:SI (match_dup 1)
> +		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
> +	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
> +	 UNSPEC_SYNC_OLD_OP))

IIUC nothing's looking at UNSPEC_SYNC_OLD_OP, so it's not technically a 
bug, but this isn't really computing the same thing the other patterns 
do (it's a shifted version of the value, not the actual value).  That's 
likely to trip someone up at some point, so I'd just make a new unspec 
for it.

> +    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
> +    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
> +    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
> +  "TARGET_ATOMIC && ALWAYS_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return
> +    "1:\;"
> +    "lr.w.aq\t%0, %1\;"
> +    "<insn>\t%5, %0, %2\;"
> +    "and\t%5, %5, %3\;"
> +    "and\t%6, %0, %4\;"
> +    "or\t%6, %6, %5\;"
> +    "sc.w.rl\t%5, %6, %1\;"
> +    "bnez\t%5, 1b\;";}
> +  )
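For readers following along, the word-aligned mask-and-shift scheme the expander implements can be sketched in plain C. This is only an illustration: the `subword_fetch_add_u8` helper is hypothetical, a 32-bit compare-and-swap loop stands in for the lr.w/sc.w retry sequence, and little-endian byte order is assumed (as on RISC-V).

```c
#include <stdint.h>

/* Illustrative C equivalent of the subword atomic lowering: align the
   address down to a 4-byte word, compute the byte's bit offset and a
   shifted mask, then retry on the containing word.  A word-sized CAS
   stands in for the LR/SC loop emitted by the real pattern.  */
static uint8_t
subword_fetch_add_u8 (uint8_t *p, uint8_t v)
{
  uintptr_t addr = (uintptr_t) p;
  uint32_t *aligned = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (addr & 3) * 8;      /* bit offset of the byte (LE) */
  uint32_t mask = 0xFFu << shift;       /* selects the target byte */
  uint32_t old = __atomic_load_n (aligned, __ATOMIC_RELAXED);
  uint32_t desired;

  do
    {
      /* Apply the operation on the shifted value, then mask off any
	 carry into the neighboring bytes, exactly as the asm does.  */
      uint32_t newbyte = (old + ((uint32_t) v << shift)) & mask;
      desired = (old & ~mask) | newbyte;
    }
  while (!__atomic_compare_exchange_n (aligned, &old, desired, 0,
				       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE));

  /* Shift the old word value back down to recover the subword result.  */
  return (uint8_t) (old >> shift);
}
```

Note how the returned value must be shifted back down at the end, which is the same shift-back the expander performs on `old` before the final `gen_lowpart`.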
> +
>  (define_insn "atomic_exchange<mode>"
>    [(set (match_operand:GPR 0 "register_operand" "=&r")
>  	(unspec_volatile:GPR
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> new file mode 100644
> index 00000000000..110fdabd313
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-atomics" } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
> +
> +char bar;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&bar, 1);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> new file mode 100644
> index 00000000000..8d5c31d8b79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* Verify that subword atomics do not generate calls.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
> +
> +char bar;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&bar, 1);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> new file mode 100644
> index 00000000000..19b382d45b0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> @@ -0,0 +1,569 @@
> +/* Check all char alignments.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-1.c */
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a char.  */
> +
> +extern void abort(void);
> +
> +char count, res;
> +const char init = ~0;
> +
> +struct A
> +{
> +   char a;
> +   char b;
> +   char c;
> +   char d;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  char* V[] = {&A.a, &A.b, &A.c, &A.d};
> +
> +  for (int i = 0; i < 4; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> new file mode 100644
> index 00000000000..619cf1f86ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> @@ -0,0 +1,566 @@
> +/* Check all short alignments.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-2.c */
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a short.  */
> +
> +extern void abort(void);
> +
> +short count, res;
> +const short init = ~0;
> +
> +struct A
> +{
> +   short a;
> +   short b;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  short* V[] = {&A.a, &A.b};
> +
> +  for (int i = 0; i < 2; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> new file mode 100644
> index 00000000000..c2751235dbf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* Verify that constant propagation is functioning.  */
> +/* The -4 should be propagated into an ANDI statement.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-final { scan-assembler-not "\tli\t[at]\d,-4" } } */
> +
> +char bar;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&bar, 1);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> new file mode 100644
> index 00000000000..18249bae7d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* Verify that masks are generated to the correct size.  */
> +/* { dg-options "-O3 -minline-atomics" } */
> +/* Check for mask */
> +/* { dg-final { scan-assembler "\tli\t[at]\d,255" } } */
> +
> +int
> +main ()
> +{
> +  char bar __attribute__((aligned (32)));
> +  __sync_fetch_and_add(&bar, 0);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> new file mode 100644
> index 00000000000..81bbf4badce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* Verify that masks are generated to the correct size.  */
> +/* { dg-options "-O3 -minline-atomics" } */
> +/* Check for mask */
> +/* { dg-final { scan-assembler "\tli\t[at]\d,65535" } } */
> +
> +int
> +main ()
> +{
> +  short bar __attribute__((aligned (32)));
> +  __sync_fetch_and_add(&bar, 0);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> new file mode 100644
> index 00000000000..d27562ed981
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* Verify that masks are aligned properly.  */
> +/* { dg-options "-O3 -minline-atomics" } */
> +/* Check for mask */
> +/* { dg-final { scan-assembler "\tli\t[at]\d,16711680" } } */
> +
> +int
> +main ()
> +{
> +  struct A {
> +    char a;
> +    char b;
> +    char c;
> +    char d;
> +  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
> +  __sync_fetch_and_add(&A.c, 0);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
> new file mode 100644
> index 00000000000..382849702ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* Verify that masks are aligned properly.  */
> +/* { dg-options "-O3 -minline-atomics" } */
> +/* Check for mask */
> +/* { dg-final { scan-assembler "\tli\t[at]\d,-16777216" } } */
> +
> +int
> +main ()
> +{
> +  struct A {
> +    char a;
> +    char b;
> +    char c;
> +    char d;
> +  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
> +  __sync_fetch_and_add(&A.d, 0);
> +}
> diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
> index 904d8c59cf0..9583027b757 100644
> --- a/libgcc/config/riscv/atomic.c
> +++ b/libgcc/config/riscv/atomic.c
> @@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define INVERT		"not %[tmp1], %[tmp1]\n\t"
>  #define DONT_INVERT	""
>
> +/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining
> +   is enabled.  */
> +
>  #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
>    type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
>    {									\


* [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-02-23 21:49 ` Palmer Dabbelt
@ 2022-03-10 22:36   ` Patrick O'Neill
  2022-04-08  2:48     ` Pan RZ
  2022-04-19 17:17     ` [PATCH v3] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
  0 siblings, 2 replies; 10+ messages in thread
From: Patrick O'Neill @ 2022-03-10 22:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: palmer, Patrick O'Neill

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
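For illustration, the expansion is conceptually equivalent to the following C
emulation of a subword fetch-and-add: align down to the containing 32-bit word
(cf. the -4 masking in the expander), shift the value and mask into place, and
retry until the word updates atomically. This is a sketch only, not part of the
patch: the helper name is made up, it uses a CAS loop where the patch emits an
LR/SC loop, and it assumes a little-endian target (as on RISC-V) whose
containing word is accessible.

```c
#include <stdint.h>

/* Illustrative emulation of __sync_fetch_and_add on a byte, mirroring the
   mask/shift logic of the inline expansion.  Little-endian only.  */
static uint8_t
fetch_add_u8 (uint8_t *p, uint8_t val)
{
  uintptr_t addr = (uintptr_t) p;
  /* Align the address down to the containing word and turn the byte
     offset into a bit shift.  */
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (unsigned) (addr & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  uint32_t new_word;
  do
    {
      /* Apply the operation to the selected byte only, leaving the other
         bytes of the word unchanged.  */
      uint8_t byte = (uint8_t) ((old & mask) >> shift);
      new_word = (old & ~mask) | ((uint32_t) (uint8_t) (byte + val) << shift);
    }
  while (!__atomic_compare_exchange_n (word, &old, new_word, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  /* Like the fetch_op builtins, return the original byte value.  */
  return (uint8_t) ((old & mask) >> shift);
}
```

The patch's sync.md expander performs the same masking inside an LR/SC retry
sequence instead of a CAS loop, so the merge of old and new bytes happens
between the lr.w and sc.w.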

2022-02-15 Patrick O'Neill <patrick@rivosinc.com>

	PR target/104338
	* riscv.opt: Add command-line flag.
	* invoke.texi: Add blurb regarding command-line flag.
	* sync.md (atomic_fetch_<atomic_optab><mode>): Add logic for
	expanding subword atomic operations.
	* sync.md (subword_atomic_fetch_strong_<atomic_optab>): Add LR/SC
	block for performing the atomic operation.
	* atomic.c: Add reference to duplicate logic.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* inline-atomics-9.c: Likewise.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
---
There may be further concerns about the memory consistency of these 
operations, but this patch focuses on simply moving the logic inline.
Those concerns can be addressed in a future patch.
---
v2 Changelog:
 - Add texinfo blurb
 - Update target flag
 - add 'UNSPEC_SYNC_OLD_OP_SUBWORD' for subword ops
---
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      |  98 +++
 gcc/doc/invoke.texi                           |   7 +
 .../gcc.target/riscv/inline-atomics-1.c       |  11 +
 .../gcc.target/riscv/inline-atomics-2.c       |  12 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  13 +
 .../gcc.target/riscv/inline-atomics-6.c       |  12 +
 .../gcc.target/riscv/inline-atomics-7.c       |  12 +
 .../gcc.target/riscv/inline-atomics-8.c       |  17 +
 .../gcc.target/riscv/inline-atomics-9.c       |  17 +
 libgcc/config/riscv/atomic.c                  |   2 +
 13 files changed, 1340 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 9fffc08220d..8378e41aa85 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -225,3 +225,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Mask(INLINE_SUBWORD_ATOMIC)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 86b41e6b00a..05cbdfd5db3 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -22,6 +22,7 @@
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
@@ -92,6 +93,103 @@
   "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx mask = gen_reg_rtx (SImode);
+  rtx notmask = gen_reg_rtx (SImode);
+
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+					      gen_int_mode (-4, Pmode)));
+
+  rtx aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  rtx shift = gen_reg_rtx (SImode);
+  emit_move_insn (shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				      gen_int_mode (3, SImode)));
+  emit_move_insn (shift, gen_rtx_ASHIFT (SImode, shift,
+					 gen_int_mode(3, SImode)));
+
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value, <MODE>mode, 0));
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  emit_move_insn(shifted_value, gen_rtx_ASHIFT(SImode, value_reg,
+					       gen_lowpart (QImode, shift)));
+
+  int unshifted_mask;
+  if (<MODE>mode == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  rtx mask_reg = gen_reg_rtx (SImode);
+  emit_move_insn (mask_reg, gen_int_mode(unshifted_mask, SImode));
+
+  emit_move_insn (mask, gen_rtx_ASHIFT(SImode, mask_reg,
+				       gen_lowpart (QImode, shift)));
+
+  emit_move_insn (notmask, gen_rtx_NOT(SImode, mask));
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, notmask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT(SImode, old,
+					gen_lowpart(QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart(<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "<insn>\t%5, %0, %2\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e1a00c80307..1007110aafb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1200,6 +1200,7 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset}}
+-minline-atomics  -mno-inline-atomics @gol
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs @gol
@@ -27712,6 +27713,12 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -minline-atomics
+@itemx -mno-inline-atomics
+@opindex minline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+library function calls.  The default is to use fast inline subword atomics.
+
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
 @opindex mshorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..110fdabd313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..8d5c31d8b79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..19b382d45b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..619cf1f86ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-latomic -minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..c2751235dbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* Verify that constant propagation is functioning.  */
+/* The -4 should be propagated into an ANDI statement.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-final { scan-assembler-not "\tli\t[at]\d,-4" } } */
+
+char bar;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&bar, 1);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..18249bae7d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that masks are generated to the correct size.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,255" } } */
+
+int
+main ()
+{
+  char bar __attribute__((aligned (32)));
+  __sync_fetch_and_add(&bar, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..81bbf4badce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* Verify that masks are generated to the correct size.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,65535" } } */
+
+int
+main ()
+{
+  short bar __attribute__((aligned (32)));
+  __sync_fetch_and_add(&bar, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..d27562ed981
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Verify that masks are aligned properly.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,16711680" } } */
+
+int
+main ()
+{
+  struct A {
+    char a;
+    char b;
+    char c;
+    char d;
+  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
+  __sync_fetch_and_add(&A.c, 0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
new file mode 100644
index 00000000000..382849702ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-9.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Verify that masks are aligned properly.  */
+/* { dg-options "-O3 -minline-atomics" } */
+/* Check for mask */
+/* { dg-final { scan-assembler "\tli\t[at]\d,-16777216" } } */
+
+int
+main ()
+{
+  struct A {
+    char a;
+    char b;
+    char c;
+    char d;
+  } __attribute__ ((packed)) __attribute__((aligned (32))) A;
+  __sync_fetch_and_add(&A.d, 0);
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 7007e7a20e4..a29909b97b5 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-03-10 22:36   ` [PATCH v2] RISCV: Add support for inlining subword atomics Patrick O'Neill
@ 2022-04-08  2:48     ` Pan RZ
  2022-04-08 16:48       ` Patrick O'Neill
  2022-04-19 17:17     ` [PATCH v3] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
  1 sibling, 1 reply; 10+ messages in thread
From: Pan RZ @ 2022-04-08  2:48 UTC (permalink / raw)
  To: patrick; +Cc: gcc-patches

Hi Patrick,

Glad to know that efforts have been made to add inlined subword atomic support to gcc. The patch looks great so far, yet as Andreas Schwab has pointed out (at riscv-collab/riscv-gcc#337), it looks like it only covers the atomic fetch operations. Just wondering: do you have further plans to implement support for atomic store / exchange as well? Also, as a reminder, note that after adding store / exchange support, some related macros like ATOMIC_BOOL_LOCK_FREE and ATOMIC_CHAR_LOCK_FREE (defined in <atomic>) may need to be set to true. Currently in RISC-V gcc, they are all defined as false. The same may also need to be done for std::atomic<bool>::is_always_lock_free and std::atomic_is_lock_free(bool_var).

See: https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free and
https://github.com/riscv-collab/riscv-gcc/issues/337#issuecomment-1086664815

What's your opinion on this?

Best regards, RZ Pan (XieJiSS)



* Re: [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-04-08  2:48     ` Pan RZ
@ 2022-04-08 16:48       ` Patrick O'Neill
  2022-04-08 18:12         ` Pan RZ
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick O'Neill @ 2022-04-08 16:48 UTC (permalink / raw)
  To: Pan RZ; +Cc: gcc-patches

Hi RZ Pan,

I'll start working on the atomic store/exchange stuff. It shouldn't be
too difficult to add since it will have similar masking logic to
atomic fetch.

Also - I briefly looked and couldn't find the place where those macros'
values for RISC-V are defined in GCC. If anyone can point me in the
right direction for that, I would appreciate it :)

Thank you,
Patrick

On 4/7/22 19:48, Pan RZ wrote:
> Hi Patrick,
>
> Glad to know that efforts have been made to add inlining subword 
> atomic supports into gcc. The patch looks great so far, yet as Andreas 
> Schwab has pointed out (at riscv-collab/riscv-gcc#337), looks like it 
> only contains atomic fetch stuff. Just wondering do you have further 
> plans to implement support for atomic store / exchange as well? Also, 
> as a reminder, note that after adding store / exchange supports, some 
> related macros like ATOMIC_BOOL_LOCK_FREE and ATOMIC_CHAR_LOCK_FREE's 
> values (defined in <atomic>) may need to be set to true. Currently in 
> RISC-V gcc, they are all defined as false. This may also need to be 
> done for std::atomic<bool>::is_always_lock_free and 
> std::atomic_is_lock_free(bool_var).
>
> See:https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free and
> https://github.com/riscv-collab/riscv-gcc/issues/337#issuecomment-1086664815 
>
> What's your opinion on this?
>
> Best regards, RZ Pan (XieJiSS)
>


* Re: [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-04-08 16:48       ` Patrick O'Neill
@ 2022-04-08 18:12         ` Pan RZ
  2022-04-08 18:39           ` Patrick O'Neill
  0 siblings, 1 reply; 10+ messages in thread
From: Pan RZ @ 2022-04-08 18:12 UTC (permalink / raw)
  To: patrick; +Cc: gcc-patches

Hi Patrick,


We are more than delighted to hear that you'd like to implement inline
subword atomic load/store and exchange as well!


I searched for these macros in the gcc codebase, and it seems like the 
internal logic that defines ATOMIC_* builtin macros can be found at 
gcc/c-family/c-cppbuiltin.cc#L665-L796 (inside static void 
cpp_atomic_builtins), while is_always_lock_free is at 
libstdc++-v3/include/std/atomic#L99. I'm not 100% sure, though, due to 
my limited understanding of the codebase. Hope this information helps :-)


Yours sincerely,

Pan RZ (XieJiSS) @ PLCT Lab



* Re: [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-04-08 18:12         ` Pan RZ
@ 2022-04-08 18:39           ` Patrick O'Neill
  2022-04-08 20:29             ` Andreas Schwab
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick O'Neill @ 2022-04-08 18:39 UTC (permalink / raw)
  To: Pan RZ; +Cc: gcc-patches

Hi Pan RZ,

I appreciate the help - that's a good starting point for the macros.

It looks like the file:
gcc/config/nds32/linux.h
interacts with the macro:
#define HAVE_sync_compare_and_swaphi 1

I'm not sure if that's the correct way to do it, or whether it is defined
differently for targets like x86/ARM/etc.

I'll circle back around to it once the other inline ops are done. In the
meantime if anyone else wants to dig in and figure out how to update the
macros (for RISC-V) without impacting other targets, that would make
this patch ready a little faster ;)

Thanks,
Patrick


* Re: [PATCH v2] RISCV: Add support for inlining subword atomics
  2022-04-08 18:39           ` Patrick O'Neill
@ 2022-04-08 20:29             ` Andreas Schwab
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2022-04-08 20:29 UTC (permalink / raw)
  To: Patrick O'Neill; +Cc: Pan RZ, gcc-patches

On Apr 08 2022, Patrick O'Neill wrote:

> It looks like the file:
> gcc/config/nds32/linux.h
> interacts with the macro:
> #define HAVE_sync_compare_and_swaphi 1
>
> I'm not sure if that's the correct way to do it/if this is defined in a
> different way for targets like x86/ARM/etc.

They are normally autogenerated, see insn-flags.h.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* [PATCH v3] RISC-V: Add support for inlining subword atomic operations
  2022-03-10 22:36   ` [PATCH v2] RISCV: Add support for inlining subword atomics Patrick O'Neill
  2022-04-08  2:48     ` Pan RZ
@ 2022-04-19 17:17     ` Patrick O'Neill
  2022-05-03 11:59       ` Andreas Schwab
  1 sibling, 1 reply; 10+ messages in thread
From: Patrick O'Neill @ 2022-04-19 17:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: palmer, panrz, schwab, Patrick O'Neill

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2022-04-19 Patrick O'Neill <patrick@rivosinc.com>

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op,
	fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
---
There may be further concerns about the memory consistency of these 
operations, but this patch focuses on simply moving the logic inline.

See "[RFC 0/7] RISCV: Implement ISA Manual Table A.6 Mappings" sent to
the gcc-patches mailing list on 2022-04-07 for info about these
concerns.
---
See target/84568 on bugzilla for ABI break info.
---
The implementation for fetch_nand is clunky. I'm not convinced that this
is the best implementation since it duplicates all the logic in order to
change ~2 lines.
---
v2 Changelog:
 - Add texinfo blurb
 - Update target flag
 - add 'UNSPEC_SYNC_OLD_OP_SUBWORD' for subword ops
---
v3 Changelog:
 - Update target flag to be on by default
 - Add inline CAS and fetch&nand ops
 - Remove brittle tests & -latomic flag for inline tests
 - Add compare_and_exchange and exchange tests
 - Move duplicate masking logic to riscv.cc helper functions
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  52 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 318 ++++++++++
 gcc/doc/invoke.texi                           |   7 +
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |  19 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1869 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 20c2381c21a..14f3c8f0d4e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -74,6 +74,8 @@ extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee756aab694..cfd2f7710db 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5587,6 +5587,58 @@ riscv_asan_shadow_offset (void)
   return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+					      gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  *shift = gen_reg_rtx (SImode);
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				      gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					 gen_int_mode (3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  rtx mask_reg = gen_reg_rtx (SImode);
+  emit_move_insn (mask_reg, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, mask_reg,
+					 gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT (SImode, *mask));
+}
+
+/* Leftshift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		     rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn (*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						  gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 492aad12324..328d848d698 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -225,3 +225,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 86b41e6b00a..198ddbed8e9 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -92,6 +95,145 @@
   "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "<insn>\t%5, %0, %2\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(not:SHORT (and:SHORT (match_dup 1)
+				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement a LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%5, %0, %2\;"
+    "not\t%5, %5\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +246,60 @@
   "%F3amoswap.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_exchange<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(unspec_volatile:SHORT
+	  [(match_operand:SHORT 1 "memory_operand" "+A")
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	  UNSPEC_SYNC_EXCHANGE_SUBWORD))
+   (set (match_dup 1)
+	(match_operand:SHORT 2 "register_operand" "0"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%4, %0, %3\;"
+    "or\t%4, %4, %2\;"
+    "sc.w.rl\t%4, %4, %1\;"
+    "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -152,6 +348,128 @@
   DONE;
 })
 
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand" "")    ;; bool output
+   (match_operand:SHORT 1 "register_operand" "") ;; val output
+   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
+  else
+    emit_insn (gen_rtx_SET (val, const0_rtx));
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_insn (gen_rtx_SET (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3])));
+  else
+    emit_insn (gen_rtx_SET (exp, const0_rtx));
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_insn (gen_rtx_SET (compare, difference));
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode, compare)));
+      compare = reg;
+    }
+
+  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx)));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")			;; val output
+	(match_operand:SHORT 1 "memory_operand" "+A"))				;; memory
+   (set (match_dup 1)
+	(unspec_volatile:SHORT [(match_operand:SHORT 2 "reg_or_0_operand" "rJ")	;; expected val
+				(match_operand:SHORT 3 "reg_or_0_operand" "rJ")	;; desired val
+				(match_operand:SI 4 "const_int_operand")	;; mod_s
+				(match_operand:SI 5 "const_int_operand")]	;; mod_f
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+   (clobber (match_scratch:SHORT 6 "=&r"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong to implement a LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; o
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; n
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%6, %0, %4\;"
+    "bne\t%6, %z2, 1f\;"
+    "and\t%6, %0, %5\;"
+    "or\t%6, %6, %3\;"
+    "sc.w.rl\t%6, %6, %1\;"
+    "bnez\t%6, 1b\;"
+    "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dd2d3879f86..ee2f05dd4c4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1211,6 +1211,7 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset}}
+-minline-atomics  -mno-inline-atomics @gol
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs @gol
@@ -28041,6 +28042,12 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -minline-atomics
+@itemx -mno-inline-atomics
+@opindex minline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+library function calls.  The default is to use fast inline subword atomics.
+
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
 @opindex mshorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 7007e7a20e4..a29909b97b5 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1
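For readers unfamiliar with the libgcc fallback: the `GENERATE_FETCH_AND_OP` macro in the hunk above (truncated here) emulates subword atomics by operating on the naturally aligned 32-bit word that contains the byte or halfword. As a rough C sketch of that masking technique — an illustration of the general approach, not the exact libgcc assembly, and assuming a little-endian byte layout — a 1-byte fetch-and-or looks like this:

```c
#include <stdint.h>

/* Illustrative sketch (assumed shape, not the exact libgcc code):
   perform a 1-byte atomic fetch-and-or by looping a compare-and-swap
   on the aligned 32-bit word containing the byte.  The shift math
   assumes a little-endian target.  */
static uint8_t
subword_fetch_and_or_u8 (uint8_t *p, uint8_t v)
{
  uintptr_t addr = (uintptr_t) p;
  /* Aligned 32-bit container word and the byte's bit offset in it.  */
  uint32_t *wp = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (unsigned) (addr & 3) * 8;

  uint32_t old = __atomic_load_n (wp, __ATOMIC_RELAXED);
  uint32_t desired;
  do
    desired = old | ((uint32_t) v << shift);
  while (!__atomic_compare_exchange_n (wp, &old, desired, 1,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));

  /* Return the previous value of the target byte.  */
  return (uint8_t) (old >> shift);
}
```

The inline expansion added by this patch emits the analogous LR/SC loop directly in `sync.md` instead of calling out to libgcc.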


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] RISC-V: Add support for inlining subword atomic operations
  2022-04-19 17:17     ` [PATCH v3] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
@ 2022-05-03 11:59       ` Andreas Schwab
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2022-05-03 11:59 UTC (permalink / raw)
  To: Patrick O'Neill; +Cc: gcc-patches, palmer, panrz

With inline atomics, it should no longer be necessary to link against
-latomic when using -pthread.
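
As an illustration (mine, not from the thread): a program performing a
subword atomic operation, like the helper below, previously lowered to a
libatomic library call on RISC-V; with -minline-atomics GCC can expand
it to an inline LR/SC sequence, so it should link without -latomic.

```c
/* A 1-byte (subword) atomic increment; returns the value before the
   add.  On RISC-V this previously lowered to a libatomic call; with
   -minline-atomics GCC can expand it inline.  */
static char
bump_flag (char *flag)
{
  return __atomic_fetch_add (flag, 1, __ATOMIC_SEQ_CST);
}
```

Built with e.g. `riscv64-unknown-linux-gnu-gcc -pthread -minline-atomics`
(invocation shown as an assumption, not tested in this thread), no
`-latomic` should be required at link time.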

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


end of thread, other threads:[~2022-05-03 12:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08  0:48 [PATCH v1] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
2022-02-23 21:49 ` Palmer Dabbelt
2022-03-10 22:36   ` [PATCH v2] RISCV: Add support for inlining subword atomics Patrick O'Neill
2022-04-08  2:48     ` Pan RZ
2022-04-08 16:48       ` Patrick O'Neill
2022-04-08 18:12         ` Pan RZ
2022-04-08 18:39           ` Patrick O'Neill
2022-04-08 20:29             ` Andreas Schwab
2022-04-19 17:17     ` [PATCH v3] RISC-V: Add support for inlining subword atomic operations Patrick O'Neill
2022-05-03 11:59       ` Andreas Schwab
