public inbox for gcc-patches@gcc.gnu.org
* [PATCH v4] RISC-V: Add support for inlining subword atomic operations
@ 2022-08-21 21:58 Palmer Dabbelt
  2022-09-02 10:08 ` Kito Cheng
  2023-04-18 14:28 ` [PATCH v5] RISCV: Inline subword atomic ops Patrick O'Neill
  0 siblings, 2 replies; 24+ messages in thread
From: Palmer Dabbelt @ 2022-08-21 21:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Patrick O'Neill, Palmer Dabbelt

From: Patrick O'Neill <patrick@rivosinc.com>

RISC-V has no native support for subword atomic operations, so GCC
currently emits libatomic library calls for them.

This patch changes the default behavior to inline subword atomic
operations (using the same logic as the existing library calls).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and for the
-mno-inline-atomics flag.

gcc/ChangeLog:

2022-04-19  Patrick O'Neill  <patrick@rivosinc.com>

	PR target/104338
	* config/riscv/riscv-protos.h: Add helper function stubs.
	* config/riscv/riscv.cc: Add helper functions for subword masking.
	* config/riscv/riscv.opt: Add command-line flag.
	* config/riscv/sync.md: Add masking logic and inline asm for
	fetch_and_op, fetch_and_nand, CAS, and exchange ops.
	* doc/invoke.texi: Add blurb regarding command-line flag.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/inline-atomics-1.c: New test.
	* gcc.target/riscv/inline-atomics-2.c: Likewise.
	* gcc.target/riscv/inline-atomics-3.c: Likewise.
	* gcc.target/riscv/inline-atomics-4.c: Likewise.
	* gcc.target/riscv/inline-atomics-5.c: Likewise.
	* gcc.target/riscv/inline-atomics-6.c: Likewise.
	* gcc.target/riscv/inline-atomics-7.c: Likewise.
	* gcc.target/riscv/inline-atomics-8.c: Likewise.

libgcc/ChangeLog:

	* config/riscv/atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
This is pretty much the same as the v3; I just rebased it onto trunk and
adjusted the documentation to mention libatomic.  I think there's still
room for a more optimized implementation here, but we still need to
sort out the memory model implications.  Given that this is causing a
headache for the distro folks, I think it's best to just take the simple
version now rather than waiting for something better -- we can always do
something better later; it's just probably going to miss the next
release.

This alone is pretty safe: we're just inlining the same code sequences
that libatomic generates, and since we provide a static libatomic this
doesn't add any future binary compatibility issues.
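
For reference, the inlined fetch-and-op sequence is the LR/SC loop
below (register names substituted for the operands of
subword_atomic_fetch_strong_<atomic_optab> for readability):

  1:
    lr.w.aq  old, (addr)         # load the aligned word
    <op>     tmp1, old, value    # apply the operation
    and      tmp1, tmp1, mask    # keep only the subword's bits
    and      tmp2, old, not_mask # keep the rest of the word
    or       tmp2, tmp2, tmp1    # merge the two
    sc.w.rl  tmp1, tmp2, (addr)
    bnez     tmp1, 1b            # retry if the store-conditional failed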

This has no new failures on trunk.  I also backported it to gcc-10
(as that's what my buildroot is using, which itself is a problem I need
to fix) and it can build/boot a simple glibc-based userspace fine.

OK for trunk?
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  52 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 318 ++++++++++
 gcc/doc/invoke.texi                           |   9 +-
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |  19 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1871 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 20c2381c21a..14f3c8f0d4e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -74,6 +74,8 @@ extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d70974c893..1d8730d80e9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5791,6 +5791,58 @@ riscv_init_libfuncs (void)
   set_optab_libfunc (unord_optab, HFmode, NULL);
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+					      gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  *shift = gen_reg_rtx (SImode);
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				      gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					 gen_int_mode(3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  rtx mask_reg = gen_reg_rtx (SImode);
+  emit_move_insn (mask_reg, gen_int_mode(unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT(SImode, mask_reg,
+				       gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT(SImode, *mask));
+}
+
+/* Left-shift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		     rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn(*shifted_value, gen_rtx_ASHIFT(SImode, value_reg,
+						gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index fbca91b956c..1b668fd953f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -231,3 +231,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 86b41e6b00a..198ddbed8e9 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -92,6 +95,145 @@
   "%F3amo<insn>.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "<insn>\t%5, %0, %2\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(not:SHORT (and:SHORT (match_dup 1)
+				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%5, %0, %2\;"
+    "not\t%5, %5\;"
+    "and\t%5, %5, %3\;"
+    "and\t%6, %0, %4\;"
+    "or\t%6, %6, %5\;"
+    "sc.w.rl\t%5, %6, %1\;"
+    "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +246,60 @@
   "%F3amoswap.<amo>%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_exchange<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(unspec_volatile:SHORT
+	  [(match_operand:SHORT 1 "memory_operand" "+A")
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	  UNSPEC_SYNC_EXCHANGE_SUBWORD))
+   (set (match_dup 1)
+	(match_operand:SHORT 2 "register_operand" "0"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%4, %0, %3\;"
+    "or\t%4, %4, %2\;"
+    "sc.w.rl\t%4, %4, %1\;"
+    "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -152,6 +348,128 @@
   DONE;
 })
 
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand" "")    ;; bool output
+   (match_operand:SHORT 1 "register_operand" "") ;; val output
+   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
+  else
+    emit_insn (gen_rtx_SET (val, const0_rtx));
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_insn (gen_rtx_SET (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3])));
+  else
+    emit_insn (gen_rtx_SET (exp, const0_rtx));
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_insn (gen_rtx_SET (compare, difference));
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode, compare)));
+      compare = reg;
+    }
+
+  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx)));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")			;; val output
+	(match_operand:SHORT 1 "memory_operand" "+A"))				;; memory
+   (set (match_dup 1)
+	(unspec_volatile:SHORT [(match_operand:SHORT 2 "reg_or_0_operand" "rJ")	;; expected val
+				(match_operand:SHORT 3 "reg_or_0_operand" "rJ")	;; desired val
+				(match_operand:SI 4 "const_int_operand")	;; mod_s
+				(match_operand:SI 5 "const_int_operand")]	;; mod_f
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+   (clobber (match_scratch:SHORT 6 "=&r"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong<mode> to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart(<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; o
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; n
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return
+    "1:\;"
+    "lr.w.aq\t%0, %1\;"
+    "and\t%6, %0, %4\;"
+    "bne\t%6, %z2, 1f\;"
+    "and\t%6, %0, %5\;"
+    "or\t%6, %6, %3\;"
+    "sc.w.rl\t%6, %6, %1\;"
+    "bnez\t%6, 1b\;"
+    "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1ac81ad0bb4..e3af78c51d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,6 +1227,7 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian @gol
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg} @gol
--mstack-protector-guard-offset=@var{offset}}
+-mstack-protector-guard-offset=@var{offset} @gol
+-minline-atomics  -mno-inline-atomics}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs @gol
@@ -28544,6 +28545,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -minline-atomics
+@itemx -mno-inline-atomics
+@opindex minline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+libatomic function calls.  The default is to use fast inline subword atomics
+that do not require libatomic.
+
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
 @opindex mshorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 7007e7a20e4..a29909b97b5 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.34.1



* Re: [PATCH v4] RISC-V: Add support for inlining subword atomic operations
  2022-08-21 21:58 [PATCH v4] RISC-V: Add support for inlining subword atomic operations Palmer Dabbelt
@ 2022-09-02 10:08 ` Kito Cheng
  2022-10-28 16:55   ` David Abdurachmanov
  2023-04-18 14:28 ` [PATCH v5] RISCV: Inline subword atomic ops Patrick O'Neill
  1 sibling, 1 reply; 24+ messages in thread
From: Kito Cheng @ 2022-09-02 10:08 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: GCC Patches, Patrick O'Neill

LGTM with some minor comments; it's time to move forward.  Thanks
Patrick and Palmer.

> +
> +void
> +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> +                      rtx *not_mask)
> +{
> +  /* Align the memory address to a word.  */
> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> +                                             gen_int_mode (-4, Pmode)));
> +
> +  *aligned_mem = change_address (mem, SImode, aligned_addr);
> +
> +  /* Calculate the shift amount.  */
> +  *shift = gen_reg_rtx (SImode);

The reg_rtx is already allocated outside, so this line could be removed.
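
For example, the atomic_fetch_<atomic_optab><mode> expander already
does

    rtx shift = gen_reg_rtx (SImode);
    ...
    riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);

so the gen_reg_rtx here just overwrites a freshly allocated pseudo;
one allocation site is enough.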

> +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +                                     gen_int_mode (3, SImode)));
> +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> +                                        gen_int_mode(3, SImode)));
> +
> +  /* Calculate the mask.  */
> +  int unshifted_mask;
> +  if (GET_MODE (mem) == QImode)
> +    unshifted_mask = 0xFF;
> +  else
> +    unshifted_mask = 0xFFFF;
> +
> +  rtx mask_reg = gen_reg_rtx (SImode);

Ditto.

> @@ -152,6 +348,128 @@
>    DONE;
>  })
>
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand" "")    ;; bool output
> +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
> +                                               operands[3], operands[4],
> +                                               operands[6], operands[7]));
> +
> +  rtx val = gen_reg_rtx (SImode);
> +  if (operands[1] != const0_rtx)
> +    emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
> +  else
> +    emit_insn (gen_rtx_SET (val, const0_rtx));

nit: emit_move_insn rather than emit_insn + gen_rtx_SET

> +
> +  rtx exp = gen_reg_rtx (SImode);
> +  if (operands[3] != const0_rtx)
> +    emit_insn (gen_rtx_SET (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3])));
> +  else
> +    emit_insn (gen_rtx_SET (exp, const0_rtx));

nit: emit_move_insn rather than emit_insn + gen_rtx_SET

> +
> +  rtx compare = val;
> +  if (exp != const0_rtx)
> +    {
> +      rtx difference = gen_rtx_MINUS (SImode, val, exp);
> +      compare = gen_reg_rtx (SImode);
> +      emit_insn (gen_rtx_SET (compare, difference));

nit: emit_move_insn rather than emit_insn + gen_rtx_SET

> +    }
> +
> +  if (word_mode != SImode)
> +    {
> +      rtx reg = gen_reg_rtx (word_mode);
> +      emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode, compare)));

nit: emit_move_insn rather than emit_insn + gen_rtx_SET


> +      compare = reg;
> +    }
> +
> +  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx)));

nit: emit_move_insn rather than emit_insn + gen_rtx_SET
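
To spell the nit out: for a plain SET like these, emit_move_insn amounts to
the same thing and is the preferred spelling.  A sketch using the operands
from the hunk above:

  /* Current form: build the SET rtx explicitly and queue it.  */
  emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode, operands[1])));

  /* Suggested form: emit_move_insn emits the same SET for a simple move.  */
  emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));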


* Re: [PATCH v4] RISC-V: Add support for inlining subword atomic operations
  2022-09-02 10:08 ` Kito Cheng
@ 2022-10-28 16:55   ` David Abdurachmanov
  2022-11-16  3:53     ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: David Abdurachmanov @ 2022-10-28 16:55 UTC (permalink / raw)
  To: Kito Cheng; +Cc: Palmer Dabbelt, Patrick O'Neill, GCC Patches


On Fri, Sep 2, 2022 at 1:09 PM Kito Cheng via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

> LGTM with minor comments, it's time to move forward, thanks Patrick and
> Palmer.
>

Ping.

Any plans to finally land this one for GCC 13?

The hope is that this patch will make life significantly easier for
distributions. There are way too many packages failing to build due to
sub-word atomics, which is highly annoying considering that the failures
are not consistent between package versions. Build times on riscv64 are
extremely long, which makes it even more annoying. Would love to see this
finally fixed.



* Re: [PATCH v4] RISC-V: Add support for inlining subword atomic operations
  2022-10-28 16:55   ` David Abdurachmanov
@ 2022-11-16  3:53     ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2022-11-16  3:53 UTC (permalink / raw)
  To: David Abdurachmanov, Kito Cheng; +Cc: GCC Patches, Patrick O'Neill


On 10/28/22 10:55, David Abdurachmanov via Gcc-patches wrote:
> On Fri, Sep 2, 2022 at 1:09 PM Kito Cheng via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
>> LGTM with minor comments, it's time to move forward, thanks Patrick and
>> Palmer.
>>
> Ping.
>
> Any plans to finally land this one for GCC 13?
>
> The hope is that this patch would make life significantly easier for
> distributions. There are way too many packages failing to build due to
> sub-word atomics, which is highly annoying considering that it's not
> consistent between package versions. Build times on riscv64 are extremely
> long which makes it even more annoying. Would love to see this finally
> fixed.

It could well be the case that this gets punted.  Atomics are a bit of a 
mess at the moment and we're still trying to figure out the best way 
forward.  It's not forgotten though.

jeff




* [PATCH v5] RISCV: Inline subword atomic ops
  2022-08-21 21:58 [PATCH v4] RISC-V: Add support for inlining subword atomic operations Palmer Dabbelt
  2022-09-02 10:08 ` Kito Cheng
@ 2023-04-18 14:28 ` Patrick O'Neill
  2023-04-18 15:06   ` Andreas Schwab
                     ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-18 14:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: palmer, kito.cheng, david.abd, jeffreyalaw, Patrick O'Neill

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
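
For reviewers who want the shape of the emulation without reading the
expanders: the subword ops are performed on the aligned containing word
with a mask and shift.  A rough C equivalent for a 1-byte fetch-and-add
(illustrative only, not code from this patch; the helper name is made up
and it assumes RISC-V's little-endian byte numbering):

  #include <stdint.h>

  unsigned char
  emulated_fetch_add_1 (unsigned char *p, unsigned char v)
  {
    /* Align the address down to a 4-byte word.  */
    unsigned int *wp = (unsigned int *) ((uintptr_t) p & ~(uintptr_t) 3);
    /* Bit offset of the target byte within that word (little endian).  */
    int shift = ((uintptr_t) p & 3) * 8;
    unsigned int mask = 0xFFu << shift;

    unsigned int old = __atomic_load_n (wp, __ATOMIC_RELAXED);
    unsigned int new_word;
    do
      {
        unsigned char old_byte = (old & mask) >> shift;
        unsigned char new_byte = old_byte + v;
        /* Splice the updated byte back into the untouched bits.  */
        new_word = (old & ~mask) | ((unsigned int) new_byte << shift);
      }
    while (!__atomic_compare_exchange_n (wp, &old, new_word, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
    return (old & mask) >> shift;
  }

The expanders below do the same job, just with an LR/SC loop in place of
the compare-and-swap loop.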

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op,
	fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
v4: https://inbox.sourceware.org/gcc-patches/20220821215823.18207-1-palmer@rivosinc.com/

Rebased v4 and addressed Kito Cheng's comments.
No new failures on trunk.
---
The mapping implemented here matches Libatomic. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
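
Concretely, the difference is in the orderings on the LR/SC pairs: the
sequences below use lr.w.aq/sc.w.rl for every requested model, which is
what libatomic emits today.  My understanding (an assumption until the
A.6 patch settles) is that a seq_cst mapping per Table A.6 strengthens
the load-reserved side, roughly:

  # Hypothetical seq_cst CAS loop under the Table A.6 mapping (sketch;
  # register names are placeholders):
  1: lr.w.aqrl  t0, (a0)
     bne        t0, a1, 2f
     sc.w.rl    t1, a2, (a0)
     bnez       t1, 1b
  2: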
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  50 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 314 ++++++++++
 gcc/doc/invoke.texi                           |   8 +
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |  19 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1864 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4937d1af25..fa0247be22f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+					      gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					  gen_int_mode (3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  emit_move_insn (*mask, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, *mask,
+					 gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT (SImode, *mask));
+}
+
+/* Leftshift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		      rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn (*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						   gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..bc5e63ab3e6 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -254,3 +254,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c932ef87b9d..5adbe08e2d6 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -91,6 +94,143 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "<insn>\t%5, %0, %2\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(not:SHORT (and:SHORT (match_dup 1)
+				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%5, %0, %2\;"
+	   "not\t%5, %5\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +244,59 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_expand "atomic_exchange<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(unspec_volatile:SHORT
+	  [(match_operand:SHORT 1 "memory_operand" "+A")
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	  UNSPEC_SYNC_EXCHANGE_SUBWORD))
+   (set (match_dup 1)
+	(match_operand:SHORT 2 "register_operand" "0"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%4, %0, %3\;"
+	   "or\t%4, %4, %2\;"
+	   "sc.w.rl\t%4, %4, %1\;"
+	   "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -153,6 +346,127 @@
   DONE;
 })
 
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand" "")    ;; bool output
+   (match_operand:SHORT 1 "register_operand" "") ;; val output
+   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
+  else
+    emit_move_insn (val, const0_rtx);
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
+  else
+    emit_move_insn (exp, const0_rtx);
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_move_insn (compare, difference);
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
+      compare = reg;
+    }
+
+  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")			;; val output
+	(match_operand:SHORT 1 "memory_operand" "+A"))				;; memory
+   (set (match_dup 1)
+	(unspec_volatile:SHORT [(match_operand:SHORT 2 "reg_or_0_operand" "rJ")	;; expected val
+				(match_operand:SHORT 3 "reg_or_0_operand" "rJ")	;; desired val
+				(match_operand:SI 4 "const_int_operand")	;; mod_s
+				(match_operand:SI 5 "const_int_operand")]	;; mod_f
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+   (clobber (match_scratch:SHORT 6 "=&r"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; o
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; n
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%6, %0, %4\;"
+	   "bne\t%6, %z2, 1f\;"
+	   "and\t%6, %0, %5\;"
+	   "or\t%6, %6, %3\;"
+	   "sc.w.rl\t%6, %6, %1\;"
+	   "bnez\t%6, 1b\;"
+	   "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..9c3e91d2fee 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,6 +1227,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
--mcsr-check -mno-csr-check}
+-mcsr-check -mno-csr-check
+-minline-atomics  -mno-inline-atomics}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@opindex minline-atomics
+@item -minline-atomics
+@itemx -mno-inline-atomics
+Do or don't generate inline code for subword atomic operations.  The default
+is to use fast inline subword atomics that do not require libatomic;
+-mno-inline-atomics instead calls libatomic's smaller but slower emulation.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 69f53623509..573d163ea04 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1

* Re: [PATCH v5] RISCV: Inline subword atomic ops
  2023-04-18 14:28 ` [PATCH v5] RISCV: Inline subword atomic ops Patrick O'Neill
@ 2023-04-18 15:06   ` Andreas Schwab
  2023-04-18 16:39   ` [PATCH v6] " Patrick O'Neill
  2023-04-18 16:59   ` [PATCH v5] " Jeff Law
  2 siblings, 0 replies; 24+ messages in thread
From: Andreas Schwab @ 2023-04-18 15:06 UTC (permalink / raw)
  To: Patrick O'Neill
  Cc: gcc-patches, palmer, kito.cheng, david.abd, jeffreyalaw

On Apr 18 2023, Patrick O'Neill wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a38547f53e5..9c3e91d2fee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1227,6 +1227,7 @@ See RS/6000 and PowerPC Options.
>  -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
>  -mcsr-check -mno-csr-check}
> +-minline-atomics  -mno-inline-atomics

The options need to be inside @gccoptlist.

> @@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
>  library function calls.  The default is to use fast inline prologues and
>  epilogues.
>  
> +@item -minline-atomics
> +@itemx -mno-inline-atomics
> +@opindex minline-atomics

@opindex should precede @item.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* [PATCH v6] RISCV: Inline subword atomic ops
  2023-04-18 14:28 ` [PATCH v5] RISCV: Inline subword atomic ops Patrick O'Neill
  2023-04-18 15:06   ` Andreas Schwab
@ 2023-04-18 16:39   ` Patrick O'Neill
  2023-04-18 20:17     ` Palmer Dabbelt
  2023-04-18 21:41     ` [PATCH v7] " Patrick O'Neill
  2023-04-18 16:59   ` [PATCH v5] " Jeff Law
  2 siblings, 2 replies; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-18 16:39 UTC (permalink / raw)
  To: gcc-patches
  Cc: palmer, kito.cheng, david.abd, jeffreyalaw, schwab, Patrick O'Neill

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
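
As a rough sketch of the user-visible difference (the file and libcall
names below are illustrative assumptions, not taken from the patch):

    /* sub.c */
    char flag;

    char
    set_flag (void)
    {
      /* Subword (char/short) atomics previously lowered to a libatomic
	 call such as __atomic_fetch_or_1; with this patch they lower to
	 an inline lr.w/sc.w loop by default.  Compile with
	 -mno-inline-atomics (and link with -latomic) to keep the old
	 library-call behavior.  */
      return __atomic_fetch_or (&flag, 1, __ATOMIC_SEQ_CST);
    }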

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op,
	fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
v5: https://inbox.sourceware.org/gcc-patches/20230418142858.2424851-1-patrick@rivosinc.com/

Addressed Andreas Schwab's comments about the flags/documentation.
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/

No new failures on trunk.
---
The mapping implemented here matches libatomic. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  50 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 314 ++++++++++
 gcc/doc/invoke.texi                           |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |  19 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1865 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4937d1af25..fa0247be22f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+					      gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					  gen_int_mode (3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
+
+  emit_move_insn (*mask, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, *mask,
+					 gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT (SImode, *mask));
+}
+
+/* Leftshift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		      rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn (*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						   gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
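
As an editorial aside, a C model of what riscv_subword_address computes
(a sketch only -- the struct and function names are illustrative, and it
shows the 0xFF QImode mask; HImode uses 0xFFFF):

    #include <stdint.h>

    struct subword_layout
    {
      uintptr_t aligned_addr; /* addr & -4: the enclosing 32-bit word  */
      uint32_t shift;         /* (addr & 3) << 3: bit offset of byte   */
      uint32_t mask;          /* 0xFF << shift: selects the target byte */
      uint32_t not_mask;      /* ~mask: preserves the neighboring bytes */
    };

    static struct subword_layout
    model_subword_address (uintptr_t addr)
    {
      struct subword_layout l;
      l.aligned_addr = addr & ~(uintptr_t) 3;
      l.shift = (addr & 3) << 3;
      l.mask = (uint32_t) 0xFF << l.shift;
      l.not_mask = ~l.mask;
      return l;
    }

riscv_lshift_subword then shifts the subword operand left by the same
shift amount so it lines up under the mask inside the aligned word.
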
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..bc5e63ab3e6 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -254,3 +254,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c932ef87b9d..5adbe08e2d6 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -91,6 +94,143 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "<insn>\t%5, %0, %2\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(not:SHORT (and:SHORT (match_dup 1)
+				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%5, %0, %2\;"
+	   "not\t%5, %5\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
+	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SHORT
+	  [(any_atomic:SHORT (match_dup 1)
+		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+	   (match_operand:SI 3 "const_int_operand")]		      ;; model
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<atomic_optab> to implement an LR/SC
+     version of the operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +244,59 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_expand "atomic_exchange<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+	(unspec_volatile:SHORT
+	  [(match_operand:SHORT 1 "memory_operand" "+A")
+	   (match_operand:SI 3 "const_int_operand")] ;; model
+	  UNSPEC_SYNC_EXCHANGE_SUBWORD))
+   (set (match_dup 1)
+	(match_operand:SHORT 2 "register_operand" "0"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%4, %0, %3\;"
+	   "or\t%4, %4, %2\;"
+	   "sc.w.rl\t%4, %4, %1\;"
+	   "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -153,6 +346,127 @@
   DONE;
 })
 
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand" "")    ;; bool output
+   (match_operand:SHORT 1 "register_operand" "") ;; val output
+   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
+  else
+    emit_move_insn (val, const0_rtx);
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
+  else
+    emit_move_insn (exp, const0_rtx);
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_move_insn (compare, difference);
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
+      compare = reg;
+    }
+
+  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")			;; val output
+	(match_operand:SHORT 1 "memory_operand" "+A"))				;; memory
+   (set (match_dup 1)
+	(unspec_volatile:SHORT [(match_operand:SHORT 2 "reg_or_0_operand" "rJ")	;; expected val
+				(match_operand:SHORT 3 "reg_or_0_operand" "rJ")	;; desired val
+				(match_operand:SI 4 "const_int_operand")	;; mod_s
+				(match_operand:SI 5 "const_int_operand")]	;; mod_f
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+   (clobber (match_scratch:SHORT 6 "=&r"))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when
+     inlining is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; o
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; n
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%6, %0, %4\;"
+	   "bne\t%6, %z2, 1f\;"
+	   "and\t%6, %0, %5\;"
+	   "or\t%6, %6, %3\;"
+	   "sc.w.rl\t%6, %6, %1\;"
+	   "bnez\t%6, 1b\;"
+	   "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
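
As an editorial aside, the LR/SC bodies above are easier to follow as C.
A sketch with the reservation/retry elided (the real sequences loop on
sc.w failure, and 'op' stands for whichever instruction <insn> selects):

    #include <stdint.h>

    /* subword_atomic_fetch_strong_<op>: one iteration of the loop.  */
    static uint32_t
    model_fetch_op (uint32_t *aligned, uint32_t shifted_value,
		    uint32_t mask, uint32_t not_mask,
		    uint32_t (*op) (uint32_t, uint32_t))
    {
      uint32_t old = *aligned;                 /* lr.w.aq %0, %1      */
      uint32_t tmp = op (old, shifted_value);  /* <insn>  %5, %0, %2  */
      tmp &= mask;                             /* and     %5, %5, %3  */
      *aligned = (old & not_mask) | tmp;       /* and; or; sc.w.rl    */
      return old;  /* the expander shifts this right and truncates    */
    }

    /* subword_atomic_cas_strong: compare only the masked subword; o and
       n arrive pre-shifted and pre-masked from the expander.  */
    static uint32_t
    model_cas (uint32_t *aligned, uint32_t o, uint32_t n,
	       uint32_t mask, uint32_t not_mask)
    {
      uint32_t old = *aligned;                 /* lr.w.aq %0, %1       */
      if ((old & mask) == o)                   /* and; bne %6, %z2, 1f */
	*aligned = (old & not_mask) | n;       /* and; or; sc.w.rl     */
      return old;
    }

Exchange is the same merge without the compare, and the CAS expander
derives its bool result by sign-extending the returned subword and the
expected value and testing their difference against zero.
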
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..ba448dcb7ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1226,7 +1226,8 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
--mcsr-check -mno-csr-check}
+-mcsr-check -mno-csr-check
+-minline-atomics  -mno-inline-atomics}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@opindex minline-atomics
+@item -minline-atomics
+@itemx -mno-inline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+libatomic function calls.  The default is to use fast inline subword atomics
+that do not require libatomic.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
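
As a usage sketch for the flag (the toolchain triple and exact libcall
name here are assumptions for illustration):

    $ riscv64-unknown-linux-gnu-gcc -O2 -S sub.c
      # default (-minline-atomics): subword ops become lr.w/sc.w loops
    $ riscv64-unknown-linux-gnu-gcc -O2 -S -mno-inline-atomics sub.c
      # subword ops become calls like __atomic_fetch_or_1; link -latomic
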
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Logic duplicated from libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 69f53623509..573d163ea04 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1




* Re: [PATCH v5] RISCV: Inline subword atomic ops
  2023-04-18 14:28 ` [PATCH v5] RISCV: Inline subword atomic ops Patrick O'Neill
  2023-04-18 15:06   ` Andreas Schwab
  2023-04-18 16:39   ` [PATCH v6] " Patrick O'Neill
@ 2023-04-18 16:59   ` Jeff Law
  2023-04-18 20:48     ` Patrick O'Neill
  2 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-04-18 16:59 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches; +Cc: palmer, kito.cheng, david.abd



On 4/18/23 08:28, Patrick O'Neill wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
> 
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
> 
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
> 
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
> 
> 	PR target/104338
> 	* riscv-protos.h: Add helper function stubs.
> 	* riscv.cc: Add helper functions for subword masking.
> 	* riscv.opt: Add command-line flag.
> 	* sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* invoke.texi: Add blurb regarding command-line flag.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* atomic.c: Add reference to duplicate logic.
So, for others who may be interested: the motivation here is that for a
sub-word atomic we currently have to explicitly link in libatomic or we
get undefined symbols.

This is particularly problematic for the distros because we're one of
the few (only?) architectures supported by the distros that require
linking in libatomic for these cases.  The distros don't want to adjust
each affected package and be stuck carrying that change forward or
negotiating with all the relevant upstreams.  The distros might tackle
this problem by porting this patch into their compiler tree, which has
its own set of problems with long-term maintenance.

The net is that, from a usability standpoint, it's best if we get this
problem addressed and backported to our gcc-13 RISC-V coordination branch.

We had held this up pending resolution of some other issues in the 
atomics space.  In retrospect that might have been a mistake.

So with that background...  Here we go...

>   
> +/* Helper function for extracting a subword from memory.  */
> +
> +void
> +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> +		       rtx *not_mask)
So I'd expand on that comment.  The idea is we would like someone 
working in the backend to be able to read the function comment and have 
a reasonable sense of what the function does as well as the inputs and 
return value.  So perhaps something like this:

/* Given memory reference MEM, expand code to compute the aligned
    memory address, shift and mask values and store them into
    *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */

Or something like that.
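
Just to make the contract concrete: for a QImode reference at address
0x1003 this should work out to aligned_addr == 0x1000, shift == 24,
mask == 0xff << 24 and not_mask == ~(0xff << 24).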


> +{
> +  /* Align the memory addess to a word.  */
s/addess/address/


> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> +					      gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. 
That may result in needing to rewrap this code.  I'd bring the 
gen_rtx_AND down on a new line, aligned with aligned_addr.
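
i.e. something like this, keeping the -4 just to show the new wrapping:

  rtx aligned_addr = gen_reg_rtx (Pmode);
  emit_move_insn (aligned_addr,
                  gen_rtx_AND (Pmode, addr, gen_int_mode (-4, Pmode)));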

Presumably using SImode is intentional here rather than wanting to use 
word_mode, which would be SImode for rv32 and DImode for rv64?  I'm going 
to work based on that assumption, but if it isn't, there's more work to 
do to generalize this code.



> +
> +  /* Calculate the shift amount.  */
> +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +				       gen_int_mode (3, SImode)));
> +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> +					  gen_int_mode(3, SImode)));


Formatting nit: missing space before the open paren in the gen_int_mode 
call on that last line above.  This minor goof shows up in various places; 
please review the patch as a whole looking for similar nits.
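
i.e. that last line should end up as:

  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
					  gen_int_mode (3, SImode)));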



> +
> +  /* Calculate the mask.  */
> +  int unshifted_mask;
> +  if (GET_MODE (mem) == QImode)
> +    unshifted_mask = 0xFF;
> +  else
> +    unshifted_mask = 0xFFFF;
Can you just use GET_MODE_MASK here, which should simplify this to 
something like

unshifted_mask = GET_MODE_MASK (GET_MODE (mem));
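
which would let the whole if/else collapse down to roughly this
(untested sketch):

  /* GET_MODE_MASK is 0xff for QImode and 0xffff for HImode.  */
  int unshifted_mask = GET_MODE_MASK (GET_MODE (mem));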








> +
> +(define_expand "atomic_fetch_nand<mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
> +	(match_operand:SHORT 1 "memory_operand" "+A"))
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(not:SHORT (and:SHORT (match_dup 1)
> +				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
> +	   (match_operand:SI 3 "const_int_operand")] ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note: constraints aren't necessary for a define_expand.  They 
don't hurt anything though.  They do document expectations, but then you 
have to maintain them over time.  I'm OK leaving them; mostly I wanted to 
make sure you're aware they aren't strictly necessary for a define_expand.






> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
> +	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(any_atomic:SHORT (match_dup 1)
> +		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
> +	   (match_operand:SI 3 "const_int_operand")]		      ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
Makes me wonder if we should expose builtins that could be used by 
atomic.c rather than having the logic open-coded in two places.  That 
could be a follow-up in my opinion rather than a blocker for this patch.


>   
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand" "")    ;; bool output
> +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
I nearly suggested you use this form for the earlier expanders where the 
RTL in the expander is ultimately thrown away because we generate all 
the RTL we need in the C fragment and finish with a "DONE" tag.


> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> new file mode 100644
> index 00000000000..5c5623d9b2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +char foo;
> +char bar;
> +char baz;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&foo, 1);
> +  __sync_fetch_and_nand(&bar, 1);
> +  __sync_bool_compare_and_swap (&baz, 1, 2);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> new file mode 100644
> index 00000000000..fdce7a5d71f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* Verify that subword atomics do not generate calls.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
Note that you can #include another test file.  When you do that, the 
dg-directives in the #included file are ignored.  That means you can 
share test code, but use different dg-directives.  So for 
inline-atomics-2.c, you can have the dg-* directives above, followed by:

#include "inline-atomics-1.c"
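
i.e. inline-atomics-2.c could shrink down to something like this, reusing
the directives already in the patch:

/* { dg-do compile } */
/* Verify that subword atomics do not generate calls.  */
/* { dg-options "-minline-atomics" } */
/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */

#include "inline-atomics-1.c"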

Overall it looks pretty close to ready.

Jeff


* Re: [PATCH v6] RISCV: Inline subword atomic ops
  2023-04-18 16:39   ` [PATCH v6] " Patrick O'Neill
@ 2023-04-18 20:17     ` Palmer Dabbelt
  2023-04-18 21:41     ` [PATCH v7] " Patrick O'Neill
  1 sibling, 0 replies; 24+ messages in thread
From: Palmer Dabbelt @ 2023-04-18 20:17 UTC (permalink / raw)
  To: Patrick O'Neill
  Cc: gcc-patches, Kito Cheng, david.abd, jeffreyalaw, schwab,
	Patrick O'Neill

On Tue, 18 Apr 2023 09:39:13 PDT (-0700), Patrick O'Neill wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
>
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
>
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
>
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
>
> 	PR target/104338
> 	* riscv-protos.h: Add helper function stubs.
> 	* riscv.cc: Add helper functions for subword masking.
> 	* riscv.opt: Add command-line flag.
> 	* sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* invoke.texi: Add blurb regarding command-line flag.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* atomic.c: Add reference to duplicate logic.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> v5: https://inbox.sourceware.org/gcc-patches/20230418142858.2424851-1-patrick@rivosinc.com/
>
> Addressed Andreas Schwab's comments about the flags/documentation.
>   https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/
>
> No new failures on trunk.

Looks like Jeff had some comments as well.

IMO we should be targeting this for gcc-13: it's enough of a headache 
for distros that they'll likely backport it anyway, so we might as well 
just take on the pain ourselves.

Since Jeff and Kito have chimed in on the code I'll let them have some 
time to look; I wrote some of it, so I'm OK with it.

> ---
> The mapping implemented here matches Libatomic. That mapping changes if
> "Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
> patch is merged first, I will update the other to make sure the
> correct mapping is emitted.
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
> ---
>  gcc/config/riscv/riscv-protos.h               |   2 +
>  gcc/config/riscv/riscv.cc                     |  50 ++
>  gcc/config/riscv/riscv.opt                    |   4 +
>  gcc/config/riscv/sync.md                      | 314 ++++++++++
>  gcc/doc/invoke.texi                           |  10 +-
>  .../gcc.target/riscv/inline-atomics-1.c       |  18 +
>  .../gcc.target/riscv/inline-atomics-2.c       |  19 +
>  .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
>  .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
>  .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
>  .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
>  .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
>  .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
>  libgcc/config/riscv/atomic.c                  |   2 +
>  14 files changed, 1865 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 5244e8dcbf0..02b33e02020 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -79,6 +79,8 @@ extern void riscv_reinit (void);
>  extern poly_uint64 riscv_regmode_natural_size (machine_mode);
>  extern bool riscv_v_ext_vector_mode_p (machine_mode);
>  extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
> +extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
> +extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
>
>  /* Routines implemented in riscv-c.cc.  */
>  void riscv_cpu_cpp_builtins (cpp_reader *);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index e4937d1af25..fa0247be22f 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>  							& ~zeroed_hardregs);
>  }
>
> +/* Helper function for extracting a subword from memory.  */
> +
> +void
> +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> +		       rtx *not_mask)
> +{
> +  /* Align the memory addess to a word.  */
> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> +					      gen_int_mode (-4, Pmode)));
> +
> +  *aligned_mem = change_address (mem, SImode, aligned_addr);
> +
> +  /* Calculate the shift amount.  */
> +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +				       gen_int_mode (3, SImode)));
> +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> +					  gen_int_mode(3, SImode)));
> +
> +  /* Calculate the mask.  */
> +  int unshifted_mask;
> +  if (GET_MODE (mem) == QImode)
> +    unshifted_mask = 0xFF;
> +  else
> +    unshifted_mask = 0xFFFF;
> +
> +  emit_move_insn (*mask, gen_int_mode(unshifted_mask, SImode));
> +
> +  emit_move_insn (*mask, gen_rtx_ASHIFT(SImode, *mask,
> +					gen_lowpart (QImode, *shift)));
> +
> +  emit_move_insn (*not_mask, gen_rtx_NOT(SImode, *mask));
> +}
> +
> +/* Leftshift a subword within an SImode register.  */
> +
> +void
> +riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
> +		      rtx *shifted_value)
> +{
> +  rtx value_reg = gen_reg_rtx (SImode);
> +  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
> +						  mode, 0));
> +
> +  emit_move_insn(*shifted_value, gen_rtx_ASHIFT(SImode, value_reg,
> +						gen_lowpart (QImode, shift)));
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ff1dd4ddd4f..bc5e63ab3e6 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -254,3 +254,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
>  misa-spec=
>  Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
>  Set the version of RISC-V ISA spec.
> +
> +minline-atomics
> +Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
> +Always inline subword atomic operations.
> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index c932ef87b9d..5adbe08e2d6 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -21,8 +21,11 @@
>
>  (define_c_enum "unspec" [
>    UNSPEC_COMPARE_AND_SWAP
> +  UNSPEC_COMPARE_AND_SWAP_SUBWORD
>    UNSPEC_SYNC_OLD_OP
> +  UNSPEC_SYNC_OLD_OP_SUBWORD
>    UNSPEC_SYNC_EXCHANGE
> +  UNSPEC_SYNC_EXCHANGE_SUBWORD
>    UNSPEC_ATOMIC_STORE
>    UNSPEC_MEMORY_BARRIER
>  ])
> @@ -91,6 +94,143 @@
>    [(set_attr "type" "atomic")
>     (set (attr "length") (const_int 8))])
>
> +(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(any_atomic:SI (match_dup 1)
> +		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
> +	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))
> +    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
> +    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
> +    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "<insn>\t%5, %0, %2\;"
> +	   "and\t%5, %5, %3\;"
> +	   "and\t%6, %0, %4\;"
> +	   "or\t%6, %6, %5\;"
> +	   "sc.w.rl\t%5, %6, %1\;"
> +	   "bnez\t%5, 1b";
> +  }
> +  [(set (attr "length") (const_int 28))])
> +
> +(define_expand "atomic_fetch_nand<mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
> +	(match_operand:SHORT 1 "memory_operand" "+A"))
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(not:SHORT (and:SHORT (match_dup 1)
> +				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
> +	   (match_operand:SI 3 "const_int_operand")] ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_nand to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
> +						   shifted_value,
> +						   mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_fetch_strong_nand"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(not:SI (and:SI (match_dup 1)
> +			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
> +	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))
> +    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
> +    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
> +    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%5, %0, %2\;"
> +	   "not\t%5, %5\;"
> +	   "and\t%5, %5, %3\;"
> +	   "and\t%6, %0, %4\;"
> +	   "or\t%6, %6, %5\;"
> +	   "sc.w.rl\t%5, %6, %1\;"
> +	   "bnez\t%5, 1b";
> +  }
> +  [(set (attr "length") (const_int 32))])
> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
> +	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(any_atomic:SHORT (match_dup 1)
> +		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
> +	   (match_operand:SI 3 "const_int_operand")]		      ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
> +							     shifted_value,
> +							     mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +
> +  DONE;
> +})
> +
>  (define_insn "atomic_exchange<mode>"
>    [(set (match_operand:GPR 0 "register_operand" "=&r")
>  	(unspec_volatile:GPR
> @@ -104,6 +244,59 @@
>    [(set_attr "type" "atomic")
>     (set (attr "length") (const_int 8))])
>
> +(define_expand "atomic_exchange<mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
> +	(unspec_volatile:SHORT
> +	  [(match_operand:SHORT 1 "memory_operand" "+A")
> +	   (match_operand:SI 3 "const_int_operand")] ;; model
> +	  UNSPEC_SYNC_EXCHANGE_SUBWORD))
> +   (set (match_dup 1)
> +	(match_operand:SHORT 2 "register_operand" "0"))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
> +						 shifted_value, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_exchange_strong"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
> +	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
> +      UNSPEC_SYNC_EXCHANGE_SUBWORD))
> +    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%4, %0, %3\;"
> +	   "or\t%4, %4, %2\;"
> +	   "sc.w.rl\t%4, %4, %1\;"
> +	   "bnez\t%4, 1b";
> +  }
> +  [(set (attr "length") (const_int 20))])
> +
>  (define_insn "atomic_cas_value_strong<mode>"
>    [(set (match_operand:GPR 0 "register_operand" "=&r")
>  	(match_operand:GPR 1 "memory_operand" "+A"))
> @@ -153,6 +346,127 @@
>    DONE;
>  })
>
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand" "")    ;; bool output
> +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
> +						operands[3], operands[4],
> +						operands[6], operands[7]));
> +
> +  rtx val = gen_reg_rtx (SImode);
> +  if (operands[1] != const0_rtx)
> +    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
> +  else
> +    emit_move_insn (val, const0_rtx);
> +
> +  rtx exp = gen_reg_rtx (SImode);
> +  if (operands[3] != const0_rtx)
> +    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
> +  else
> +    emit_move_insn (exp, const0_rtx);
> +
> +  rtx compare = val;
> +  if (exp != const0_rtx)
> +    {
> +      rtx difference = gen_rtx_MINUS (SImode, val, exp);
> +      compare = gen_reg_rtx (SImode);
> +      emit_move_insn  (compare, difference);
> +    }
> +
> +  if (word_mode != SImode)
> +    {
> +      rtx reg = gen_reg_rtx (word_mode);
> +      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
> +      compare = reg;
> +    }
> +
> +  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
> +  DONE;
> +})
> +
> +(define_expand "atomic_cas_value_strong<mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")			;; val output
> +	(match_operand:SHORT 1 "memory_operand" "+A"))				;; memory
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT [(match_operand:SHORT 2 "reg_or_0_operand" "rJ")	;; expected val
> +				(match_operand:SHORT 3 "reg_or_0_operand" "rJ")	;; desired val
> +				(match_operand:SI 4 "const_int_operand")	;; mod_s
> +				(match_operand:SI 5 "const_int_operand")]	;; mod_f
> +	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
> +   (clobber (match_scratch:SHORT 6 "=&r"))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_cas_strong<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx o = operands[2];
> +  rtx n = operands[3];
> +  rtx shifted_o = gen_reg_rtx (SImode);
> +  rtx shifted_n = gen_reg_rtx (SImode);
> +
> +  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
> +  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
> +
> +  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
> +  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
> +
> +  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
> +					    shifted_o, shifted_n,
> +					    mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart(<MODE>mode, old));
> +
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_cas_strong"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; o
> +			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; n
> +	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
> +	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
> +	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
> +	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%6, %0, %4\;"
> +	   "bne\t%6, %z2, 1f\;"
> +	   "and\t%6, %0, %5\;"
> +	   "or\t%6, %6, %3\;"
> +	   "sc.w.rl\t%6, %6, %1\;"
> +	   "bnez\t%6, 1b\;"
> +	   "1:";
> +  }
> +  [(set (attr "length") (const_int 28))])
> +
>  (define_expand "atomic_test_and_set"
>    [(match_operand:QI 0 "register_operand" "")     ;; bool output
>     (match_operand:QI 1 "memory_operand" "+A")    ;; memory
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a38547f53e5..ba448dcb7ef 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1226,7 +1226,8 @@ See RS/6000 and PowerPC Options.
>  -mbig-endian  -mlittle-endian
>  -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
> --mcsr-check -mno-csr-check}
> +-mcsr-check -mno-csr-check
> +-minline-atomics  -mno-inline-atomics}
>
>  @emph{RL78 Options}
>  @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
> @@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
>  library function calls.  The default is to use fast inline prologues and
>  epilogues.
>
> +@opindex minline-atomics
> +@item -minline-atomics
> +@itemx -mno-inline-atomics
> +Do or don't use smaller but slower subword atomic emulation code that uses
> +libatomic function calls.  The default is to use fast inline subword atomics
> +that do not require libatomic.
> +
>  @opindex mshorten-memrefs
>  @item -mshorten-memrefs
>  @itemx -mno-shorten-memrefs
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> new file mode 100644
> index 00000000000..5c5623d9b2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +char foo;
> +char bar;
> +char baz;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&foo, 1);
> +  __sync_fetch_and_nand(&bar, 1);
> +  __sync_bool_compare_and_swap (&baz, 1, 2);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> new file mode 100644
> index 00000000000..fdce7a5d71f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* Verify that subword atomics do not generate calls.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +char foo;
> +char bar;
> +char baz;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&foo, 1);
> +  __sync_fetch_and_nand(&bar, 1);
> +  __sync_bool_compare_and_swap (&baz, 1, 2);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> new file mode 100644
> index 00000000000..709f3734377
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> @@ -0,0 +1,569 @@
> +/* Check all char alignments.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-1.c */
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a char.  */
> +
> +extern void abort(void);
> +
> +char count, res;
> +const char init = ~0;
> +
> +struct A
> +{
> +   char a;
> +   char b;
> +   char c;
> +   char d;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  char* V[] = {&A.a, &A.b, &A.c, &A.d};
> +
> +  for (int i = 0; i < 4; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> new file mode 100644
> index 00000000000..eecfaae5cc6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> @@ -0,0 +1,566 @@
> +/* Check all short alignments.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-2.c */
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a short.  */
> +
> +extern void abort(void);
> +
> +short count, res;
> +const short init = ~0;
> +
> +struct A
> +{
> +   short a;
> +   short b;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main () {
> +  short* V[] = {&A.a, &A.b};
> +
> +  for (int i = 0; i < 2; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> new file mode 100644
> index 00000000000..52093894a79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> @@ -0,0 +1,87 @@
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
> +
> +extern void abort(void);
> +
> +char v = 0;
> +char expected = 0;
> +char max = ~0;
> +char desired = ~0;
> +char zero = 0;
> +
> +#define STRONG 0
> +#define WEAK 1
> +
> +int
> +main ()
> +{
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  /* Now test the generic version.  */
> +
> +  v = 0;
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> new file mode 100644
> index 00000000000..8fee8c44811
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> @@ -0,0 +1,87 @@
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
> +
> +extern void abort(void);
> +
> +short v = 0;
> +short expected = 0;
> +short max = ~0;
> +short desired = ~0;
> +short zero = 0;
> +
> +#define STRONG 0
> +#define WEAK 1
> +
> +int
> +main ()
> +{
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  /* Now test the generic version.  */
> +
> +  v = 0;
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> new file mode 100644
> index 00000000000..24c344c0ce3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> @@ -0,0 +1,69 @@
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-exchange-1.c */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_exchange_n builtin for a char.  */
> +
> +extern void abort(void);
> +
> +char v, count, ret;
> +
> +int
> +main ()
> +{
> +  v = 0;
> +  count = 0;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
> +    abort ();
> +  count++;
> +
> +  /* Now test the generic version.  */
> +
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> new file mode 100644
> index 00000000000..edc212df04e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> @@ -0,0 +1,69 @@
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-exchange-2.c */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_X builtin for a short.  */
> +
> +extern void abort(void);
> +
> +short v, count, ret;
> +
> +int
> +main ()
> +{
> +  v = 0;
> +  count = 0;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
> +    abort ();
> +  count++;
> +
> +  /* Now test the generic version.  */
> +
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  return 0;
> +}
> diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
> index 69f53623509..573d163ea04 100644
> --- a/libgcc/config/riscv/atomic.c
> +++ b/libgcc/config/riscv/atomic.c
> @@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define INVERT		"not %[tmp1], %[tmp1]\n\t"
>  #define DONT_INVERT	""
>
> +/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled */
> +
>  #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
>    type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
>    {									\

* Re: [PATCH v5] RISCV: Inline subword atomic ops
  2023-04-18 16:59   ` [PATCH v5] " Jeff Law
@ 2023-04-18 20:48     ` Patrick O'Neill
  2023-04-18 21:04       ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-18 20:48 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: palmer, kito.cheng, david.abd

On 4/18/23 09:59, Jeff Law wrote:
> On 4/18/23 08:28, Patrick O'Neill wrote:
> ...
>> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
>> +
>> +  rtx aligned_addr = gen_reg_rtx (Pmode);
>> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
>> +                          gen_int_mode (-4, Pmode)));
> So rather than -4 as a magic number, GET_MODE_MASK would be better. 
> That may result in needing to rewrap this code.  I'd bring the 
> gen_rtx_AND down on a new line, aligned with aligned_addr.
IIUC GET_MODE_MASK generates masks like 0xFF for QImode (for example). It 
doesn't have the granularity to generate 0x3 (which we could then NOT to 
get -4). I searched the GCC internals docs but couldn't find a function 
that generates address-alignment masks.
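
For reference, the address/mask computation the expander performs boils
down to something like this minimal C sketch (the variable names mirror
the RTL temporaries; the code is only illustrative, not what
riscv_subword_address actually emits):

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    short halfword = 0;
    uintptr_t addr = (uintptr_t) &halfword;

    uintptr_t aligned_addr = addr & -4UL;  /* word-align the address */
    unsigned shift = (addr & 3) << 3;      /* byte offset -> bit shift */
    uint32_t mask = 0xffffU << shift;      /* GET_MODE_MASK (HImode), shifted */
    uint32_t not_mask = ~mask;

    printf ("aligned=%#lx shift=%u mask=%#x not_mask=%#x\n",
            (unsigned long) aligned_addr, shift, mask, not_mask);
    return 0;
  }
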
> Presumably using SImode is intentional here rather than wanting to use 
> word_mode which would be SImode for rv32 and DImode for rv64?  I'm 
> going to work based on that assumption, but if it isn't there's more 
> work to do to generalize this code.
It's been a year, but IIRC it was just simpler to implement (and to me it 
didn't make sense to use 64 bits for a subword op).
Is there a benefit to using 64-bit instructions when computing subwords?
>> +
>> +(define_expand "atomic_fetch_nand<mode>"
>> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
>> +    (match_operand:SHORT 1 "memory_operand" "+A"))
>> +   (set (match_dup 1)
>> +    (unspec_volatile:SHORT
>> +      [(not:SHORT (and:SHORT (match_dup 1)
>> +                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
>> +       (match_operand:SI 3 "const_int_operand")] ;; model
>> +     UNSPEC_SYNC_OLD_OP_SUBWORD))]
>> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> Just a note, constraints aren't necessary for a define_expand. They 
> don't hurt anything though.  They do document expectations, but then 
> you have to maintain them over time.  I'm OK leaving them, mostly 
> wanted to make sure you're aware they aren't strictly necessary for a 
> define_expand.
I wasn't aware; thanks for pointing it out! You're referring to the 
"TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" (not the register 
constraints), right?
> ...
Thanks for reviewing!
Patrick

* Re: [PATCH v5] RISCV: Inline subword atomic ops
  2023-04-18 20:48     ` Patrick O'Neill
@ 2023-04-18 21:04       ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2023-04-18 21:04 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches; +Cc: palmer, kito.cheng, david.abd



On 4/18/23 14:48, Patrick O'Neill wrote:
> On 4/18/23 09:59, Jeff Law wrote:
>> On 4/18/23 08:28, Patrick O'Neill wrote:
>> ...
>>> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
>>> +
>>> +  rtx aligned_addr = gen_reg_rtx (Pmode);
>>> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
>>> +                          gen_int_mode (-4, Pmode)));
>> So rather than -4 as a magic number, GET_MODE_MASK would be better. 
>> That may result in needing to rewrap this code.  I'd bring the 
>> gen_rtx_AND down on a new line, aligned with aligned_addr.
> IIUC GET_MODE_MASK generates masks like 0xFF for QImode (for example). It 
> doesn't have the granularity to generate 0x3 (which we could then NOT to 
> get -4). I searched the GCC internals docs but couldn't find a function 
> that generates address-alignment masks.
Yea, yea.  Big "duh" on my side.

>> Presumably using SImode is intentional here rather than wanting to use 
>> word_mode which would be SImode for rv32 and DImode for rv64?  I'm 
>> going to work based on that assumption, but if it isn't there's more 
>> work to do to generalize this code.
> It's been a year, but IIRC it was just simpler to implement (and to me it 
> didn't make sense to use 64 bits for a subword op).
> Is there a benefit to using 64-bit instructions when computing subwords?
Given that rv64 should have 32-bit load/stores, I don't offhand see any 
advantage.
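
For anyone following along, the word-based emulation amounts to roughly
this portable C sketch. It uses the word-sized __atomic builtins in place
of the patch's raw lr.w/sc.w loop, assumes little-endian byte placement
(as on RISC-V), and the helper name is made up for illustration:

  #include <stdint.h>
  #include <stdbool.h>

  /* Sketch only: emulate a 1-byte CAS with a 4-byte CAS on the
     containing aligned word, similar in spirit to what
     subword_atomic_cas_strong does with lr.w/sc.w.  */
  bool
  byte_cas (uint8_t *p, uint8_t expected, uint8_t desired)
  {
    uint32_t *word = (uint32_t *) ((uintptr_t) p & -4UL);
    unsigned shift = ((uintptr_t) p & 3) << 3;
    uint32_t mask = 0xffU << shift;

    uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
    for (;;)
      {
        if ((uint8_t) ((old & mask) >> shift) != expected)
          return false;
        uint32_t new_word = (old & ~mask) | ((uint32_t) desired << shift);
        if (__atomic_compare_exchange_n (word, &old, new_word, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
          return true;
        /* On failure the builtin stored the current word value in OLD;
           loop and recheck the target byte.  */
      }
  }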


>>> +
>>> +(define_expand "atomic_fetch_nand<mode>"
>>> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
>>> +    (match_operand:SHORT 1 "memory_operand" "+A"))
>>> +   (set (match_dup 1)
>>> +    (unspec_volatile:SHORT
>>> +      [(not:SHORT (and:SHORT (match_dup 1)
>>> +                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
>>> +       (match_operand:SI 3 "const_int_operand")] ;; model
>>> +     UNSPEC_SYNC_OLD_OP_SUBWORD))]
>>> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
>> Just a note, constraints aren't necessary for a define_expand. They 
>> don't hurt anything though.  They do document expectations, but then 
>> you have to maintain them over time.  I'm OK leaving them, mostly 
>> wanted to make sure you're aware they aren't strictly necessary for a 
>> define_expand.
> I wasn't aware; thanks for pointing it out! You're referring to the 
> "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" (not the register 
> constraints), right?
I was referring to the register constraints like "=&r".  They're ignored 
on define_expand constructs.  A define_expand generates RTL that will 
be matched later by a define_insn.

The "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" is usually referred 
to as the insn condition.


> Thanks for reviewing!
NP.  Looking forward to V6, which I expect will be ready for inclusion.

jeff

* [PATCH v7] RISCV: Inline subword atomic ops
  2023-04-18 16:39   ` [PATCH v6] " Patrick O'Neill
  2023-04-18 20:17     ` Palmer Dabbelt
@ 2023-04-18 21:41     ` Patrick O'Neill
  2023-04-24 17:20       ` Patrick O'Neill
  2023-04-25  5:52       ` Jeff Law
  1 sibling, 2 replies; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-18 21:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: palmer, kito.cheng, david.abd, jeffreyalaw, schwab, Patrick O'Neill

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op,
	fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
Comment from Jeff Law that gives more context to this patch:
"So for others who may be interested.  The motivation here is that for a 
sub-word atomic we currently have to explicitly link in libatomic or we 
get undefined symbols.

This is particularly problematical for the distros because we're one of 
the few (only?) architectures supported by the distros that require 
linking in libatomic for these cases.  THe distros don't want to adjust 
each affected packages and be stuck carrying that change forward or 
negotiating with all the relevant upstreams.  The distros might tackle 
this problem by porting this patch into their compiler tree which has 
its own set of problems with long term maintenance.

The net is that, from a usability standpoint, it's best if we get this 
problem addressed and backported to our gcc-13 RISC-V coordination branch.

We had held this up pending resolution of some other issues in the 
atomics space.  In retrospect that might have been a mistake."
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/
---
v6: https://inbox.sourceware.org/gcc-patches/20230418163913.2429812-1-patrick@rivosinc.com/

Addressed Jeff Law's comments.
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/

Changes:
- Simplified define_expand expressions/removed unneeded register
  constraints.
- Improved the comment describing riscv_subword_address.
- Used #include "inline-atomics-1.c" in inline-atomics-2.c.
- Used an rtx addr_mask variable to document the -4 magic number.
- Misc. formatting and define_expand comment fixes.

No new failures on trunk.
---
The mapping implemented here matches libatomic's. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  49 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 301 +++++++++
 gcc/doc/invoke.texi                           |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |   9 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1841 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index cdb47e81e7c..61f019380ae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7148,6 +7148,55 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }
 
+/* Given memory reference MEM, expand code to compute the aligned
+   memory address, shift and mask values and store them into
+   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx addr_mask = gen_int_mode (-4, Pmode);
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, addr_mask));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					  gen_int_mode (3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask = GET_MODE_MASK (GET_MODE (mem));
+
+  emit_move_insn (*mask, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, *mask,
+					 gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT (SImode, *mask));
+}
+
+/* Leftshift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		      rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn (*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						 gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..bc5e63ab3e6 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -254,3 +254,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c932ef87b9d..83be6431cb6 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@
 
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -91,6 +94,135 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "<insn>\t%5, %0, %2\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(match_operand:SHORT 0 "register_operand")			      ;; old value at mem
+   (not:SHORT (and:SHORT (match_operand:SHORT 1 "memory_operand")     ;; mem location
+			 (match_operand:SHORT 2 "reg_or_0_operand"))) ;; value for op
+   (match_operand:SI 3 "const_int_operand")]			      ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%5, %0, %2\;"
+	   "not\t%5, %5\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(match_operand:SHORT 0 "register_operand")			 ;; old value at mem
+   (any_atomic:SHORT (match_operand:SHORT 1 "memory_operand")	 ;; mem location
+		     (match_operand:SHORT 2 "reg_or_0_operand")) ;; value for op
+   (match_operand:SI 3 "const_int_operand")]			 ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +236,56 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])
 
+(define_expand "atomic_exchange<mode>"
+  [(match_operand:SHORT 0 "register_operand") ;; old value at mem
+   (match_operand:SHORT 1 "memory_operand")   ;; mem location
+   (match_operand:SHORT 2 "register_operand") ;; value
+   (match_operand:SI 3 "const_int_operand")]  ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%4, %0, %3\;"
+	   "or\t%4, %4, %2\;"
+	   "sc.w.rl\t%4, %4, %1\;"
+	   "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -153,6 +335,125 @@
   DONE;
 })
 
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand")    ;; bool output
+   (match_operand:SHORT 1 "register_operand") ;; val output
+   (match_operand:SHORT 2 "memory_operand")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand") ;; desired value
+   (match_operand:SI 5 "const_int_operand")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
+  else
+    emit_move_insn (val, const0_rtx);
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
+  else
+    emit_move_insn (exp, const0_rtx);
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_move_insn  (compare, difference);
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
+      compare = reg;
+    }
+
+  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(match_operand:SHORT 0 "register_operand") ;; val output
+   (match_operand:SHORT 1 "memory_operand")   ;; memory
+   (match_operand:SHORT 2 "reg_or_0_operand") ;; expected value
+   (match_operand:SHORT 3 "reg_or_0_operand") ;; desired value
+   (match_operand:SI 4 "const_int_operand")   ;; mod_s
+   (match_operand:SI 5 "const_int_operand")   ;; mod_f
+   (match_scratch:SHORT 6)]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong<mode> to implement an LR/SC version of the
+     operation.  */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; expected value
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; desired value
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%6, %0, %4\;"
+	   "bne\t%6, %z2, 1f\;"
+	   "and\t%6, %0, %5\;"
+	   "or\t%6, %6, %3\;"
+	   "sc.w.rl\t%6, %6, %1\;"
+	   "bnez\t%6, 1b\;"
+	   "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..ba448dcb7ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1226,7 +1226,8 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
--mcsr-check -mno-csr-check}
+-mcsr-check -mno-csr-check
+-minline-atomics  -mno-inline-atomics}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@opindex minline-atomics
+@item -minline-atomics
+@itemx -mno-inline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+libatomic function calls.  The default is to use fast inline subword atomics
+that do not require libatomic.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..01b43908692
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+#include "inline-atomics-1.c"
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates logic from libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-2.c */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main () {
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-exchange-1.c */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-exchange-2.c */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_X builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 69f53623509..573d163ea04 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""
 
+/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
-- 
2.25.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7] RISCV: Inline subword atomic ops
  2023-04-18 21:41     ` [PATCH v7] " Patrick O'Neill
@ 2023-04-24 17:20       ` Patrick O'Neill
  2023-04-25  5:52       ` Jeff Law
  1 sibling, 0 replies; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-24 17:20 UTC (permalink / raw)
  To: gcc-patches, Jeff Law; +Cc: palmer, kito.cheng, david.abd, schwab

Ping

Also here's the corrected link for Jeff's comments:
https://inbox.sourceware.org/gcc-patches/f965671f-5997-0220-8831-a94e8c68d060@gmail.com/T/#m53e5d46a94868e68693e0d79455ca5343cf275a9 


Patrick

On 4/18/23 14:41, Patrick O'Neill wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
>
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
>
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
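
For readers skimming the archive, the masking scheme the patch inlines can be
modeled in plain C.  This is a minimal sketch, not code from the patch: it
assumes a little-endian RV32/RV64 byte layout and uses a word-wide CAS loop in
place of the emitted LR/SC pair; the function name is hypothetical.

#include <stdint.h>

/* Emulate __atomic_fetch_add on a char by operating on the aligned
   word that contains it.  */
static unsigned char
sketch_fetch_add_1 (unsigned char *p, unsigned char v)
{
  uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
  int shift = ((uintptr_t) p & 3) * 8;
  uint32_t mask = 0xffu << shift;

  uint32_t old = __atomic_load_n (wp, __ATOMIC_RELAXED);
  uint32_t new_word;
  do
    {
      /* Add within the selected byte; keep the other bytes intact.  */
      uint32_t new_byte = (old + ((uint32_t) v << shift)) & mask;
      new_word = (old & ~mask) | new_byte;
    }
  while (!__atomic_compare_exchange_n (wp, &old, new_word, 1,
				       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return (old & mask) >> shift;
}
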
>
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
>
> 	PR target/104338
> 	* riscv-protos.h: Add helper function stubs.
> 	* riscv.cc: Add helper functions for subword masking.
> 	* riscv.opt: Add command-line flag.
> 	* sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* invoke.texi: Add blurb regarding command-line flag.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* atomic.c: Add reference to duplicate logic.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> Comment from Jeff Law that gives more context to this patch:
> "So for others who may be interested.  The motivation here is that for a
> sub-word atomic we currently have to explicitly link in libatomic or we
> get undefined symbols.
>
> This is particularly problematical for the distros because we're one of
> the few (only?) architectures supported by the distros that require
> linking in libatomic for these cases.  THe distros don't want to adjust
> each affected packages and be stuck carrying that change forward or
> negotiating with all the relevant upstreams.  The distros might tackle
> this problem by porting this patch into their compiler tree which has
> its own set of problems with long term maintenance.
>
> The net is that, from a usability standpoint, it's best if we get this problem
> addressed and backported to our gcc-13 RISC-V coordination branch.
>
> We had held this up pending resolution of some other issues in the
> atomics space.  In retrospect that might have been a mistake."
> https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/
> ---
> v6: https://inbox.sourceware.org/gcc-patches/20230418163913.2429812-1-patrick@rivosinc.com/
> Addressed Jeff Law's comments:
> https://inbox.sourceware.org/gcc-patches/87y1mpb57m.fsf@igel.home/
> Changes:
> - Simplified define_expand expressions/removed unneeded register constraints
> - Improved comment describing riscv_subword_address
> - Use #include "inline-atomics-1.c" in inline-atomics-2.c
> - Use rtx addr_mask variable to describe the use of the -4 magic number.
> - Misc. formatting/define_expand comments
>
> No new failures on trunk.
> ---
> The mapping implemented here matches libatomic. That mapping changes if
> "Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
> patch is merged first, I will update the other to make sure the
> correct mapping is emitted.
>    https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
> ---
>   gcc/config/riscv/riscv-protos.h               |   2 +
>   gcc/config/riscv/riscv.cc                     |  49 ++
>   gcc/config/riscv/riscv.opt                    |   4 +
>   gcc/config/riscv/sync.md                      | 301 +++++++++
>   gcc/doc/invoke.texi                           |  10 +-
>   .../gcc.target/riscv/inline-atomics-1.c       |  18 +
>   .../gcc.target/riscv/inline-atomics-2.c       |   9 +
>   .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
>   .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
>   .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
>   .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
>   .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
>   .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
>   libgcc/config/riscv/atomic.c                  |   2 +
>   14 files changed, 1841 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 5244e8dcbf0..02b33e02020 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -79,6 +79,8 @@ extern void riscv_reinit (void);
>   extern poly_uint64 riscv_regmode_natural_size (machine_mode);
>   extern bool riscv_v_ext_vector_mode_p (machine_mode);
>   extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
> +extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
> +extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
>   
>   /* Routines implemented in riscv-c.cc.  */
>   void riscv_cpu_cpp_builtins (cpp_reader *);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index cdb47e81e7c..61f019380ae 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7148,6 +7148,55 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>   							& ~zeroed_hardregs);
>   }
>   
> +/* Given memory reference MEM, expand code to compute the aligned
> +   memory address, shift and mask values and store them into
> +   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */
> +
> +void
> +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> +		       rtx *not_mask)
> +{
> +  /* Align the memory address to a word.  */
> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx addr_mask = gen_int_mode (-4, Pmode);
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr, addr_mask));
> +
> +  *aligned_mem = change_address (mem, SImode, aligned_addr);
> +
> +  /* Calculate the shift amount.  */
> +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +				       gen_int_mode (3, SImode)));
> +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> +					  gen_int_mode (3, SImode)));
> +
> +  /* Calculate the mask.  */
> +  int unshifted_mask = GET_MODE_MASK (GET_MODE (mem));
> +
> +  emit_move_insn (*mask, gen_int_mode (unshifted_mask, SImode));
> +
> +  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, *mask,
> +					 gen_lowpart (QImode, *shift)));
> +
> +  emit_move_insn (*not_mask, gen_rtx_NOT(SImode, *mask));
> +}
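
As a concrete check on the arithmetic above (illustrative values, not taken
from the patch), consider a halfword (HImode) operand at address 0x1006:

/* aligned_addr = 0x1006 & -4       = 0x1004
   shift        = (0x1006 & 3) << 3 = 16          (bit offset in word)
   mask         = 0xffff << 16      = 0xffff0000
   not_mask     = ~mask             = 0x0000ffff  */
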
> +
> +/* Leftshift a subword within an SImode register.  */
> +
> +void
> +riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
> +		      rtx *shifted_value)
> +{
> +  rtx value_reg = gen_reg_rtx (SImode);
> +  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
> +						  mode, 0));
> +
> +  emit_move_insn(*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
> +						 gen_lowpart (QImode, shift)));
> +}
> +
>   /* Initialize the GCC target structure.  */
>   #undef TARGET_ASM_ALIGNED_HI_OP
>   #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ff1dd4ddd4f..bc5e63ab3e6 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -254,3 +254,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
>   misa-spec=
>   Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
>   Set the version of RISC-V ISA spec.
> +
> +minline-atomics
> +Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
> +Always inline subword atomic operations.
> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index c932ef87b9d..83be6431cb6 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -21,8 +21,11 @@
>   
>   (define_c_enum "unspec" [
>     UNSPEC_COMPARE_AND_SWAP
> +  UNSPEC_COMPARE_AND_SWAP_SUBWORD
>     UNSPEC_SYNC_OLD_OP
> +  UNSPEC_SYNC_OLD_OP_SUBWORD
>     UNSPEC_SYNC_EXCHANGE
> +  UNSPEC_SYNC_EXCHANGE_SUBWORD
>     UNSPEC_ATOMIC_STORE
>     UNSPEC_MEMORY_BARRIER
>   ])
> @@ -91,6 +94,135 @@
>     [(set_attr "type" "atomic")
>      (set (attr "length") (const_int 8))])
>   
> +(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(any_atomic:SI (match_dup 1)
> +		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
> +	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))
> +    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
> +    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
> +    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "<insn>\t%5, %0, %2\;"
> +	   "and\t%5, %5, %3\;"
> +	   "and\t%6, %0, %4\;"
> +	   "or\t%6, %6, %5\;"
> +	   "sc.w.rl\t%5, %6, %1\;"
> +	   "bnez\t%5, 1b";
> +  }
> +  [(set (attr "length") (const_int 28))])
> +
> +(define_expand "atomic_fetch_nand<mode>"
> +  [(match_operand:SHORT 0 "register_operand")			      ;; old value at mem
> +   (not:SHORT (and:SHORT (match_operand:SHORT 1 "memory_operand")     ;; mem location
> +			 (match_operand:SHORT 2 "reg_or_0_operand"))) ;; value for op
> +   (match_operand:SI 3 "const_int_operand")]			      ;; model
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_nand to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
> +						   shifted_value,
> +						   mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_fetch_strong_nand"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(not:SI (and:SI (match_dup 1)
> +			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
> +	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))
> +    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
> +    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
> +    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%5, %0, %2\;"
> +	   "not\t%5, %5\;"
> +	   "and\t%5, %5, %3\;"
> +	   "and\t%6, %0, %4\;"
> +	   "or\t%6, %6, %5\;"
> +	   "sc.w.rl\t%5, %6, %1\;"
> +	   "bnez\t%5, 1b";
> +  }
> +  [(set (attr "length") (const_int 32))])
> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> +  [(match_operand:SHORT 0 "register_operand")			 ;; old value at mem
> +   (any_atomic:SHORT (match_operand:SHORT 1 "memory_operand")	 ;; mem location
> +		     (match_operand:SHORT 2 "reg_or_0_operand")) ;; value for op
> +   (match_operand:SI 3 "const_int_operand")]			 ;; model
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
> +							     shifted_value,
> +							     mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +
> +  DONE;
> +})
> +
>   (define_insn "atomic_exchange<mode>"
>     [(set (match_operand:GPR 0 "register_operand" "=&r")
>   	(unspec_volatile:GPR
> @@ -104,6 +236,56 @@
>     [(set_attr "type" "atomic")
>      (set (attr "length") (const_int 8))])
>   
> +(define_expand "atomic_exchange<mode>"
> +  [(match_operand:SHORT 0 "register_operand") ;; old value at mem
> +   (match_operand:SHORT 1 "memory_operand")   ;; mem location
> +   (match_operand:SHORT 2 "register_operand") ;; value
> +   (match_operand:SI 3 "const_int_operand")]  ;; model
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx value = operands[2];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx shifted_value = gen_reg_rtx (SImode);
> +  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
> +
> +  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
> +						 shifted_value, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_exchange_strong"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI
> +	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
> +	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
> +      UNSPEC_SYNC_EXCHANGE_SUBWORD))
> +    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%4, %0, %3\;"
> +	   "or\t%4, %4, %2\;"
> +	   "sc.w.rl\t%4, %4, %1\;"
> +	   "bnez\t%4, 1b";
> +  }
> +  [(set (attr "length") (const_int 20))])
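
The LR/SC pair above behaves like the following C model.  This is a sketch
only: the function name is hypothetical, a word-wide CAS stands in for
lr.w.aq/sc.w.rl, and little-endian layout is assumed.

#include <stdint.h>

static unsigned char
model_exchange_1 (unsigned char *p, unsigned char newval)
{
  uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
  int shift = ((uintptr_t) p & 3) * 8;
  uint32_t mask = 0xffu << shift;

  uint32_t old = __atomic_load_n (wp, __ATOMIC_RELAXED);
  uint32_t desired;
  do
    /* Splice the new byte into the containing word (the "and"/"or").  */
    desired = (old & ~mask) | ((uint32_t) newval << shift);
  while (!__atomic_compare_exchange_n (wp, &old, desired, 1,
				       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return (old & mask) >> shift;
}
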
> +
>   (define_insn "atomic_cas_value_strong<mode>"
>     [(set (match_operand:GPR 0 "register_operand" "=&r")
>   	(match_operand:GPR 1 "memory_operand" "+A"))
> @@ -153,6 +335,125 @@
>     DONE;
>   })
>   
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand")    ;; bool output
> +   (match_operand:SHORT 1 "register_operand") ;; val output
> +   (match_operand:SHORT 2 "memory_operand")   ;; memory
> +   (match_operand:SHORT 3 "reg_or_0_operand") ;; expected value
> +   (match_operand:SHORT 4 "reg_or_0_operand") ;; desired value
> +   (match_operand:SI 5 "const_int_operand")   ;; is_weak
> +   (match_operand:SI 6 "const_int_operand")   ;; mod_s
> +   (match_operand:SI 7 "const_int_operand")]  ;; mod_f
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
> +						operands[3], operands[4],
> +						operands[6], operands[7]));
> +
> +  rtx val = gen_reg_rtx (SImode);
> +  if (operands[1] != const0_rtx)
> +    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
> +  else
> +    emit_move_insn (val, const0_rtx);
> +
> +  rtx exp = gen_reg_rtx (SImode);
> +  if (operands[3] != const0_rtx)
> +    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
> +  else
> +    emit_move_insn (exp, const0_rtx);
> +
> +  rtx compare = val;
> +  if (exp != const0_rtx)
> +    {
> +      rtx difference = gen_rtx_MINUS (SImode, val, exp);
> +      compare = gen_reg_rtx (SImode);
> +      emit_move_insn  (compare, difference);
> +    }
> +
> +  if (word_mode != SImode)
> +    {
> +      rtx reg = gen_reg_rtx (word_mode);
> +      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
> +      compare = reg;
> +    }
> +
> +  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
> +  DONE;
> +})
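
In C terms, the boolean output computed above is roughly the following
(a sketch of the expander's data flow, not literal generated code):

/* success = ((int) returned_val - (int) expected) == 0;
   both operands are sign-extended from the subword mode, and the
   difference is widened to word_mode on RV64 before the comparison.  */
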
> +
> +(define_expand "atomic_cas_value_strong<mode>"
> +  [(match_operand:SHORT 0 "register_operand") ;; val output
> +   (match_operand:SHORT 1 "memory_operand")   ;; memory
> +   (match_operand:SHORT 2 "reg_or_0_operand") ;; expected value
> +   (match_operand:SHORT 3 "reg_or_0_operand") ;; desired value
> +   (match_operand:SI 4 "const_int_operand")   ;; mod_s
> +   (match_operand:SI 5 "const_int_operand")   ;; mod_f
> +   (match_scratch:SHORT 6)]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_cas_strong<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
> +
> +  rtx old = gen_reg_rtx (SImode);
> +  rtx mem = operands[1];
> +  rtx aligned_mem = gen_reg_rtx (SImode);
> +  rtx shift = gen_reg_rtx (SImode);
> +  rtx mask = gen_reg_rtx (SImode);
> +  rtx not_mask = gen_reg_rtx (SImode);
> +
> +  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
> +
> +  rtx o = operands[2];
> +  rtx n = operands[3];
> +  rtx shifted_o = gen_reg_rtx (SImode);
> +  rtx shifted_n = gen_reg_rtx (SImode);
> +
> +  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
> +  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
> +
> +  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
> +  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
> +
> +  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
> +					    shifted_o, shifted_n,
> +					    mask, not_mask));
> +
> +  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> +					 gen_lowpart (QImode, shift)));
> +
> +  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
> +
> +  DONE;
> +})
> +
> +(define_insn "subword_atomic_cas_strong"
> +  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
> +	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; expected value
> +			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; desired value
> +	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
> +	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
> +	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
> +	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +  {
> +    return "1:\;"
> +	   "lr.w.aq\t%0, %1\;"
> +	   "and\t%6, %0, %4\;"
> +	   "bne\t%6, %z2, 1f\;"
> +	   "and\t%6, %0, %5\;"
> +	   "or\t%6, %6, %3\;"
> +	   "sc.w.rl\t%6, %6, %1\;"
> +	   "bnez\t%6, 1b\;"
> +	   "1:";
> +  }
> +  [(set (attr "length") (const_int 28))])
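
A runnable C model of the subword CAS loop above, hedged the same way:
names are illustrative, a word-wide CAS stands in for the LR/SC pair, and
little-endian layout is assumed.

#include <stdint.h>

static int
model_cas_1 (unsigned char *p, unsigned char *expected,
	     unsigned char desired)
{
  uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
  int shift = ((uintptr_t) p & 3) * 8;
  uint32_t mask = 0xffu << shift;
  uint32_t o = (uint32_t) *expected << shift;  /* shifted_o */
  uint32_t n = (uint32_t) desired << shift;    /* shifted_n */

  uint32_t word = __atomic_load_n (wp, __ATOMIC_RELAXED);
  do
    {
      if ((word & mask) != o)		/* the "bne %6, %z2, 1f" exit  */
	{
	  *expected = (word & mask) >> shift;
	  return 0;
	}
    }
  while (!__atomic_compare_exchange_n (wp, &word,
				       (word & ~mask) | n, 1,
				       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return 1;
}
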
> +
>   (define_expand "atomic_test_and_set"
>     [(match_operand:QI 0 "register_operand" "")     ;; bool output
>      (match_operand:QI 1 "memory_operand" "+A") ;; memory
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a38547f53e5..ba448dcb7ef 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1226,7 +1226,8 @@ See RS/6000 and PowerPC Options.
>  -mbig-endian -mlittle-endian
>  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
> --mcsr-check -mno-csr-check}
> +-mcsr-check -mno-csr-check
> +-minline-atomics -mno-inline-atomics}
>  
>  @emph{RL78 Options}
>  @gccoptlist{-msim -mmul=none -mmul=g13 -mmul=g14 -mallregs
> @@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
>  library function calls.  The default is to use fast inline prologues and
>  epilogues.
>  
> +@opindex minline-atomics
> +@item -minline-atomics
> +@itemx -mno-inline-atomics
> +Do or don't use smaller but slower subword atomic emulation code that uses
> +libatomic function calls.  The default is to use fast inline subword atomics
> +that do not require libatomic.
> +
>  @opindex mshorten-memrefs
>  @item -mshorten-memrefs
>  @itemx -mno-shorten-memrefs
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> new file mode 100644
> index 00000000000..5c5623d9b2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +char foo;
> +char bar;
> +char baz;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&foo, 1);
> +  __sync_fetch_and_nand(&bar, 1);
> +  __sync_bool_compare_and_swap (&baz, 1, 2);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> new file mode 100644
> index 00000000000..01b43908692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* Verify that subword atomics do not generate calls.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +#include "inline-atomics-1.c"
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> new file mode 100644
> index 00000000000..709f3734377
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
> @@ -0,0 +1,569 @@
> +/* Check all char alignments.  */
> +/* Duplicate logic as libatomic/testsuite/libatomic.c/atomic-op-1.c */
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a char.  */
> +
> +extern void abort(void);
> +
> +char count, res;
> +const char init = ~0;
> +
> +struct A
> +{
> +   char a;
> +   char b;
> +   char c;
> +   char d;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0 )
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (char* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1 , __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (char* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (char* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (char* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (char* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main ()
> +{
> +  char* V[] = {&A.a, &A.b, &A.c, &A.d};
> +
> +  for (int i = 0; i < 4; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> new file mode 100644
> index 00000000000..eecfaae5cc6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
> @@ -0,0 +1,566 @@
> +/* Check all short alignments.  */
> +/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
> +
> +/* Test the execution of the __atomic_*OP builtin routines for a short.  */
> +
> +extern void abort(void);
> +
> +short count, res;
> +const short init = ~0;
> +
> +struct A
> +{
> +   short a;
> +   short b;
> +} __attribute__ ((packed)) A;
> +
> +/* The fetch_op routines return the original value before the operation.  */
> +
> +void
> +test_fetch_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
> +    abort ();
> +
> +  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
> +    abort ();
> +}
> +
> +
> +void
> +test_fetch_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
> +    abort ();
> +
> +  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
> +    abort ();
> +}
> +
> +void
> +test_fetch_and (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_fetch_nand (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +void
> +test_fetch_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
> +    abort ();
> +}
> +
> +/* The OP_fetch routines return the new value after the operation.  */
> +
> +void
> +test_add_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
> +    abort ();
> +
> +  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub_fetch (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
> +    abort ();
> +
> +  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
> +    abort ();
> +}
> +
> +void
> +test_and_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
> +    abort ();
> +
> +  *v = init;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
> +    abort ();
> +
> +  *v = ~*v;
> +  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_nand_fetch (short* v)
> +{
> +  *v = init;
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor_fetch (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
> +    abort ();
> +
> +  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
> +    abort ();
> +}
> +
> +void
> +test_or_fetch (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
> +    abort ();
> +
> +  count *= 2;
> +  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
> +    abort ();
> +}
> +
> +
> +/* Test the OP routines with a result which isn't used. Use both variations
> +   within each function.  */
> +
> +void
> +test_add (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
> +  if (*v != 2)
> +    abort ();
> +
> +  __atomic_add_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != 3)
> +    abort ();
> +
> +  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
> +  if (*v != 4)
> +    abort ();
> +
> +  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 5)
> +    abort ();
> +
> +  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 6)
> +    abort ();
> +}
> +
> +
> +void
> +test_sub (short* v)
> +{
> +  *v = res = 20;
> +  count = 0;
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
> +  if (*v != --res)
> +    abort ();
> +
> +  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
> +  if (*v != --res)
> +    abort ();
> +}
> +
> +void
> +test_and (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = init;
> +  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != 0)
> +    abort ();
> +
> +  *v = ~*v;
> +  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_nand (short* v)
> +{
> +  *v = init;
> +
> +  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
> +  if (*v != init)
> +    abort ();
> +}
> +
> +
> +
> +void
> +test_xor (short* v)
> +{
> +  *v = init;
> +  count = 0;
> +
> +  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
> +  if (*v != 0)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
> +  if (*v != init)
> +    abort ();
> +
> +  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
> +  if (*v != 0)
> +    abort ();
> +}
> +
> +void
> +test_or (short* v)
> +{
> +  *v = 0;
> +  count = 1;
> +
> +  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
> +  if (*v != 1)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
> +  if (*v != 3)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
> +  if (*v != 7)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
> +  if (*v != 15)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
> +  if (*v != 31)
> +    abort ();
> +
> +  count *= 2;
> +  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
> +  if (*v != 63)
> +    abort ();
> +}
> +
> +int
> +main () {
> +  short* V[] = {&A.a, &A.b};
> +
> +  for (int i = 0; i < 2; i++) {
> +    test_fetch_add (V[i]);
> +    test_fetch_sub (V[i]);
> +    test_fetch_and (V[i]);
> +    test_fetch_nand (V[i]);
> +    test_fetch_xor (V[i]);
> +    test_fetch_or (V[i]);
> +
> +    test_add_fetch (V[i]);
> +    test_sub_fetch (V[i]);
> +    test_and_fetch (V[i]);
> +    test_nand_fetch (V[i]);
> +    test_xor_fetch (V[i]);
> +    test_or_fetch (V[i]);
> +
> +    test_add (V[i]);
> +    test_sub (V[i]);
> +    test_and (V[i]);
> +    test_nand (V[i]);
> +    test_xor (V[i]);
> +    test_or (V[i]);
> +  }
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> new file mode 100644
> index 00000000000..52093894a79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
> @@ -0,0 +1,87 @@
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
> +
> +extern void abort(void);
> +
> +char v = 0;
> +char expected = 0;
> +char max = ~0;
> +char desired = ~0;
> +char zero = 0;
> +
> +#define STRONG 0
> +#define WEAK 1
> +
> +int
> +main ()
> +{
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  /* Now test the generic version.  */
> +
> +  v = 0;
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> new file mode 100644
> index 00000000000..8fee8c44811
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
> @@ -0,0 +1,87 @@
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
> +
> +extern void abort(void);
> +
> +short v = 0;
> +short expected = 0;
> +short max = ~0;
> +short desired = ~0;
> +short zero = 0;
> +
> +#define STRONG 0
> +#define WEAK 1
> +
> +int
> +main ()
> +{
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  /* Now test the generic version.  */
> +
> +  v = 0;
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != max)
> +    abort ();
> +  if (v != 0)
> +    abort ();
> +
> +  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +
> +  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +    abort ();
> +  if (expected != 0)
> +    abort ();
> +  if (v != max)
> +    abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> new file mode 100644
> index 00000000000..24c344c0ce3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
> @@ -0,0 +1,69 @@
> +/* Test __atomic routines for existence and proper execution on 1 byte
> +   values with each valid memory model.  */
> +/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_exchange_n builtin for a char.  */
> +
> +extern void abort(void);
> +
> +char v, count, ret;
> +
> +int
> +main ()
> +{
> +  v = 0;
> +  count = 0;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
> +    abort ();
> +  count++;
> +
> +  /* Now test the generic version.  */
> +
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> new file mode 100644
> index 00000000000..edc212df04e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
> @@ -0,0 +1,69 @@
> +/* Test __atomic routines for existence and proper execution on 2 byte
> +   values with each valid memory model.  */
> +/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
> +/* { dg-do run } */
> +/* { dg-options "-minline-atomics" } */
> +
> +/* Test the execution of the __atomic_exchange_n builtin for a short.  */
> +
> +extern void abort(void);
> +
> +short v, count, ret;
> +
> +int
> +main ()
> +{
> +  v = 0;
> +  count = 0;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
> +    abort ();
> +  count++;
> +
> +  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
> +    abort ();
> +  count++;
> +
> +  /* Now test the generic version.  */
> +
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
> +  if (ret != count - 1 || v != count)
> +    abort ();
> +  count++;
> +
> +  return 0;
> +}
> diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
> index 69f53623509..573d163ea04 100644
> --- a/libgcc/config/riscv/atomic.c
> +++ b/libgcc/config/riscv/atomic.c
> @@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>   #define INVERT		"not %[tmp1], %[tmp1]\n\t"
>   #define DONT_INVERT	""
>   
> +/* Logic duplicated in gcc/gcc/config/riscv/sync.md for use when inlining is enabled.  */
> +
>   #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
>     type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
>     {									\

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7] RISCV: Inline subword atomic ops
  2023-04-18 21:41     ` [PATCH v7] " Patrick O'Neill
  2023-04-24 17:20       ` Patrick O'Neill
@ 2023-04-25  5:52       ` Jeff Law
  2023-04-25 15:20         ` Patrick O'Neill
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-04-25  5:52 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches; +Cc: palmer, kito.cheng, david.abd, schwab



On 4/18/23 15:41, Patrick O'Neill wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
> 
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
> 
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
> 
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
> 
> 	PR target/104338
> 	* riscv-protos.h: Add helper function stubs.
> 	* riscv.cc: Add helper functions for subword masking.
> 	* riscv.opt: Add command-line flag.
> 	* sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* invoke.texi: Add blurb regarding command-line flag.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* atomic.c: Add reference to duplicate logic.
This is OK for the trunk.  I think the only question is whether or not 
you're going to contribute to GCC regularly.  If so, we should go ahead 
and get you write access so you can commit ACK'd changes.  If you're not 
going to be making regular contributions, then I can go ahead and commit 
it for you.

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7] RISCV: Inline subword atomic ops
  2023-04-25  5:52       ` Jeff Law
@ 2023-04-25 15:20         ` Patrick O'Neill
  2023-04-26  2:27           ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-25 15:20 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: palmer, kito.cheng, david.abd, schwab

On 4/24/23 22:52, Jeff Law wrote:

>
> On 4/18/23 15:41, Patrick O'Neill wrote:
>> RISC-V has no support for subword atomic operations; code currently
>> generates libatomic library calls.
>>
>> This patch changes the default behavior to inline subword atomic calls
>> (using the same logic as the existing library call).
>> Behavior can be specified using the -minline-atomics and
>> -mno-inline-atomics command line flags.
>>
>> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
>> This will need to stay for backwards compatibility and the
>> -mno-inline-atomics flag.
>>
>> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
>>
>>     PR target/104338
>>     * riscv-protos.h: Add helper function stubs.
>>     * riscv.cc: Add helper functions for subword masking.
>>     * riscv.opt: Add command-line flag.
>>     * sync.md: Add masking logic and inline asm for fetch_and_op,
>>     fetch_and_nand, CAS, and exchange ops.
>>     * invoke.texi: Add blurb regarding command-line flag.
>>     * inline-atomics-1.c: New test.
>>     * inline-atomics-2.c: Likewise.
>>     * inline-atomics-3.c: Likewise.
>>     * inline-atomics-4.c: Likewise.
>>     * inline-atomics-5.c: Likewise.
>>     * inline-atomics-6.c: Likewise.
>>     * inline-atomics-7.c: Likewise.
>>     * inline-atomics-8.c: Likewise.
>>     * atomic.c: Add reference to duplicate logic.
> This is OK for the trunk.  I think the only question is whether or not 
> you're going to contribute to GCC regularly.  If so, we should go 
> ahead and get you write access so you can commit ACK'd changes.  If 
> you're not going to be making regular contributions, then I can go 
> ahead and commit it for you.
>
> Jeff

I should be contributing regularly, so write access would be great.

Is this the correct form?
https://sourceware.org/cgi-bin/pdw/ps_form.cgi

Patrick


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7] RISCV: Inline subword atomic ops
  2023-04-25 15:20         ` Patrick O'Neill
@ 2023-04-26  2:27           ` Jeff Law
  2023-04-26 17:01             ` [committed] " Patrick O'Neill
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-04-26  2:27 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches; +Cc: palmer, kito.cheng, david.abd, schwab



On 4/25/23 09:20, Patrick O'Neill wrote:
> On 4/24/23 22:52, Jeff Law wrote:
> 
>>
>> On 4/18/23 15:41, Patrick O'Neill wrote:
>>> RISC-V has no support for subword atomic operations; code currently
>>> generates libatomic library calls.
>>>
>>> This patch changes the default behavior to inline subword atomic calls
>>> (using the same logic as the existing library call).
>>> Behavior can be specified using the -minline-atomics and
>>> -mno-inline-atomics command line flags.
>>>
>>> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
>>> This will need to stay for backwards compatibility and the
>>> -mno-inline-atomics flag.
>>>
>>> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
>>>
>>>     PR target/104338
>>>     * riscv-protos.h: Add helper function stubs.
>>>     * riscv.cc: Add helper functions for subword masking.
>>>     * riscv.opt: Add command-line flag.
>>>     * sync.md: Add masking logic and inline asm for fetch_and_op,
>>>     fetch_and_nand, CAS, and exchange ops.
>>>     * invoke.texi: Add blurb regarding command-line flag.
>>>     * inline-atomics-1.c: New test.
>>>     * inline-atomics-2.c: Likewise.
>>>     * inline-atomics-3.c: Likewise.
>>>     * inline-atomics-4.c: Likewise.
>>>     * inline-atomics-5.c: Likewise.
>>>     * inline-atomics-6.c: Likewise.
>>>     * inline-atomics-7.c: Likewise.
>>>     * inline-atomics-8.c: Likewise.
>>>     * atomic.c: Add reference to duplicate logic.
>> This is OK for the trunk.  I think the only question is whether or not 
>> you're going to contribute to GCC regularly.  If so, we should go 
>> ahead and get you write access so you can commit ACK'd changes.  If 
>> you're not going to be making regular contributions, then I can go 
>> ahead and commit it for you.
>>
>> Jeff
> 
> I should be contributing regularly, so write access would be great.
> 
> Is this the correct form?
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi
It is, and I've already pushed the ACK button on my side.  So we're just 
waiting for an admin to push the buttons on his side to make the account.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [committed] RISCV: Inline subword atomic ops
  2023-04-26  2:27           ` Jeff Law
@ 2023-04-26 17:01             ` Patrick O'Neill
  2023-05-02 20:34               ` Patrick O'Neill
  0 siblings, 1 reply; 24+ messages in thread
From: Patrick O'Neill @ 2023-04-26 17:01 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches
  Cc: palmer, kito.cheng, david.abd, schwab, Patrick O'Neill

Committed - I had to reformat the changelog so it would push, and resolve a
trivial merge conflict in riscv.opt.

---

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
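
For illustration, a minimal sketch of the kind of access this changes
(the function and variable names here are arbitrary):

  char flag;

  char
  bump (void)
  {
    /* With -minline-atomics (the default) this expands to an inline
       lr.w/sc.w loop on the containing aligned word; with
       -mno-inline-atomics it remains a call to the libatomic routine
       __atomic_fetch_add_1.  */
    return __atomic_fetch_add (&flag, 1, __ATOMIC_SEQ_CST);
  }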

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:
	PR target/104338
	* config/riscv/riscv-protos.h: Add helper function stubs.
	* config/riscv/riscv.cc: Add helper functions for subword masking.
	* config/riscv/riscv.opt: Add command-line flag.
	* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
	fetch_and_nand, CAS, and exchange ops.
	* doc/invoke.texi: Add blurb regarding command-line flag.

libgcc/ChangeLog:
	PR target/104338
	* config/riscv/atomic.c: Add reference to duplicate logic.

gcc/testsuite/ChangeLog:
	PR target/104338
	* gcc.target/riscv/inline-atomics-1.c: New test.
	* gcc.target/riscv/inline-atomics-2.c: New test.
	* gcc.target/riscv/inline-atomics-3.c: New test.
	* gcc.target/riscv/inline-atomics-4.c: New test.
	* gcc.target/riscv/inline-atomics-5.c: New test.
	* gcc.target/riscv/inline-atomics-6.c: New test.
	* gcc.target/riscv/inline-atomics-7.c: New test.
	* gcc.target/riscv/inline-atomics-8.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv.cc                     |  49 ++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/sync.md                      | 301 +++++++++
 gcc/doc/invoke.texi                           |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c       |  18 +
 .../gcc.target/riscv/inline-atomics-2.c       |   9 +
 .../gcc.target/riscv/inline-atomics-3.c       | 569 ++++++++++++++++++
 .../gcc.target/riscv/inline-atomics-4.c       | 566 +++++++++++++++++
 .../gcc.target/riscv/inline-atomics-5.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c       |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c       |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c       |  69 +++
 libgcc/config/riscv/atomic.c                  |   2 +
 14 files changed, 1841 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 607ff6ea697..f87661bde2c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);

 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a2d2dd0bb67..0f890469d7a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7161,6 +7161,55 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }

+/* Given memory reference MEM, expand code to compute the aligned
+   memory address, shift and mask values and store them into
+   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+		       rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx addr_mask = gen_int_mode (-4, Pmode);
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, addr_mask));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+				       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+					  gen_int_mode (3, SImode)));
+
+  /* Calculate the mask.  */
+  int unshifted_mask = GET_MODE_MASK (GET_MODE (mem));
+
+  emit_move_insn (*mask, gen_int_mode (unshifted_mask, SImode));
+
+  emit_move_insn (*mask, gen_rtx_ASHIFT (SImode, *mask,
+					 gen_lowpart (QImode, *shift)));
+
+  emit_move_insn (*not_mask, gen_rtx_NOT (SImode, *mask));
+}
+
+/* Leftshift a subword within an SImode register.  */
+
+void
+riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+		      rtx *shifted_value)
+{
+  rtx value_reg = gen_reg_rtx (SImode);
+  emit_move_insn (value_reg, simplify_gen_subreg (SImode, value,
+						  mode, 0));
+
+  emit_move_insn (*shifted_value, gen_rtx_ASHIFT (SImode, value_reg,
+						 gen_lowpart (QImode, shift)));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
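
In C terms, the address arithmetic these helpers expand to is roughly
the sketch below (an illustration only: the helpers emit RTL, and
subword_setup is a hypothetical name).  riscv_lshift_subword then
shifts the subword value into the selected lane (value << shift).

  #include <stdint.h>

  /* Compute the aligned word address, bit shift, mask and inverted
     mask for a one-byte access at P, mirroring the RTL emitted by
     riscv_subword_address (for HImode the mask would be 0xffff).  */
  static void
  subword_setup (uint8_t *p, uint32_t **aligned, uint32_t *shift,
                 uint32_t *mask, uint32_t *not_mask)
  {
    *aligned = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
    *shift = ((uintptr_t) p & 3) * 8;
    *mask = (uint32_t) 0xff << *shift;
    *not_mask = ~*mask;
  }
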
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ef1bdfcfe28..63d4710cb15 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -255,6 +255,10 @@ misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.

+minline-atomics
+Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
+Always inline subword atomic operations.
+
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
 The RISC-V auto-vectorization preference:
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c932ef87b9d..83be6431cb6 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -21,8 +21,11 @@

 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
+  UNSPEC_COMPARE_AND_SWAP_SUBWORD
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_SYNC_EXCHANGE_SUBWORD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -91,6 +94,135 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])

+(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
+  [(set (match_operand:SI 0 "register_operand" "=&r")		   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))		   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(any_atomic:SI (match_dup 1)
+		     (match_operand:SI 2 "register_operand" "rI")) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]	   ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")		   ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))			   ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]			   ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "<insn>\t%5, %0, %2\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 28))])
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(match_operand:SHORT 0 "register_operand")			      ;; old value at mem
+   (not:SHORT (and:SHORT (match_operand:SHORT 1 "memory_operand")     ;; mem location
+			 (match_operand:SHORT 2 "reg_or_0_operand"))) ;; value for op
+   (match_operand:SI 3 "const_int_operand")]			      ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_nand to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_nand (old, aligned_mem,
+						   shifted_value,
+						   mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_fetch_strong_nand"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			  ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			  ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(not:SI (and:SI (match_dup 1)
+			   (match_operand:SI 2 "register_operand" "rI"))) ;; value for op
+	   (match_operand:SI 3 "register_operand" "rI")]		  ;; mask
+	 UNSPEC_SYNC_OLD_OP_SUBWORD))
+    (match_operand:SI 4 "register_operand" "rI")			  ;; not_mask
+    (clobber (match_scratch:SI 5 "=&r"))				  ;; tmp_1
+    (clobber (match_scratch:SI 6 "=&r"))]				  ;; tmp_2
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%5, %0, %2\;"
+	   "not\t%5, %5\;"
+	   "and\t%5, %5, %3\;"
+	   "and\t%6, %0, %4\;"
+	   "or\t%6, %6, %5\;"
+	   "sc.w.rl\t%5, %6, %1\;"
+	   "bnez\t%5, 1b";
+  }
+  [(set (attr "length") (const_int 32))])
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(match_operand:SHORT 0 "register_operand")			 ;; old value at mem
+   (any_atomic:SHORT (match_operand:SHORT 1 "memory_operand")	 ;; mem location
+		     (match_operand:SHORT 2 "reg_or_0_operand")) ;; value for op
+   (match_operand:SI 3 "const_int_operand")]			 ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_fetch_strong_<atomic_optab> (old, aligned_mem,
+							     shifted_value,
+							     mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
 (define_insn "atomic_exchange<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(unspec_volatile:GPR
@@ -104,6 +236,56 @@
   [(set_attr "type" "atomic")
    (set (attr "length") (const_int 8))])

+(define_expand "atomic_exchange<mode>"
+  [(match_operand:SHORT 0 "register_operand") ;; old value at mem
+   (match_operand:SHORT 1 "memory_operand")   ;; mem location
+   (match_operand:SHORT 2 "register_operand") ;; value
+   (match_operand:SI 3 "const_int_operand")]  ;; model
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx shifted_value = gen_reg_rtx (SImode);
+  riscv_lshift_subword (<MODE>mode, value, shift, &shifted_value);
+
+  emit_insn (gen_subword_atomic_exchange_strong (old, aligned_mem,
+						 shifted_value, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+  DONE;
+})
+
+(define_insn "subword_atomic_exchange_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")	 ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))	 ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "reg_or_0_operand" "rI")  ;; value
+	   (match_operand:SI 3 "reg_or_0_operand" "rI")] ;; not_mask
+      UNSPEC_SYNC_EXCHANGE_SUBWORD))
+    (clobber (match_scratch:SI 4 "=&r"))]		 ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%4, %0, %3\;"
+	   "or\t%4, %4, %2\;"
+	   "sc.w.rl\t%4, %4, %1\;"
+	   "bnez\t%4, 1b";
+  }
+  [(set (attr "length") (const_int 20))])
+
 (define_insn "atomic_cas_value_strong<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+A"))
@@ -153,6 +335,125 @@
   DONE;
 })

+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand")    ;; bool output
+   (match_operand:SHORT 1 "register_operand") ;; val output
+   (match_operand:SHORT 2 "memory_operand")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand") ;; desired value
+   (match_operand:SI 5 "const_int_operand")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
+						operands[3], operands[4],
+						operands[6], operands[7]));
+
+  rtx val = gen_reg_rtx (SImode);
+  if (operands[1] != const0_rtx)
+    emit_move_insn (val, gen_rtx_SIGN_EXTEND (SImode, operands[1]));
+  else
+    emit_move_insn (val, const0_rtx);
+
+  rtx exp = gen_reg_rtx (SImode);
+  if (operands[3] != const0_rtx)
+    emit_move_insn (exp, gen_rtx_SIGN_EXTEND (SImode, operands[3]));
+  else
+    emit_move_insn (exp, const0_rtx);
+
+  rtx compare = val;
+  if (exp != const0_rtx)
+    {
+      rtx difference = gen_rtx_MINUS (SImode, val, exp);
+      compare = gen_reg_rtx (SImode);
+      emit_move_insn (compare, difference);
+    }
+
+  if (word_mode != SImode)
+    {
+      rtx reg = gen_reg_rtx (word_mode);
+      emit_move_insn (reg, gen_rtx_SIGN_EXTEND (word_mode, compare));
+      compare = reg;
+    }
+
+  emit_move_insn (operands[0], gen_rtx_EQ (SImode, compare, const0_rtx));
+  DONE;
+})
+
+(define_expand "atomic_cas_value_strong<mode>"
+  [(match_operand:SHORT 0 "register_operand") ;; val output
+   (match_operand:SHORT 1 "memory_operand")   ;; memory
+   (match_operand:SHORT 2 "reg_or_0_operand") ;; expected value
+   (match_operand:SHORT 3 "reg_or_0_operand") ;; desired value
+   (match_operand:SI 4 "const_int_operand")   ;; mod_s
+   (match_operand:SI 5 "const_int_operand")   ;; mod_f
+   (match_scratch:SHORT 6)]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_cas_strong<mode> to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled.  */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);
+  rtx mask = gen_reg_rtx (SImode);
+  rtx not_mask = gen_reg_rtx (SImode);
+
+  riscv_subword_address (mem, &aligned_mem, &shift, &mask, &not_mask);
+
+  rtx o = operands[2];
+  rtx n = operands[3];
+  rtx shifted_o = gen_reg_rtx (SImode);
+  rtx shifted_n = gen_reg_rtx (SImode);
+
+  riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
+  riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
+
+  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
+  emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
+
+  emit_insn (gen_subword_atomic_cas_strong (old, aligned_mem,
+					    shifted_o, shifted_n,
+					    mask, not_mask));
+
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
+					 gen_lowpart (QImode, shift)));
+
+  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, old));
+
+  DONE;
+})
+
+(define_insn "subword_atomic_cas_strong"
+  [(set (match_operand:SI 0 "register_operand" "=&r")			   ;; old value at mem
+	(match_operand:SI 1 "memory_operand" "+A"))			   ;; mem location
+   (set (match_dup 1)
+	(unspec_volatile:SI [(match_operand:SI 2 "reg_or_0_operand" "rJ")  ;; expected value
+			     (match_operand:SI 3 "reg_or_0_operand" "rJ")] ;; desired value
+	 UNSPEC_COMPARE_AND_SWAP_SUBWORD))
+	(match_operand:SI 4 "register_operand" "rI")			   ;; mask
+	(match_operand:SI 5 "register_operand" "rI")			   ;; not_mask
+	(clobber (match_scratch:SI 6 "=&r"))]				   ;; tmp_1
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+  {
+    return "1:\;"
+	   "lr.w.aq\t%0, %1\;"
+	   "and\t%6, %0, %4\;"
+	   "bne\t%6, %z2, 1f\;"
+	   "and\t%6, %0, %5\;"
+	   "or\t%6, %6, %3\;"
+	   "sc.w.rl\t%6, %6, %1\;"
+	   "bnez\t%6, 1b\;"
+	   "1:";
+  }
+  [(set (attr "length") (const_int 28))])
+
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "")     ;; bool output
    (match_operand:QI 1 "memory_operand" "+A")    ;; memory
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e5ee2d536fc..2f40c58b21c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,7 +1227,8 @@ See RS/6000 and PowerPC Options.
 -mbig-endian  -mlittle-endian
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
--mcsr-check -mno-csr-check}
+-mcsr-check -mno-csr-check
+-minline-atomics  -mno-inline-atomics}

 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29024,6 +29025,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.

+@opindex minline-atomics
+@item -minline-atomics
+@itemx -mno-inline-atomics
+Do or don't use smaller but slower subword atomic emulation code that uses
+libatomic function calls.  The default is to use fast inline subword atomics
+that do not require libatomic.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
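
To make the flag usage concrete (hypothetical command lines, using a
generic riscv64 cross-compiler name):

  riscv64-unknown-linux-gnu-gcc -O2 -S t.c
      # subword atomics inlined (the default)
  riscv64-unknown-linux-gnu-gcc -O2 -S -mno-inline-atomics t.c
      # libatomic calls emitted instead
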
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..01b43908692
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+#include "inline-atomics-1.c"
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
new file mode 100644
index 00000000000..709f3734377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
@@ -0,0 +1,569 @@
+/* Check all char alignments.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-1.c.  */
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a char.  */
+
+extern void abort(void);
+
+char count, res;
+const char init = ~0;
+
+struct A
+{
+   char a;
+   char b;
+   char c;
+   char d;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (char* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
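+/* For example, with *v == 0 and count == 1, __atomic_add_fetch (v, count,
+   ...) updates *v to 1 and returns that new value rather than 0.  */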
+
+void
+test_add_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (char* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (char* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (char* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (char* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (char* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (char* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (char* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  char* V[] = {&A.a, &A.b, &A.c, &A.d};
+
+  for (int i = 0; i < 4; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
new file mode 100644
index 00000000000..eecfaae5cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
@@ -0,0 +1,566 @@
+/* Check all short alignments.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-op-2.c.  */
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics -Wno-address-of-packed-member" } */
+
+/* Test the execution of the __atomic_*OP builtin routines for a short.  */
+
+extern void abort(void);
+
+short count, res;
+const short init = ~0;
+
+struct A
+{
+   short a;
+   short b;
+} __attribute__ ((packed)) A;
+
+/* The fetch_op routines return the original value before the operation.  */
+
+void
+test_fetch_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_RELAXED) != 0)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_CONSUME) != 1)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQUIRE) != 2)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_RELEASE) != 3)
+    abort ();
+
+  if (__atomic_fetch_add (v, count, __ATOMIC_ACQ_REL) != 4)
+    abort ();
+
+  if (__atomic_fetch_add (v, 1, __ATOMIC_SEQ_CST) != 5)
+    abort ();
+}
+
+
+void
+test_fetch_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_RELAXED) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_CONSUME) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQUIRE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_RELEASE) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, count + 1, __ATOMIC_ACQ_REL) !=  res--)
+    abort ();
+
+  if (__atomic_fetch_sub (v, 1, __ATOMIC_SEQ_CST) !=  res--)
+    abort ();
+}
+
+void
+test_fetch_and (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_fetch_and (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_fetch_nand (short* v)
+{
+  *v = init;
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  if (__atomic_fetch_nand (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_fetch_xor (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_fetch_xor (v, ~count, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+void
+test_fetch_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_fetch_or (v, count, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 2, __ATOMIC_CONSUME) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQUIRE) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, 8, __ATOMIC_RELEASE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_ACQ_REL) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_fetch_or (v, count, __ATOMIC_SEQ_CST) !=  31)
+    abort ();
+}
+
+/* The OP_fetch routines return the new value after the operation.  */
+
+void
+test_add_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_RELAXED) != 1)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_CONSUME) != 2)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQUIRE) != 3)
+    abort ();
+
+  if (__atomic_add_fetch (v, 1, __ATOMIC_RELEASE) != 4)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_ACQ_REL) != 5)
+    abort ();
+
+  if (__atomic_add_fetch (v, count, __ATOMIC_SEQ_CST) != 6)
+    abort ();
+}
+
+
+void
+test_sub_fetch (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_CONSUME) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQUIRE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, 1, __ATOMIC_RELEASE) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL) !=  --res)
+    abort ();
+
+  if (__atomic_sub_fetch (v, count + 1, __ATOMIC_SEQ_CST) !=  --res)
+    abort ();
+}
+
+void
+test_and_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_RELAXED) !=  0)
+    abort ();
+
+  *v = init;
+  if (__atomic_and_fetch (v, init, __ATOMIC_CONSUME) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, init, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL) !=  0)
+    abort ();
+
+  *v = ~*v;
+  if (__atomic_and_fetch (v, 0, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_nand_fetch (short* v)
+{
+  *v = init;
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_RELEASE) !=  0)
+    abort ();
+
+  if (__atomic_nand_fetch (v, init, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST) !=  init)
+    abort ();
+}
+
+
+
+void
+test_xor_fetch (short* v)
+{
+  *v = init;
+  count = 0;
+
+  if (__atomic_xor_fetch (v, count, __ATOMIC_RELAXED) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_CONSUME) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE) !=  0)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_RELEASE) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, 0, __ATOMIC_ACQ_REL) !=  init)
+    abort ();
+
+  if (__atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST) !=  0)
+    abort ();
+}
+
+void
+test_or_fetch (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  if (__atomic_or_fetch (v, count, __ATOMIC_RELAXED) !=  1)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 2, __ATOMIC_CONSUME) !=  3)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQUIRE) !=  7)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, 8, __ATOMIC_RELEASE) !=  15)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_ACQ_REL) !=  31)
+    abort ();
+
+  count *= 2;
+  if (__atomic_or_fetch (v, count, __ATOMIC_SEQ_CST) !=  63)
+    abort ();
+}
+
+
+/* Test the OP routines with a result which isn't used. Use both variations
+   within each function.  */
+
+void
+test_add (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_add_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_CONSUME);
+  if (*v != 2)
+    abort ();
+
+  __atomic_add_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != 3)
+    abort ();
+
+  __atomic_fetch_add (v, 1, __ATOMIC_RELEASE);
+  if (*v != 4)
+    abort ();
+
+  __atomic_add_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 5)
+    abort ();
+
+  __atomic_fetch_add (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 6)
+    abort ();
+}
+
+
+void
+test_sub (short* v)
+{
+  *v = res = 20;
+  count = 0;
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_RELAXED);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_CONSUME);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, 1, __ATOMIC_ACQUIRE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, 1, __ATOMIC_RELEASE);
+  if (*v != --res)
+    abort ();
+
+  __atomic_sub_fetch (v, count + 1, __ATOMIC_ACQ_REL);
+  if (*v != --res)
+    abort ();
+
+  __atomic_fetch_sub (v, count + 1, __ATOMIC_SEQ_CST);
+  if (*v != --res)
+    abort ();
+}
+
+void
+test_and (short* v)
+{
+  *v = init;
+
+  __atomic_and_fetch (v, 0, __ATOMIC_RELAXED);
+  if (*v != 0)
+    abort ();
+
+  *v = init;
+  __atomic_fetch_and (v, init, __ATOMIC_CONSUME);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, init, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_and_fetch (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != 0)
+    abort ();
+
+  *v = ~*v;
+  __atomic_fetch_and (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_nand (short* v)
+{
+  *v = init;
+
+  __atomic_fetch_nand (v, 0, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, init, __ATOMIC_RELEASE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_nand (v, init, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_nand_fetch (v, 0, __ATOMIC_SEQ_CST);
+  if (*v != init)
+    abort ();
+}
+
+
+
+void
+test_xor (short* v)
+{
+  *v = init;
+  count = 0;
+
+  __atomic_xor_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_CONSUME);
+  if (*v != 0)
+    abort ();
+
+  __atomic_xor_fetch (v, 0, __ATOMIC_ACQUIRE);
+  if (*v != 0)
+    abort ();
+
+  __atomic_fetch_xor (v, ~count, __ATOMIC_RELEASE);
+  if (*v != init)
+    abort ();
+
+  __atomic_fetch_xor (v, 0, __ATOMIC_ACQ_REL);
+  if (*v != init)
+    abort ();
+
+  __atomic_xor_fetch (v, ~count, __ATOMIC_SEQ_CST);
+  if (*v != 0)
+    abort ();
+}
+
+void
+test_or (short* v)
+{
+  *v = 0;
+  count = 1;
+
+  __atomic_or_fetch (v, count, __ATOMIC_RELAXED);
+  if (*v != 1)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_CONSUME);
+  if (*v != 3)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, 4, __ATOMIC_ACQUIRE);
+  if (*v != 7)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, 8, __ATOMIC_RELEASE);
+  if (*v != 15)
+    abort ();
+
+  count *= 2;
+  __atomic_or_fetch (v, count, __ATOMIC_ACQ_REL);
+  if (*v != 31)
+    abort ();
+
+  count *= 2;
+  __atomic_fetch_or (v, count, __ATOMIC_SEQ_CST);
+  if (*v != 63)
+    abort ();
+}
+
+int
+main ()
+{
+  short* V[] = {&A.a, &A.b};
+
+  for (int i = 0; i < 2; i++) {
+    test_fetch_add (V[i]);
+    test_fetch_sub (V[i]);
+    test_fetch_and (V[i]);
+    test_fetch_nand (V[i]);
+    test_fetch_xor (V[i]);
+    test_fetch_or (V[i]);
+
+    test_add_fetch (V[i]);
+    test_sub_fetch (V[i]);
+    test_and_fetch (V[i]);
+    test_nand_fetch (V[i]);
+    test_xor_fetch (V[i]);
+    test_or_fetch (V[i]);
+
+    test_add (V[i]);
+    test_sub (V[i]);
+    test_and (V[i]);
+    test_nand (V[i]);
+    test_xor (V[i]);
+    test_or (V[i]);
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
new file mode 100644
index 00000000000..52093894a79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a char.  */
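+/* On success the builtin stores the desired value and returns true; on
+   failure it copies the current value of v into `expected' and returns
+   false.  The checks on `expected' below rely on that update.  */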
+
+extern void abort(void);
+
+char v = 0;
+char expected = 0;
+char max = ~0;
+char desired = ~0;
+char zero = 0;
+
+#define STRONG 0
+#define WEAK 1
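+/* These select the `weak' argument of the builtins.  A weak
+   compare-exchange may fail spuriously, so WEAK is only used on calls
+   that are guaranteed to fail on the value comparison anyway.  */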
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
new file mode 100644
index 00000000000..8fee8c44811
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
@@ -0,0 +1,87 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-compare-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_compare_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v = 0;
+short expected = 0;
+short max = ~0;
+short desired = ~0;
+short zero = 0;
+
+#define STRONG 0
+#define WEAK 1
+
+int
+main ()
+{
+
+  if (!__atomic_compare_exchange_n (&v, &expected, max, STRONG , __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange_n (&v, &expected, desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange_n (&v, &expected, desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  /* Now test the generic version.  */
+
+  v = 0;
+
+  if (!__atomic_compare_exchange (&v, &expected, &max, STRONG, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    abort ();
+  if (expected != max)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &zero, STRONG , __ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != max)
+    abort ();
+  if (v != 0)
+    abort ();
+
+  if (__atomic_compare_exchange (&v, &expected, &desired, WEAK, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
+    abort ();
+  if (expected != 0)
+    abort ();
+
+  if (!__atomic_compare_exchange (&v, &expected, &desired, STRONG , __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+    abort ();
+  if (expected != 0)
+    abort ();
+  if (v != max)
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
new file mode 100644
index 00000000000..24c344c0ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 1 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-1.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a char.  */
+
+extern void abort(void);
+
+char v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
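+  /* Unlike __atomic_exchange_n, the generic __atomic_exchange returns
+     the previous value through its third argument (ret) instead of as
+     the function result.  */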
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
new file mode 100644
index 00000000000..edc212df04e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
@@ -0,0 +1,69 @@
+/* Test __atomic routines for existence and proper execution on 2 byte
+   values with each valid memory model.  */
+/* Duplicates the logic of libatomic/testsuite/libatomic.c/atomic-exchange-2.c.  */
+/* { dg-do run } */
+/* { dg-options "-minline-atomics" } */
+
+/* Test the execution of the __atomic_exchange_n builtin for a short.  */
+
+extern void abort(void);
+
+short v, count, ret;
+
+int
+main ()
+{
+  v = 0;
+  count = 0;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQUIRE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_RELEASE) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_ACQ_REL) != count)
+    abort ();
+  count++;
+
+  if (__atomic_exchange_n (&v, count + 1, __ATOMIC_SEQ_CST) != count)
+    abort ();
+  count++;
+
+  /* Now test the generic version.  */
+
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELAXED);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQUIRE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_RELEASE);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_ACQ_REL);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  __atomic_exchange (&v, &count, &ret, __ATOMIC_SEQ_CST);
+  if (ret != count - 1 || v != count)
+    abort ();
+  count++;
+
+  return 0;
+}
diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 69f53623509..573d163ea04 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -30,6 +30,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define INVERT		"not %[tmp1], %[tmp1]\n\t"
 #define DONT_INVERT	""

+/* This logic is duplicated in gcc/config/riscv/sync.md for use when inlining is enabled.  */
+
 #define GENERATE_FETCH_AND_OP(type, size, opname, insn, invert, cop)	\
   type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)	\
   {									\
--
2.34.1
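
For reference, here is a C-level sketch of the word-masking approach that
atomic.c and the new sync.md patterns share.  This is an illustrative
reconstruction, not code from the patch: it assumes little-endian byte
order, uses the 32-bit __atomic builtins in place of the lr.w/sc.w loop,
and the helper name is made up.

#include <stdint.h>

/* Emulate a 1-byte fetch_and_add by operating on the naturally
   aligned 32-bit word that contains the byte.  */
unsigned char
subword_fetch_add_sketch (unsigned char *p, unsigned char v)
{
  unsigned int *wp = (unsigned int *) ((uintptr_t) p & ~(uintptr_t) 3);
  unsigned int shift = ((uintptr_t) p & 3) * 8;
  unsigned int mask = 0xffu << shift;
  unsigned int old = __atomic_load_n (wp, __ATOMIC_RELAXED);
  unsigned int repl;

  do
    {
      /* Recompute only the target byte; leave the others intact.  */
      repl = (old & ~mask) | ((((old >> shift) + v) << shift) & mask);
    }
  while (!__atomic_compare_exchange_n (wp, &old, repl, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));

  return (unsigned char) (old >> shift);
}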



* Re: [committed] RISCV: Inline subword atomic ops
  2023-04-26 17:01             ` [committed] " Patrick O'Neill
@ 2023-05-02 20:34               ` Patrick O'Neill
  2023-05-03  6:32                 ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Patrick O'Neill @ 2023-05-02 20:34 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches; +Cc: palmer, kito.cheng, david.abd, schwab

Is this OK for a backport to GCC-13 as well?

(with the whitespace fixes/changelog revision squashed into it)

Patrick

On 4/26/23 10:01, Patrick O'Neill wrote:
> Committed - I had to reformat the changelog so it would push, and resolve
> a trivial merge conflict in riscv.opt.
>
> ---
>
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
>
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
>
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
>
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
>
> gcc/ChangeLog:
> 	PR target/104338
> 	* config/riscv/riscv-protos.h: Add helper function stubs.
> 	* config/riscv/riscv.cc: Add helper functions for subword masking.
> 	* config/riscv/riscv.opt: Add command-line flag.
> 	* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* doc/invoke.texi: Add blurb regarding command-line flag.
>
> libgcc/ChangeLog:
> 	PR target/104338
> 	* config/riscv/atomic.c: Add reference to duplicate logic.
>
> gcc/testsuite/ChangeLog:
> 	PR target/104338
> 	* gcc.target/riscv/inline-atomics-1.c: New test.
> 	* gcc.target/riscv/inline-atomics-2.c: New test.
> 	* gcc.target/riscv/inline-atomics-3.c: New test.
> 	* gcc.target/riscv/inline-atomics-4.c: New test.
> 	* gcc.target/riscv/inline-atomics-5.c: New test.
> 	* gcc.target/riscv/inline-atomics-6.c: New test.
> 	* gcc.target/riscv/inline-atomics-7.c: New test.
> 	* gcc.target/riscv/inline-atomics-8.c: New test.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-02 20:34               ` Patrick O'Neill
@ 2023-05-03  6:32                 ` Jeff Law
  2023-05-03  9:49                   ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-05-03  6:32 UTC (permalink / raw)
  To: Patrick O'Neill, gcc-patches; +Cc: palmer, kito.cheng, david.abd, schwab



On 5/2/23 14:34, Patrick O'Neill wrote:
> Is this OK for a backport to GCC-13 as well?
Let me sync with Richi & Jakub.  They're the release managers and this 
doesn't fall under the usual rules for things that can be backported.

jeff


* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-03  6:32                 ` Jeff Law
@ 2023-05-03  9:49                   ` Richard Biener
  2023-05-03 14:14                     ` Palmer Dabbelt
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2023-05-03  9:49 UTC (permalink / raw)
  To: Jeff Law
  Cc: Patrick O'Neill, gcc-patches, palmer, kito.cheng, david.abd, schwab

On Wed, May 3, 2023 at 8:33 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/2/23 14:34, Patrick O'Neill wrote:
> > Is this OK for a backport to GCC-13 as well?
> Let me sync with Richi & Jakub.  They're the release managers and this
> doesn't fall under the usual rules for things that can be backported.

I would guess that most distros have these patches backported (SUSE has
that to both 12 and 13), so it wouldn't make much of a difference.  Since
this is backend-specific and RISCV is neither primary nor secondary,
it's up to the target maintainer's discretion to bend the rules.

Richard.

> jeff


* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-03  9:49                   ` Richard Biener
@ 2023-05-03 14:14                     ` Palmer Dabbelt
  2023-05-03 15:13                       ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Palmer Dabbelt @ 2023-05-03 14:14 UTC (permalink / raw)
  To: richard.guenther
  Cc: jeffreyalaw, Patrick O'Neill, gcc-patches, Kito Cheng,
	david.abd, schwab

On Wed, 03 May 2023 02:49:41 PDT (-0700), richard.guenther@gmail.com wrote:
> On Wed, May 3, 2023 at 8:33 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 5/2/23 14:34, Patrick O'Neill wrote:
>> > Is this OK for a backport to GCC-13 as well?
>> Let me sync with Richi & Jakub.  They're the release managers and this
>> doesn't fall under the usual rules for things that can be backported.
>
> I would guess that most distros have these patches backported (SUSE has
> that to both 12 and 13), so it wouldn't make much of a difference.  Since

That'd be my argument, too.  The distros that don't have this probably 
have something scarier, like an implicit default to -latomic.

> this is backend-specific and RISCV is neither primary nor secondary,
> it's up to the target maintainer's discretion to bend the rules.

Fair, though we're trying to at least pretend we're playing by the 
rules... ;)

> Richard.
>
>> jeff
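
A minimal illustration of the -latomic fallback mentioned above; the
file is hypothetical and the command lines are paraphrased, not
captured output:

/* flag.c: a subword atomic op with no single RISC-V instruction.  */
char flag;

char
take (void)
{
  return __atomic_fetch_or (&flag, 1, __ATOMIC_SEQ_CST);
}

Without inlining, the builtin lowers to a call to __atomic_fetch_or_1,
so the final link fails with an undefined reference unless -latomic is
passed (or the distro's driver adds it implicitly).  With
-minline-atomics the compiler instead emits a masking lr.w/sc.w loop on
the containing aligned word and the libatomic dependency goes away.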


* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-03 14:14                     ` Palmer Dabbelt
@ 2023-05-03 15:13                       ` Jeff Law
  2023-05-03 15:33                         ` Palmer Dabbelt
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-05-03 15:13 UTC (permalink / raw)
  To: Palmer Dabbelt, richard.guenther
  Cc: Patrick O'Neill, gcc-patches, Kito Cheng, david.abd, schwab



On 5/3/23 08:14, Palmer Dabbelt wrote:
> On Wed, 03 May 2023 02:49:41 PDT (-0700), richard.guenther@gmail.com wrote:
>> On Wed, May 3, 2023 at 8:33 AM Jeff Law via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>>
>>> On 5/2/23 14:34, Patrick O'Neill wrote:
>>> > Is this OK for a backport to GCC-13 as well?
>>> Let me sync with Richi & Jakub.  They're the release managers and this
>>> doesn't fall under the usual rules for things that can be backported.
>>
>> I would guess that most distros have these patches backported (SUSE has
>> that to both 12 and 13), so it wouldn't make much of a difference.  Since
> 
> That'd be my argument, too.  The distros that don't have this probably 
> have something scarier, like an implicit default to -latomic.
> 
>> this is backend-specific and RISCV is neither primary nor secondary,
>> it's up to the target maintainer's discretion to bend the rules.
> 
> Fair, though we're trying to at least pretend we're playing by the 
> rules... ;)
So the net is let's backport this patch series to gcc-13.

jeff



* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-03 15:13                       ` Jeff Law
@ 2023-05-03 15:33                         ` Palmer Dabbelt
  2023-05-03 16:13                           ` Patrick O'Neill
  0 siblings, 1 reply; 24+ messages in thread
From: Palmer Dabbelt @ 2023-05-03 15:33 UTC (permalink / raw)
  To: jeffreyalaw
  Cc: richard.guenther, Patrick O'Neill, gcc-patches, Kito Cheng,
	david.abd, schwab

On Wed, 03 May 2023 08:13:23 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 5/3/23 08:14, Palmer Dabbelt wrote:
>> On Wed, 03 May 2023 02:49:41 PDT (-0700), richard.guenther@gmail.com wrote:
>>> On Wed, May 3, 2023 at 8:33 AM Jeff Law via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>>
>>>>
>>>> On 5/2/23 14:34, Patrick O'Neill wrote:
>>>> > Is this OK for a backport to GCC-13 as well?
>>>> Let me sync with Richi & Jakub.  They're the release managers and this
>>>> doesn't fall under the usual rules for things that can be backported.
>>>
>>> I would guess that most distros have these patches backported (SUSE has
>>> that to both 12 and 13), so it wouldn't make much of a difference.  Since
>>
>> That'd be my argument, too.  The distros that don't have this probably
>> have something scarier, like an implicit default to -latomic.
>>
>>> this is backend-specific and RISCV is neither primary nor secondary,
>>> it's up to the target maintainer's discretion to bend the rules.
>>
>> Fair, though we're trying to at least pretend we're playing by the
>> rules... ;)
> So the net is let's backport this patch series to gcc-13.

Sounds good.

Patrick: do you mind sending up a backport?  There was a build fix that 
just landed as well.


* Re: [committed] RISCV: Inline subword atomic ops
  2023-05-03 15:33                         ` Palmer Dabbelt
@ 2023-05-03 16:13                           ` Patrick O'Neill
  0 siblings, 0 replies; 24+ messages in thread
From: Patrick O'Neill @ 2023-05-03 16:13 UTC (permalink / raw)
  To: Palmer Dabbelt, jeffreyalaw
  Cc: richard.guenther, gcc-patches, Kito Cheng, david.abd, schwab


On 5/3/23 08:33, Palmer Dabbelt wrote:
> On Wed, 03 May 2023 08:13:23 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>>
>> On 5/3/23 08:14, Palmer Dabbelt wrote:
>>> On Wed, 03 May 2023 02:49:41 PDT (-0700), richard.guenther@gmail.com 
>>> wrote:
>>>> On Wed, May 3, 2023 at 8:33 AM Jeff Law via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/2/23 14:34, Patrick O'Neill wrote:
>>>>> > Is this OK for a backport to GCC-13 as well?
>>>>> Let me sync with Richi & Jakub.  They're the release managers and 
>>>>> this
>>>>> doesn't fall under the usual rules for things that can be backported.
>>>>
>>>> I would guess that most distros have these patches backported (SUSE 
>>>> has
>>>> that to both 12 and 13), so it wouldn't make much of a difference.  
>>>> Since
>>>
>>> That'd be my argument, too.  The distros that don't have this probably
>>> have something scarier, like an implicit default to -latomic.
>>>
>>>> this is backend-specific and RISCV is neither primary nor secondary,
>>>> it's up to the target maintainer's discretion to bend the rules.
>>>
>>> Fair, though we're trying to at least pretend we're playing by the
>>> rules... ;)
>> So the net is let's backport this patch series to gcc-13.
>
> Sounds good.
>
> Patrick: do you mind sending up a backport?  There was a build fix 
> that just landed as well.

Will do - thanks.

Patrick

