public inbox for gcc-patches@gcc.gnu.org
* __sync_swap* with acq/rel/full memory barrier semantics
@ 2011-05-24  8:27 Aldy Hernandez
  2011-05-24  9:25 ` Joseph S. Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Aldy Hernandez @ 2011-05-24  8:27 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches; +Cc: Benjamin De Kosnik, Andrew MacLeod

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

This is a patch implementing builtins for an atomic exchange with full, 
acquire, and release memory barrier semantics.  It is similar to 
__sync_lock_test_and_set(), but the target does not have the option of 
implementing a reduced functionality of only implementing a store of 1.  
Also, unlike __sync_lock_test_and_set(), we have all three memory 
barrier variants.

The compiler will fall back to a full barrier if the user requests an 
acquire/release and it is not available in the target.  Also, if no 
variant is available, we will fall back to a compare and swap loop with 
a full barrier at the end.
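
A minimal sketch of that last fallback, written as ordinary C rather than
as RTL expansion.  The helper name swap_full_via_cas is hypothetical and
is not part of the patch; it assumes only the existing
__sync_val_compare_and_swap and __sync_synchronize builtins.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: emulate an atomic exchange
   with a compare-and-swap loop, then issue a full barrier.  */
static inline uint32_t
swap_full_via_cas (volatile uint32_t *ptr, uint32_t value)
{
  uint32_t old;

  do
    {
      /* Snapshot the current contents.  The compare-and-swap below only
         succeeds if nobody changed them in the meantime.  */
      old = *ptr;
    }
  while (__sync_val_compare_and_swap (ptr, old, value) != old);

  /* Full barrier after the loop, mirroring the fallback described above.  */
  __sync_synchronize ();
  return old;
}
```

The return value is whatever was in *ptr at the moment the exchange
succeeded, which is the same contract the builtins document.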

The real reason for this patch is to implement atomic stores in the C++ 
runtime library, which can currently incorrectly move prior stores past 
an atomic store, thus invalidating the happens-before promise for the 
sequentially consistent model.  I am attaching the corresponding patch 
to libstdc++ to show how I intend to use the builtin.  This is not an 
official submission for the C++ library bits, as I have not yet fully 
tested the library.  I will do so separately.

In a followup patch I will be implementing acq/rel/full variants for all 
the __sync_* builtins which we can use for the atomic loads and for some 
of the OpenMP atomics Jakub has been working on.

Oh yeah, I would gladly accept patterns/patches for other architectures :).

Tested on x86-64 Linux.

OK for mainline?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 15140 bytes --]

	* c-family/c-common.c (resolve_overloaded_builtin): Add
	BUILT_IN_SWAP_*_N variants.
	* doc/extend.texi: Document __sync_swap_* variants.
	* libgcc-std.ver: Add __sync_swap_*.
	* optabs.h: Add DOI_sync_swap*.
	Define sync_swap*_optab.
	* optabs.c (expand_sync_swap): New.
	* genopinit.c: Add sync_swap_{acq,rel,full}.
	* config/i386/sync.md ("sync_swap_full<mode>"): New.
	* config/i386/i386.md: Add UNSPECV_SWAP_FULL.
	* expr.h (enum membar_mode): New.
	* builtins.c (expand_builtin_swap): New.
	(expand_builtin): Add cases for BUILT_IN_SWAP_*.
	* sync-builtins.def (BUILT_IN_SWAP_*): New.
	* expr.h (expand_sync_swap): Protoize.
	(expand_builtin_synchronize): Same.

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 173831)
+++ doc/extend.texi	(working copy)
@@ -6719,6 +6719,22 @@ speculated to) before the builtin, but p
 be globally visible yet, and previous memory loads may not yet be
 satisfied.
 
+@item @var{type} __sync_swap_full (@var{type} *ptr, @var{type} value, ...)
+@itemx @var{type} __sync_swap_acq (@var{type} *ptr, @var{type} value, ...)
+@itemx @var{type} __sync_swap_rel (@var{type} *ptr, @var{type} value, ...)
+@findex __sync_swap_full
+@findex __sync_swap_acq
+@findex __sync_swap_rel
+These builtins implement an atomic exchange operation.  They write
+@var{value} into @code{*@var{ptr}}, and return the previous contents
+of @code{*@var{ptr}}.  The different variants provide a full barrier,
+acquire barrier, or a release barrier respectively depending on the
+suffix.
+
+If the acquire or release variants of these operations are not
+available on the given target, the compiler will fall back to a full
+barrier.
+
 @item void __sync_lock_release (@var{type} *ptr, ...)
 @findex __sync_lock_release
 This builtin releases the lock acquired by @code{__sync_lock_test_and_set}.
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c	(revision 173831)
+++ c-family/c-common.c	(working copy)
@@ -9035,6 +9035,9 @@ resolve_overloaded_builtin (location_t l
     case BUILT_IN_VAL_COMPARE_AND_SWAP_N:
     case BUILT_IN_LOCK_TEST_AND_SET_N:
     case BUILT_IN_LOCK_RELEASE_N:
+    case BUILT_IN_SWAP_FULL_N:
+    case BUILT_IN_SWAP_ACQ_N:
+    case BUILT_IN_SWAP_REL_N:
       {
 	int n = sync_resolve_size (function, params);
 	tree new_function, first_param, result;
Index: optabs.c
===================================================================
--- optabs.c	(revision 173831)
+++ optabs.c	(working copy)
@@ -6988,6 +6988,70 @@ expand_sync_lock_test_and_set (rtx mem, 
 
   return NULL_RTX;
 }
+
+/* This function expands an atomic exchange operation: atomically store
+   VAL in MEM and return the previous value in MEM.
+
+   TARGET is an optional place to stick the return value.
+   MBMODE is the memory barrier type to use for the operation.  */
+
+rtx
+expand_sync_swap (rtx mem, rtx val, rtx target, enum membar_mode mbmode)
+{
+  enum machine_mode mode = GET_MODE (mem);
+  enum insn_code icode;
+  direct_optab op;
+
+  switch (mbmode)
+    {
+    case MEMBAR_MODE_ACQUIRE:
+      op = sync_swap_acq_optab;
+      break;
+    case MEMBAR_MODE_RELEASE:
+      op = sync_swap_rel_optab;
+      break;
+    case MEMBAR_MODE_FULL:
+      op = sync_swap_full_optab;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  /* If no variant is found, try the full barrier.  */
+  if (direct_optab_handler (op, mode) == CODE_FOR_nothing)
+    op = sync_swap_full_optab;
+
+  /* If the target supports the swap directly, great.  */
+  icode = direct_optab_handler (op, mode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[3];
+
+      create_output_operand (&ops[0], target, mode);
+      create_fixed_operand (&ops[1], mem);
+      /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
+      create_convert_operand_to (&ops[2], val, mode, true);
+      if (maybe_expand_insn (icode, 3, ops))
+	return ops[0].value;
+    }
+
+  /* Otherwise, use a compare-and-swap loop for the exchange.  */
+  if (direct_optab_handler (sync_compare_and_swap_optab, mode)
+      != CODE_FOR_nothing)
+    {
+      if (!target || !register_operand (target, mode))
+	target = gen_reg_rtx (mode);
+      if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
+	val = convert_modes (mode, GET_MODE (val), val, 1);
+      if (expand_compare_and_swap_loop (mem, target, val, NULL_RTX))
+	{
+	  /* Issue a full barrier.  */
+	  expand_builtin_synchronize ();
+	  return target;
+	}
+    }
+
+  return NULL_RTX;
+}
 \f
 /* Return true if OPERAND is suitable for operand number OPNO of
    instruction ICODE.  */
Index: optabs.h
===================================================================
--- optabs.h	(revision 173831)
+++ optabs.h	(working copy)
@@ -669,9 +669,19 @@ enum direct_optab_index
   /* Atomic compare and swap.  */
   DOI_sync_compare_and_swap,
 
-  /* Atomic exchange with acquire semantics.  */
+  /* Atomic exchange with acquire semantics.  Exchange not fully
+     guaranteed.  Some targets may only support a store of 1.  */
   DOI_sync_lock_test_and_set,
 
+  /* Atomic exchange with acquire semantics.  */
+  DOI_sync_swap_acq,
+
+  /* Atomic exchange with release semantics.  */
+  DOI_sync_swap_rel,
+
+  /* Atomic exchange with full barrier semantics.  */
+  DOI_sync_swap_full,
+
   /* Atomic clear with release semantics.  */
   DOI_sync_lock_release,
 
@@ -720,6 +730,12 @@ typedef struct direct_optab_d *direct_op
   (&direct_optab_table[(int) DOI_sync_compare_and_swap])
 #define sync_lock_test_and_set_optab \
   (&direct_optab_table[(int) DOI_sync_lock_test_and_set])
+#define sync_swap_acq_optab \
+  (&direct_optab_table[(int) DOI_sync_swap_acq])
+#define sync_swap_rel_optab \
+  (&direct_optab_table[(int) DOI_sync_swap_rel])
+#define sync_swap_full_optab \
+  (&direct_optab_table[(int) DOI_sync_swap_full])
 #define sync_lock_release_optab \
   (&direct_optab_table[(int) DOI_sync_lock_release])
 \f
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 173831)
+++ genopinit.c	(working copy)
@@ -239,6 +239,9 @@ static const char * const optabs[] =
   "set_direct_optab_handler (sync_new_nand_optab, $A, CODE_FOR_$(sync_new_nand$I$a$))",
   "set_direct_optab_handler (sync_compare_and_swap_optab, $A, CODE_FOR_$(sync_compare_and_swap$I$a$))",
   "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))",
+  "set_direct_optab_handler (sync_swap_acq_optab, $A, CODE_FOR_$(sync_swap_acq$I$a$))",
+  "set_direct_optab_handler (sync_swap_rel_optab, $A, CODE_FOR_$(sync_swap_rel$I$a$))",
+  "set_direct_optab_handler (sync_swap_full_optab, $A, CODE_FOR_$(sync_swap_full$I$a$))",
   "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))",
   "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))",
   "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
Index: builtins.c
===================================================================
--- builtins.c	(revision 173831)
+++ builtins.c	(working copy)
@@ -5682,9 +5682,35 @@ expand_builtin_lock_test_and_set (enum m
   return expand_sync_lock_test_and_set (mem, val, target);
 }
 
+/* Expand the __sync_swap_* intrinsics.
+
+   EXP is the CALL_EXPR.
+   TARGET is an optional place for us to store the results.
+   MBMODE is the memory barrier mode to use.  */
+
+static rtx
+expand_builtin_swap (enum machine_mode mode, tree exp, rtx target,
+		     enum membar_mode mbmode)
+{
+  rtx val, mem;
+  enum machine_mode old_mode;
+
+  /* Expand the operands.  */
+  mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
+  val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL);
+  /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+     of CONST_INTs, where we know the old_mode only from the call argument.  */
+  old_mode = GET_MODE (val);
+  if (old_mode == VOIDmode)
+    old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1)));
+  val = convert_modes (mode, old_mode, val, 1);
+
+  return expand_sync_swap (mem, val, target, mbmode);
+}
+
 /* Expand the __sync_synchronize intrinsic.  */
 
-static void
+void
 expand_builtin_synchronize (void)
 {
   gimple x;
@@ -6495,6 +6521,39 @@ expand_builtin (tree exp, rtx target, rt
 	return target;
       break;
 
+    case BUILT_IN_SWAP_ACQ_1:
+    case BUILT_IN_SWAP_ACQ_2:
+    case BUILT_IN_SWAP_ACQ_4:
+    case BUILT_IN_SWAP_ACQ_8:
+    case BUILT_IN_SWAP_ACQ_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_SWAP_ACQ_1);
+      target = expand_builtin_swap (mode, exp, target, MEMBAR_MODE_ACQUIRE);
+      if (target)
+	return target;
+      break;
+
+    case BUILT_IN_SWAP_REL_1:
+    case BUILT_IN_SWAP_REL_2:
+    case BUILT_IN_SWAP_REL_4:
+    case BUILT_IN_SWAP_REL_8:
+    case BUILT_IN_SWAP_REL_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_SWAP_REL_1);
+      target = expand_builtin_swap (mode, exp, target, MEMBAR_MODE_RELEASE);
+      if (target)
+	return target;
+      break;
+
+    case BUILT_IN_SWAP_FULL_1:
+    case BUILT_IN_SWAP_FULL_2:
+    case BUILT_IN_SWAP_FULL_4:
+    case BUILT_IN_SWAP_FULL_8:
+    case BUILT_IN_SWAP_FULL_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_SWAP_FULL_1);
+      target = expand_builtin_swap (mode, exp, target, MEMBAR_MODE_FULL);
+      if (target)
+	return target;
+      break;
+
     case BUILT_IN_LOCK_TEST_AND_SET_1:
     case BUILT_IN_LOCK_TEST_AND_SET_2:
     case BUILT_IN_LOCK_TEST_AND_SET_4:
Index: sync-builtins.def
===================================================================
--- sync-builtins.def	(revision 173831)
+++ sync-builtins.def	(working copy)
@@ -235,6 +235,63 @@ DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND
 DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_16, "__sync_lock_test_and_set_16",
 		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
 
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_N,
+		  "__sync_swap_acq",
+		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_1,
+		  "__sync_swap_acq_1",
+		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_2,
+		  "__sync_swap_acq_2",
+		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_4,
+		  "__sync_swap_acq_4",
+		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_8,
+		  "__sync_swap_acq_8",
+		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_ACQ_16,
+		  "__sync_swap_acq_16",
+		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_N,
+		  "__sync_swap_rel",
+		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_1,
+		  "__sync_swap_rel_1",
+		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_2,
+		  "__sync_swap_rel_2",
+		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_4,
+		  "__sync_swap_rel_4",
+		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_8,
+		  "__sync_swap_rel_8",
+		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_REL_16,
+		  "__sync_swap_rel_16",
+		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_N,
+		  "__sync_swap_full",
+		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_1,
+		  "__sync_swap_full_1",
+		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_2,
+		  "__sync_swap_full_2",
+		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_4,
+		  "__sync_swap_full_4",
+		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_8,
+		  "__sync_swap_full_8",
+		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_SWAP_FULL_16,
+		  "__sync_swap_full_16",
+		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
+
 DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_N, "__sync_lock_release",
 		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_1, "__sync_lock_release_1",
Index: expr.h
===================================================================
--- expr.h	(revision 173831)
+++ expr.h	(working copy)
@@ -161,6 +161,14 @@ enum optab_methods
   OPTAB_MUST_WIDEN
 };
 
+/* Memory barrier type.  */
+enum membar_mode
+{
+  MEMBAR_MODE_RELEASE,
+  MEMBAR_MODE_ACQUIRE,
+  MEMBAR_MODE_FULL
+};
+
 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
    pair and mapped to a single optab."  */
@@ -217,6 +225,7 @@ rtx expand_bool_compare_and_swap (rtx, r
 rtx expand_sync_operation (rtx, rtx, enum rtx_code);
 rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
 rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
+rtx expand_sync_swap (rtx, rtx, rtx, enum membar_mode);
 \f
 /* Functions from expmed.c:  */
 
@@ -248,6 +257,7 @@ extern void expand_builtin_setjmp_receiv
 extern rtx expand_builtin_saveregs (void);
 extern void expand_builtin_trap (void);
 extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
+extern void expand_builtin_synchronize (void);
 \f
 /* Functions from expr.c:  */
 
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 173831)
+++ config/i386/i386.md	(working copy)
@@ -250,6 +250,7 @@
   UNSPECV_MWAIT
   UNSPECV_CMPXCHG
   UNSPECV_XCHG
+  UNSPECV_SWAP_FULL
   UNSPECV_LOCK
   UNSPECV_PROLOGUE_USE
   UNSPECV_CLD
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 173831)
+++ config/i386/sync.md	(working copy)
@@ -232,6 +232,15 @@
   return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
 })
 
+(define_insn "sync_swap_full<mode>"
+  [(set (match_operand:SWI 0 "register_operand" "=<r>")
+	(unspec_volatile:SWI
+	  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_SWAP_FULL))
+   (set (match_dup 1)
+	(match_operand:SWI 2 "register_operand" "0"))]
+  ""
+  "xchg{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")
Index: libgcc-std.ver
===================================================================
--- libgcc-std.ver	(revision 173831)
+++ libgcc-std.ver	(working copy)
@@ -1919,3 +1919,26 @@ GCC_4.6.0 {
   __morestack_initial_sp
   __splitstack_find
 }
+
+%inherit GCC_4.7.0 GCC_4.6.0
+GCC_4.7.0 {
+  __sync_swap_acq_1
+  __sync_swap_rel_1
+  __sync_swap_full_1
+
+  __sync_swap_acq_2
+  __sync_swap_rel_2
+  __sync_swap_full_2
+
+  __sync_swap_acq_4
+  __sync_swap_rel_4
+  __sync_swap_full_4
+
+  __sync_swap_acq_8
+  __sync_swap_rel_8
+  __sync_swap_full_8
+
+  __sync_swap_acq_16
+  __sync_swap_rel_16
+  __sync_swap_full_16
+}

[-- Attachment #3: stl-atomic-store --]
[-- Type: text/plain, Size: 2040 bytes --]

	(__sso_string_base<>::_M_assign): Likewise.
	* include/bits/atomic_2.h (_ITp<>::store): Use __sync_swap_full.
	(_ITp<>::store volatile): Same.
	(_PTp<>::store): Same.
	(_PTp<>::store volatile): Same.

Index: include/bits/atomic_2.h
===================================================================
--- include/bits/atomic_2.h	(revision 173831)
+++ include/bits/atomic_2.h	(working copy)
@@ -249,14 +249,12 @@ namespace __atomic2
 	__glibcxx_assert(__m != memory_order_acq_rel);
 	__glibcxx_assert(__m != memory_order_consume);
 
-	if (__m == memory_order_relaxed)
-	  _M_i = __i;
+	if (__m == memory_order_seq_cst)
+	  (void)__sync_swap_full (&_M_i, __i);
 	else
 	  {
 	    // write_mem_barrier();
 	    _M_i = __i;
-	    if (__m == memory_order_seq_cst)
-	      __sync_synchronize();
 	  }
       }
 
@@ -267,14 +265,12 @@ namespace __atomic2
 	__glibcxx_assert(__m != memory_order_acq_rel);
 	__glibcxx_assert(__m != memory_order_consume);
 
-	if (__m == memory_order_relaxed)
-	  _M_i = __i;
+	if (__m == memory_order_seq_cst)
+	  (void)__sync_swap_full (&_M_i, __i);
 	else
 	  {
 	    // write_mem_barrier();
 	    _M_i = __i;
-	    if (__m == memory_order_seq_cst)
-	      __sync_synchronize();
 	  }
       }
 
@@ -540,14 +536,12 @@ namespace __atomic2
 	__glibcxx_assert(__m != memory_order_acq_rel);
 	__glibcxx_assert(__m != memory_order_consume);
 
-	if (__m == memory_order_relaxed)
-	  _M_p = __p;
+	if (__m == memory_order_seq_cst)
+	  __sync_swap_full (&_M_p, __p);
 	else
 	  {
 	    // write_mem_barrier();
 	    _M_p = __p;
-	    if (__m == memory_order_seq_cst)
-	      __sync_synchronize();
 	  }
       }
 
@@ -559,14 +553,12 @@ namespace __atomic2
 	__glibcxx_assert(__m != memory_order_acq_rel);
 	__glibcxx_assert(__m != memory_order_consume);
 
-	if (__m == memory_order_relaxed)
-	  _M_p = __p;
+	if (__m == memory_order_seq_cst)
+	  __sync_swap_full (&_M_p, __p);
 	else
 	  {
 	    // write_mem_barrier();
 	    _M_p = __p;
-	    if (__m == memory_order_seq_cst)
-	      __sync_synchronize();
 	  }
       }
 


* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-05-24  8:27 __sync_swap* with acq/rel/full memory barrier semantics Aldy Hernandez
@ 2011-05-24  9:25 ` Joseph S. Myers
  2011-05-30 22:53   ` Andrew MacLeod
  0 siblings, 1 reply; 26+ messages in thread
From: Joseph S. Myers @ 2011-05-24  9:25 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Richard Henderson, gcc-patches, Benjamin De Kosnik, Andrew MacLeod

On Mon, 23 May 2011, Aldy Hernandez wrote:

> This is a patch implementing builtins for an atomic exchange with full,
> acquire, and release memory barrier semantics.  It is similar to
> __sync_lock_test_and_set(), but the target does not have the option of
> implementing a reduced functionality of only implementing a store of 1.  Also,
> unlike __sync_lock_test_and_set(), we have all three memory barrier variants.

What's the reason you've implemented three variants, rather than six (the 
C1X/C++0X atomics have six memory order values) or one built-in function 
taking a memory order parameter?  More generally, what is the underlying 
design here for how built-in functions should cover the whole of the new 
atomics functionality in C1X and C++0X?
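
For illustration only, the "one built-in function taking a memory order
parameter" shape can be sketched as below.  This is not code from the
patch under review: it assumes the __atomic_exchange_n builtin and the
__ATOMIC_* constants that GCC releases from 4.7 onward provide, and the
wrapper name exchange_u32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical wrapper: a single entry point where the caller picks the
   memory order at the call site instead of picking a differently named
   function.  Relies on __atomic_exchange_n, which is an assumption
   relative to this thread (it is available in GCC 4.7 and later).  */
static inline uint32_t
exchange_u32 (uint32_t *ptr, uint32_t value, int memorder)
{
  return __atomic_exchange_n (ptr, value, memorder);
}
```

A caller would write exchange_u32 (&flag, 1, __ATOMIC_ACQUIRE) or
exchange_u32 (&flag, 1, __ATOMIC_SEQ_CST), so all six orders share one
name.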

Adding functions to libgcc-std.ver seems premature in the absence of any 
library implementations of them.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-05-24  9:25 ` Joseph S. Myers
@ 2011-05-30 22:53   ` Andrew MacLeod
  2011-05-31 13:12     ` Jakub Jelinek
  2011-06-02 19:13     ` Aldy Hernandez
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-05-30 22:53 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Aldy Hernandez, Richard Henderson, gcc-patches, Benjamin De Kosnik

On 05/23/2011 07:05 PM, Joseph S. Myers wrote:
> On Mon, 23 May 2011, Aldy Hernandez wrote:
>
>> This is a patch implementing builtins for an atomic exchange with full,
>> acquire, and release memory barrier semantics.  It is similar to
>> __sync_lock_test_and_set(), but the target does not have the option of
>> implementing a reduced functionality of only implementing a store of 1.  Also,
>> unlike __sync_lock_test_and_set(), we have all three memory barrier variants.
> What's the reason you've implemented three variants, rather than six (the
> C1X/C++0X atomics have six memory order values) or one built-in function
> taking a memory order parameter?  More generally, what is the underlying
> design here for how built-in functions should cover the whole of the new
> atomics functionality in C1X and C++0X?

Aldy was just too excited about working on memory model I think :-)

I've been looking at this, and I propose we go this way :

http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen

Please feel free to criticize, comment on, or ask for clarification.  I 
usually miss something I meant to get across.


Andrew


* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-05-30 22:53   ` Andrew MacLeod
@ 2011-05-31 13:12     ` Jakub Jelinek
  2011-05-31 15:23       ` Andrew MacLeod
  2011-06-02 19:13     ` Aldy Hernandez
  1 sibling, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2011-05-31 13:12 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Joseph S. Myers, Aldy Hernandez, Richard Henderson, gcc-patches,
	Benjamin De Kosnik

On Mon, May 30, 2011 at 04:07:09PM -0400, Andrew MacLeod wrote:
> On 05/23/2011 07:05 PM, Joseph S. Myers wrote:
> >On Mon, 23 May 2011, Aldy Hernandez wrote:
> >
> >>This is a patch implementing builtins for an atomic exchange with full,
> >>acquire, and release memory barrier semantics.  It is similar to
> >>__sync_lock_test_and_set(), but the target does not have the option of
> >>implementing a reduced functionality of only implementing a store of 1.  Also,
> >>unlike __sync_lock_test_and_set(), we have all three memory barrier variants.
> >What's the reason you've implemented three variants, rather than six (the
> >C1X/C++0X atomics have six memory order values) or one built-in function
> >taking a memory order parameter?  More generally, what is the underlying
> >design here for how built-in functions should cover the whole of the new
> >atomics functionality in C1X and C++0X?
> 
> Aldy was just too excited about working on memory model I think :-)
> 
> I've been looking at this, and I propose we go this way :
> 
> http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen
> 
> Please feel free to criticize, comment on,  or ask for
> clarification.  I usually miss something I meant to get across.

I think the addition of new __sync_* builtins for the different models
is preferable and would be generally more usable, even for users other than
C++ atomics.  On some targets any atomic insn will act as a full barrier,
while on others it could generate different insns or code sequences that
way.  For OpenMP atomics, having a none (in addition to full/acq/rel)
would be useful; I think #pragma omp atomic doesn't impose any ordering
on memory accesses other than the memory being atomically
read/written/changed.  I haven't read the C++0x standard in enough detail
to know why it has 6 memory order modes instead of just 4, but if 6 are
really needed (and probably even for 4), having new builtins with just one
extra constant argument that gives the memory ordering mode would be best.

	Jakub


* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-05-31 13:12     ` Jakub Jelinek
@ 2011-05-31 15:23       ` Andrew MacLeod
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-05-31 15:23 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Joseph S. Myers, Aldy Hernandez, Richard Henderson, gcc-patches,
	Benjamin De Kosnik

On 05/31/2011 06:38 AM, Jakub Jelinek wrote:
>
>> Aldy was just too excited about working on memory model I think :-)
>>
>> I've been looking at this, and I propose we go this way :
>>
>> http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen
>>
>> Please feel free to criticize, comment on,  or ask for
>> clarification.  I usually miss something I meant to get across.
> I think the addition of new __sync_* builtins for the different models
> is preferable and would be generally more usable even for other users than
> C++ atomics. On some targets any atomic insn will act as a full barrier,
> while on others it could generate different insns or code sequences that
> way.  For OpenMP atomics having a none (in addition to full/acq/rel)
> would be useful, I think #pragma omp atomic doesn't impose any ordering
> on memory accesses other than the memory being atomically
> read/written/changed.  Haven't read the C++0x standard in detail why
> it has 6 memory order modes instead of just 4, but if really 6 are needed
> (even for 4 probably), having new builtins with just one constant extra
> argument which says the memory ordering mode would be best.
>
>
I'm not sure if you are agreeing or not, or how much :-)

There are still only the basics of relaxed, consume, release/acquire, and 
seq-cst, so there are 4 modes.  C++ gives you two more by separating 
release and acquire for loads and stores, loads using 'acquire' mode, 
stores using 'release'.  I guess it allows for slightly finer control 
over instructions that can be loads and/or stores.  It looks like the 
optimal powerpc sequence for cmpxchg is slightly more efficient when it's 
just an acquire or just a release rather than an acquire/release, for 
instance (and all 3 sequences are slightly different).

The table is more or less complete, i.e., a store can't have an 
'acquire' mode, and I presume that a consumer which doesn't break 
release-acquire down into component parts would use that 'release' 
version of the store as 'release/acquire' mode.
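
One way to read the store rules in the paragraph above, as a rough
sketch.  The enum, its values, and the function name are hypothetical
and do not come from the wiki table or from any patch in this thread;
the collapse of acquire/release to release is the presumption stated
above, not a settled rule.

```c
/* Hypothetical memory order names, for illustration only.  */
enum mem_order
{
  ORDER_RELAXED,
  ORDER_CONSUME,
  ORDER_ACQUIRE,
  ORDER_RELEASE,
  ORDER_ACQ_REL,
  ORDER_SEQ_CST
};

/* Return the order a store would effectively use, or -1 if the
   requested order makes no sense for a store.  */
static inline int
effective_store_order (enum mem_order requested)
{
  switch (requested)
    {
    case ORDER_RELAXED:
    case ORDER_RELEASE:
    case ORDER_SEQ_CST:
      return requested;
    case ORDER_ACQ_REL:
      /* A combined mode has no acquire half to apply to a store, so it
         is assumed here to collapse to release.  */
      return ORDER_RELEASE;
    case ORDER_CONSUME:
    case ORDER_ACQUIRE:
    default:
      /* Acquire-flavoured orders only apply to loads.  */
      return -1;
    }
}
```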

I presume a single builtin with a parameter is the most efficient way to 
build them, but that's just an implementation detail.  Presumably you have 
each builtin in the table with each of those possible modes as a valid 
parameter.  The one thing I would care about is that I would like to see 
the relaxed version be 'just an insn' rather than a builtin, if that's 
possible.  My understanding is that relaxed (as far as C++ goes) has no 
synchronization at all, so you can treat it like a normal 
operation as far as optimization goes.  That seems to be the same for 
OpenMP: it's just that it's an atomic operation.  So it would be preferable 
if we can avoid a builtin in the optimizers for that.  That's why I left it 
out of the table.  If all the atomic operations are already builtins, well, 
then I guess it doesn't matter :-P

It would be nice to say something like emit_atomic_fetch_add 
(memory_order), and if it's relaxed, emit the atomic fetch_add insn (or 
builtin if that's what it is), and if it's something else, emit the 
appropriate builtin.  That would make bits/libstdc++v2/atomic_2.h even 
easier too.
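
A user-level sketch of that dispatch, under stated assumptions: the
function name fetch_add_int is hypothetical, and the relaxed path uses
__atomic_fetch_add with __ATOMIC_RELAXED, which GCC 4.7 and later
provide but which did not exist when this message was written.  Every
other order conservatively takes the full-barrier __sync_fetch_and_add.

```c
/* Hypothetical dispatcher: relaxed requests take the path with no
   ordering guarantees, anything stronger takes the existing
   full-barrier builtin.  */
static inline int
fetch_add_int (int *ptr, int delta, int memorder)
{
  if (memorder == __ATOMIC_RELAXED)
    /* Atomic read-modify-write with no ordering on other accesses.  */
    return __atomic_fetch_add (ptr, delta, __ATOMIC_RELAXED);

  /* Stronger than requested for acquire or release, but never weaker.  */
  return __sync_fetch_and_add (ptr, delta);
}
```

Both paths return the value held before the addition, so callers do not
need to know which one was taken.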

I think maybe we are more or less saying the same thing? :-)

Andrew

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-05-30 22:53   ` Andrew MacLeod
  2011-05-31 13:12     ` Jakub Jelinek
@ 2011-06-02 19:13     ` Aldy Hernandez
  2011-06-02 19:25       ` Jakub Jelinek
  1 sibling, 1 reply; 26+ messages in thread
From: Aldy Hernandez @ 2011-06-02 19:13 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Joseph S. Myers, Richard Henderson, gcc-patches, Andrew MacLeod

[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]

On 05/30/11 15:07, Andrew MacLeod wrote:

> Aldy was just too excited about working on memory model I think :-)
>
> I've been looking at this, and I propose we go this way :
>
> http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen

Still overly excited, but now with a more thorough plan :).

I'm going to concentrate on the non-controversial parts (the __sync 
builtins), while the details are ironed out.

The attached patch implements the exchange operation, with a 
parameter/enum for the type of memory model to use.  I have chosen to 
call the builtins __sync_mem_BLAH to keep them all consistent.

I am including documentation and a test, so folks can get an idea of 
where I'm headed with this.  Once I take everyone's input, we can 
implement the rest of the builtins, and take it from there.

I see no prior art in providing some sort of enum for a builtin 
parameter.  I can proceed down this path if advisable, but an easier 
path is to just declare the __SYNC_MEM_* enum as preprocessor macros as 
I do in this patch.  Suggestions welcome.

How does this (lightly tested) patch look?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 16486 bytes --]

	* doc/extend.texi (__sync_mem_exchange): Document.
	* cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*.
	* c-family/c-common.c (BUILT_IN_MEM_EXCHANGE_N): Add case.
	* optabs.c (expand_sync_mem_exchange): New.
	* optabs.h (enum direct_optab_index): Add DOI_sync_mem* entries.
	(sync_mem_exchange_*_optab): Define.
	* genopinit.c: Add entries for sync_mem_exchange_*.
	* tree.h (enum memmodel): New.
	* builtins.c (get_memmodel): New.
	(expand_builtin_mem_exchange): New.
	(expand_builtin_synchronize): Remove static.
	(expand_builtin): Add cases for BUILT_IN_MEM_EXCHANGE_*.
	* sync-builtins.def: Add entries for BUILT_IN_MEM_EXCHANGE_*.
	* builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT):
	New.
	* expr.h (expand_sync_mem_exchange): Declare.
	(expand_builtin_synchronize): Same.
	* config/i386/i386.md (UNSPECV_MEM_XCHG): New.
	(sync_mem_exchange_seq_cst<mode>): New pattern.

Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 173831)
+++ doc/extend.texi	(working copy)
@@ -6728,6 +6728,22 @@ This builtin is not a full barrier, but 
 This means that all previous memory stores are globally visible, and all
 previous memory loads have been satisfied, but following memory reads
 are not prevented from being speculated to before the barrier.
+
+@item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...)
+@findex __sync_mem_exchange
+This builtin implements an atomic exchange operation within the
+constraints of a memory model.  It writes @var{value} into
+@code{*@var{ptr}}, and returns the previous contents of
+@code{*@var{ptr}}.
+
+The valid memory model variants for this builtin are
+__SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE,
+__SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL.  If the variant is not
+available for the given target, the compiler will fall back to the
+most restrictive memory model, the sequentially consistent model (if
+available).  If the sequentially consistent model is not implemented
+for the target, the compiler will implement the builtin with a compare
+and swap loop.
 @end table
 
 @node Object Size Checking
Index: cppbuiltin.c
===================================================================
--- cppbuiltin.c	(revision 173831)
+++ cppbuiltin.c	(working copy)
@@ -66,6 +66,12 @@ define__GNUC__ (cpp_reader *pfile)
   cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
   cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
   cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
+  cpp_define_formatted (pfile, "__SYNC_MEM_RELAXED=%d", MEMMODEL_RELAXED);
+  cpp_define_formatted (pfile, "__SYNC_MEM_SEQ_CST=%d", MEMMODEL_SEQ_CST);
+  cpp_define_formatted (pfile, "__SYNC_MEM_ACQUIRE=%d", MEMMODEL_ACQUIRE);
+  cpp_define_formatted (pfile, "__SYNC_MEM_RELEASE=%d", MEMMODEL_RELEASE);
+  cpp_define_formatted (pfile, "__SYNC_MEM_ACQ_REL=%d", MEMMODEL_ACQ_REL);
+  cpp_define_formatted (pfile, "__SYNC_MEM_CONSUME=%d", MEMMODEL_CONSUME);
 }
 
 
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c	(revision 173831)
+++ c-family/c-common.c	(working copy)
@@ -9035,6 +9035,7 @@ resolve_overloaded_builtin (location_t l
     case BUILT_IN_VAL_COMPARE_AND_SWAP_N:
     case BUILT_IN_LOCK_TEST_AND_SET_N:
     case BUILT_IN_LOCK_RELEASE_N:
+    case BUILT_IN_MEM_EXCHANGE_N:
       {
 	int n = sync_resolve_size (function, params);
 	tree new_function, first_param, result;
Index: optabs.c
===================================================================
--- optabs.c	(revision 173831)
+++ optabs.c	(working copy)
@@ -6988,6 +6988,85 @@ expand_sync_lock_test_and_set (rtx mem, 
 
   return NULL_RTX;
 }
+
+/* This function expands a fine grained atomic exchange operation:
+   atomically store VAL in MEM and return the previous value in MEM.
+
+   MODEL is the memory model variant to use.
+   TARGET is an optional place to stick the return value.  */
+
+rtx
+expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
+{
+  enum machine_mode mode = GET_MODE (mem);
+  enum insn_code icode;
+  direct_optab op;
+
+  switch (model)
+    {
+    case MEMMODEL_RELAXED:
+      /* ?? Eventually we should either just emit the atomic
+	 instruction without any barriers (and thus allow movements
+	 and transformations), or emit a relaxed builtin.
+
+	 It is still not clear whether any transformations are
+	 permissible on the atomics (for example, CSE might break
+	 coherence), so we might need to emit a relaxed builtin.
+
+         Until we figure this out, be conservative and fall
+         through.  */
+    case MEMMODEL_SEQ_CST:
+      op = sync_mem_exchange_seq_cst_optab;
+      break;
+    case MEMMODEL_ACQUIRE:
+      op = sync_mem_exchange_acq_optab;
+      break;
+    case MEMMODEL_RELEASE:
+      op = sync_mem_exchange_rel_optab;
+      break;
+    case MEMMODEL_ACQ_REL:
+      op = sync_mem_exchange_acq_rel_optab;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  /* If no variant is found, try the full barrier.  */
+  if (direct_optab_handler (op, mode) == CODE_FOR_nothing)
+    op = sync_mem_exchange_seq_cst_optab;
+
+  /* If the target supports the swap directly, great.  */
+  icode = direct_optab_handler (op, mode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[3];
+
+      create_output_operand (&ops[0], target, mode);
+      create_fixed_operand (&ops[1], mem);
+      /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
+      create_convert_operand_to (&ops[2], val, mode, true);
+      if (maybe_expand_insn (icode, 3, ops))
+	return ops[0].value;
+    }
+
+  /* Otherwise, use a compare-and-swap loop with full barriers around
+     it.  */
+  if (direct_optab_handler (sync_compare_and_swap_optab, mode)
+      != CODE_FOR_nothing)
+    {
+      expand_builtin_synchronize ();
+      if (!target || !register_operand (target, mode))
+	target = gen_reg_rtx (mode);
+      if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode)
+	val = convert_modes (mode, GET_MODE (val), val, 1);
+      if (expand_compare_and_swap_loop (mem, target, val, NULL_RTX))
+	{
+	  expand_builtin_synchronize ();
+	  return target;
+	}
+    }
+
+  return NULL_RTX;
+}
 \f
 /* Return true if OPERAND is suitable for operand number OPNO of
    instruction ICODE.  */
Index: optabs.h
===================================================================
--- optabs.h	(revision 173831)
+++ optabs.h	(working copy)
@@ -675,6 +675,12 @@ enum direct_optab_index
   /* Atomic clear with release semantics.  */
   DOI_sync_lock_release,
 
+  /* Fine grained atomic exchange.  */
+  DOI_sync_mem_exchange_seq_cst,
+  DOI_sync_mem_exchange_acq,
+  DOI_sync_mem_exchange_rel,
+  DOI_sync_mem_exchange_acq_rel,
+
   DOI_MAX
 };
 
@@ -722,6 +728,15 @@ typedef struct direct_optab_d *direct_op
   (&direct_optab_table[(int) DOI_sync_lock_test_and_set])
 #define sync_lock_release_optab \
   (&direct_optab_table[(int) DOI_sync_lock_release])
+
+#define sync_mem_exchange_seq_cst_optab \
+  (&direct_optab_table[(int) DOI_sync_mem_exchange_seq_cst])
+#define sync_mem_exchange_acq_optab \
+  (&direct_optab_table[(int) DOI_sync_mem_exchange_acq])
+#define sync_mem_exchange_rel_optab \
+  (&direct_optab_table[(int) DOI_sync_mem_exchange_rel])
+#define sync_mem_exchange_acq_rel_optab \
+  (&direct_optab_table[(int) DOI_sync_mem_exchange_acq_rel])
 \f
 /* Target-dependent globals.  */
 struct target_optabs {
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 173831)
+++ genopinit.c	(working copy)
@@ -240,6 +240,10 @@ static const char * const optabs[] =
   "set_direct_optab_handler (sync_compare_and_swap_optab, $A, CODE_FOR_$(sync_compare_and_swap$I$a$))",
   "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))",
   "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))",
+  "set_direct_optab_handler (sync_mem_exchange_seq_cst_optab, $A, CODE_FOR_$(sync_mem_exchange_seq_cst$I$a$))",
+  "set_direct_optab_handler (sync_mem_exchange_acq_rel_optab, $A, CODE_FOR_$(sync_mem_exchange_acq_rel$I$a$))",
+  "set_direct_optab_handler (sync_mem_exchange_acq_optab, $A, CODE_FOR_$(sync_mem_exchange_acq$I$a$))",
+  "set_direct_optab_handler (sync_mem_exchange_rel_optab, $A, CODE_FOR_$(sync_mem_exchange_rel$I$a$))",
   "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))",
   "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
   "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))",
Index: tree.h
===================================================================
--- tree.h	(revision 173831)
+++ tree.h	(working copy)
@@ -5840,4 +5840,16 @@ is_lang_specific (tree t)
 /* In gimple-low.c.  */
 extern bool block_may_fallthru (const_tree);
 
+/* Memory model types for the __sync_mem* builtins.  */
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+  MEMMODEL_SEQ_CST = 1,
+  MEMMODEL_ACQUIRE = 2,
+  MEMMODEL_RELEASE = 3,
+  MEMMODEL_ACQ_REL = 4,
+  MEMMODEL_CONSUME = 5,
+  MEMMODEL_LAST = 6
+};
+
 #endif  /* GCC_TREE_H  */
Index: builtins.c
===================================================================
--- builtins.c	(revision 173831)
+++ builtins.c	(working copy)
@@ -5682,9 +5682,69 @@ expand_builtin_lock_test_and_set (enum m
   return expand_sync_lock_test_and_set (mem, val, target);
 }
 
+/* Given an integer representing an ``enum memmodel'', verify its
+   correctness and return the memory model enum.  */
+
+static enum memmodel
+get_memmodel (tree exp)
+{
+  rtx op;
+
+  if (TREE_CODE (exp) != INTEGER_CST)
+    {
+      error ("third argument to builtin is an invalid memory model");
+      return MEMMODEL_SEQ_CST;
+    }
+  op = expand_normal (exp);
+  if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST)
+    {
+      error ("third argument to builtin is an invalid memory model");
+      return MEMMODEL_SEQ_CST;
+    }
+  return (enum memmodel) INTVAL (op);
+}
+
+/* Expand the __sync_mem_exchange intrinsic:
+
+   	TYPE __sync_mem_exchange (TYPE *to, TYPE from, enum memmodel)
+
+   EXP is the CALL_EXPR.
+   TARGET is an optional place for us to store the results.  */
+
+static rtx
+expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target)
+{
+  rtx val, mem;
+  enum machine_mode old_mode;
+  enum memmodel model;
+
+  model = get_memmodel (CALL_EXPR_ARG (exp, 2));
+  if (model != MEMMODEL_RELAXED
+      && model != MEMMODEL_SEQ_CST
+      && model != MEMMODEL_ACQ_REL
+      && model != MEMMODEL_RELEASE
+      && model != MEMMODEL_ACQUIRE)
+    {
+      error ("invalid memory model for %<__sync_mem_exchange%>");
+      return NULL_RTX;
+    }
+
+  /* Expand the operands.  */
+  mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
+  val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL);
+  /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+     of CONST_INTs, where we know the old_mode only from the call argument.  */
+  old_mode = GET_MODE (val);
+  if (old_mode == VOIDmode)
+    old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1)));
+  val = convert_modes (mode, old_mode, val, 1);
+
+  return expand_sync_mem_exchange (model, mem, val, target);
+}
+
 /* Expand the __sync_synchronize intrinsic.  */
 
-static void
+void
 expand_builtin_synchronize (void)
 {
   gimple x;
@@ -6495,6 +6555,17 @@ expand_builtin (tree exp, rtx target, rt
 	return target;
       break;
 
+    case BUILT_IN_MEM_EXCHANGE_1:
+    case BUILT_IN_MEM_EXCHANGE_2:
+    case BUILT_IN_MEM_EXCHANGE_4:
+    case BUILT_IN_MEM_EXCHANGE_8:
+    case BUILT_IN_MEM_EXCHANGE_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_MEM_EXCHANGE_1);
+      target = expand_builtin_mem_exchange (mode, exp, target);
+      if (target)
+	return target;
+      break;
+
     case BUILT_IN_LOCK_TEST_AND_SET_1:
     case BUILT_IN_LOCK_TEST_AND_SET_2:
     case BUILT_IN_LOCK_TEST_AND_SET_4:
Index: sync-builtins.def
===================================================================
--- sync-builtins.def	(revision 173831)
+++ sync-builtins.def	(working copy)
@@ -250,3 +250,24 @@ DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_
 
 DEF_SYNC_BUILTIN (BUILT_IN_SYNCHRONIZE, "__sync_synchronize",
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+
+/* Fine grained __sync* builtins for the C++ memory model.  */
+
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_N,
+		  "__sync_mem_exchange",
+		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_1,
+		  "__sync_mem_exchange_1",
+		  BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_2,
+		  "__sync_mem_exchange_2",
+		  BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_4,
+		  "__sync_mem_exchange_4",
+		  BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_8,
+		  "__sync_mem_exchange_8",
+		  BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_16,
+		  "__sync_mem_exchange_16",
+		  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
Index: testsuite/gcc.dg/x86-sync-1.c
===================================================================
--- testsuite/gcc.dg/x86-sync-1.c	(revision 0)
+++ testsuite/gcc.dg/x86-sync-1.c	(revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-dap" } */
+
+int i;
+
+void foo()
+{
+  __sync_mem_exchange (&i, 555, __SYNC_MEM_SEQ_CST);
+}
+
+/* { dg-final { scan-assembler "sync_mem_exchange_seq_cstsi" } } */
Index: builtin-types.def
===================================================================
--- builtin-types.def	(revision 173831)
+++ builtin-types.def	(working copy)
@@ -383,6 +383,11 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PT
 		     BT_PTR, BT_UINT)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR,
 		     BT_CONST_PTR, BT_INT, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
Index: expr.h
===================================================================
--- expr.h	(revision 173831)
+++ expr.h	(working copy)
@@ -217,6 +217,7 @@ rtx expand_bool_compare_and_swap (rtx, r
 rtx expand_sync_operation (rtx, rtx, enum rtx_code);
 rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
 rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
+rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx);
 \f
 /* Functions from expmed.c:  */
 
@@ -248,6 +249,7 @@ extern void expand_builtin_setjmp_receiv
 extern rtx expand_builtin_saveregs (void);
 extern void expand_builtin_trap (void);
 extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
+extern void expand_builtin_synchronize (void);
 \f
 /* Functions from expr.c:  */
 
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 173831)
+++ config/i386/i386.md	(working copy)
@@ -250,6 +250,7 @@
   UNSPECV_MWAIT
   UNSPECV_CMPXCHG
   UNSPECV_XCHG
+  UNSPECV_MEM_XCHG
   UNSPECV_LOCK
   UNSPECV_PROLOGUE_USE
   UNSPECV_CLD
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 173831)
+++ config/i386/sync.md	(working copy)
@@ -232,6 +232,15 @@
   return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
 })
 
+(define_insn "sync_mem_exchange_seq_cst<mode>"
+  [(set (match_operand:SWI 0 "register_operand" "=<r>")
+	(unspec_volatile:SWI
+         [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_MEM_XCHG))
+   (set (match_dup 1)
+	(match_operand:SWI 2 "register_operand" "0"))]
+  ""
+  "xchg{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-02 19:13     ` Aldy Hernandez
@ 2011-06-02 19:25       ` Jakub Jelinek
  2011-06-02 19:53         ` Aldy Hernandez
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2011-06-02 19:25 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Andrew MacLeod, Joseph S. Myers, Richard Henderson, gcc-patches

On Thu, Jun 02, 2011 at 02:12:38PM -0500, Aldy Hernandez wrote:

> +/* This function expands a fine grained atomic exchange operation:
> +   atomically store VAL in MEM and return the previous value in MEM.
> +
> +   MODEL is the memory model variant to use.
> +   TARGET is an optional place to stick the return value.  */
> +
> +rtx
> +expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
> +{
> +  enum machine_mode mode = GET_MODE (mem);
> +  enum insn_code icode;
> +  direct_optab op;
> +
> +  switch (model)
> +    {
> +    case MEMMODEL_RELAXED:
> +      /* ?? Eventually we should either just emit the atomic
> +	 instruction without any barriers (and thus allow movements
> +	 and transformations), or emit a relaxed builtin.
> +
> +	 It is still not clear whether any transformations are
> +	 permissible on the atomics (for example, CSE might break
> +	 coherence), so we might need to emit a relaxed builtin.
> +
> +         Until we figure this out, be conservative and fall
> +         through.  */
> +    case MEMMODEL_SEQ_CST:
> +      op = sync_mem_exchange_seq_cst_optab;
> +      break;
> +    case MEMMODEL_ACQUIRE:
> +      op = sync_mem_exchange_acq_optab;
> +      break;
> +    case MEMMODEL_RELEASE:
> +      op = sync_mem_exchange_rel_optab;
> +      break;
> +    case MEMMODEL_ACQ_REL:
> +      op = sync_mem_exchange_acq_rel_optab;
> +      break;

Wouldn't it be better to pass the model (as an extra CONST_INT
operand) to the expanders?  Targets where atomic instructions always act
as full barriers could just ignore that argument, others could decide what
to do based on the value.
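
As a rough illustration of the suggestion above (this is not code from
the patch): a single expander that receives the model could pick its
barriers along these lines.  The enum, the flag names and both functions
are hypothetical; they only model the decision in plain C, not real RTL
expansion.

```c
#include <assert.h>

/* Hypothetical stand-in for the patch's enum memmodel.  */
enum model { M_RELAXED, M_ACQUIRE, M_RELEASE, M_ACQ_REL, M_SEQ_CST };

/* Hypothetical flags describing which barriers an expander would emit.  */
enum { BARRIER_BEFORE = 1, BARRIER_AFTER = 2 };

/* Target whose exchange instruction always acts as a full barrier:
   the model operand can simply be ignored.  */
static unsigned
strong_target_barriers (enum model m)
{
  (void) m;
  return 0;
}

/* Target with a plain exchange instruction: choose barriers from the
   value of the model operand.  */
static unsigned
weak_target_barriers (enum model m)
{
  switch (m)
    {
    case M_RELAXED:
      return 0;
    case M_ACQUIRE:
      return BARRIER_AFTER;
    case M_RELEASE:
      return BARRIER_BEFORE;
    case M_ACQ_REL:
    case M_SEQ_CST:
      return BARRIER_BEFORE | BARRIER_AFTER;
    }
  assert (!"invalid memory model");
  return BARRIER_BEFORE | BARRIER_AFTER;
}
```

The point of the sketch is only that one pattern per target can cover
every model, instead of one optab per model.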

	Jakub

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-02 19:25       ` Jakub Jelinek
@ 2011-06-02 19:53         ` Aldy Hernandez
  2011-06-03 14:27           ` Richard Henderson
  0 siblings, 1 reply; 26+ messages in thread
From: Aldy Hernandez @ 2011-06-02 19:53 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andrew MacLeod, Joseph S. Myers, Richard Henderson, gcc-patches

On 06/02/11 14:25, Jakub Jelinek wrote:

>> +    case MEMMODEL_SEQ_CST:
>> +      op = sync_mem_exchange_seq_cst_optab;
>> +      break;
>> +    case MEMMODEL_ACQUIRE:
>> +      op = sync_mem_exchange_acq_optab;
>> +      break;
>> +    case MEMMODEL_RELEASE:
>> +      op = sync_mem_exchange_rel_optab;
>> +      break;
>> +    case MEMMODEL_ACQ_REL:
>> +      op = sync_mem_exchange_acq_rel_optab;
>> +      break;
>
> Wouldn't it be better to pass the model (as an extra CONST_INT
> operand) to the expanders?  Targets where atomic instructions always act
> as full barriers could just ignore that argument, others could decide what
> to do based on the value.

*shrug* I don't care.  Whatever everyone agrees on.

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-02 19:53         ` Aldy Hernandez
@ 2011-06-03 14:27           ` Richard Henderson
  2011-06-17 22:21             ` Andrew MacLeod
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2011-06-03 14:27 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Jakub Jelinek, Andrew MacLeod, Joseph S. Myers, gcc-patches

On 06/02/2011 02:52 PM, Aldy Hernandez wrote:
> On 06/02/11 14:25, Jakub Jelinek wrote:
> 
>>> +    case MEMMODEL_SEQ_CST:
>>> +      op = sync_mem_exchange_seq_cst_optab;
>>> +      break;
>>> +    case MEMMODEL_ACQUIRE:
>>> +      op = sync_mem_exchange_acq_optab;
>>> +      break;
>>> +    case MEMMODEL_RELEASE:
>>> +      op = sync_mem_exchange_rel_optab;
>>> +      break;
>>> +    case MEMMODEL_ACQ_REL:
>>> +      op = sync_mem_exchange_acq_rel_optab;
>>> +      break;
>>
>> Wouldn't it be better to pass the model (as an extra CONST_INT
>> operand) to the expanders?  Targets where atomic instructions always act
>> as full barriers could just ignore that argument, others could decide what
>> to do based on the value.
> 
> *shrug* I don't care.  Whatever everyone agrees on.

Let's do that.  Many of the targets will be expanding these to
a somewhat longer sequence at some stage, and they'll all be
95% identical.  The extra operand ought to make for less
boiler-plate code.


r~

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-03 14:27           ` Richard Henderson
@ 2011-06-17 22:21             ` Andrew MacLeod
  2011-06-18 19:49               ` Richard Henderson
  2011-06-18 23:49               ` Richard Henderson
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-17 22:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches,
	Andrew MacLeod

[-- Attachment #1: Type: text/plain, Size: 2589 bytes --]

> On 06/02/2011 02:52 PM, Aldy Hernandez wrote:
>> Wouldn't it be better to pass the model (as an extra CONST_INT
>> operand) to the expanders?  Targets where atomic instructions always act
>> as full barriers could just ignore that argument, others could decide what
>> to do based on the value.
>> *shrug* I don't care.  Whatever everyone agrees on.
> Let's do that.  Many of the targets will be expanding these to
> a somewhat longer sequence at some stage, and they'll all be
> 95% identical.  The extra operand ought to make for less
> boiler-plate code.
>
>

OK, here's Aldy's patch modified to make the memory model a parameter to 
the RTL pattern.  I haven't worked with RTL for a while, so hopefully 
it's close to right :-)

If we can settle on the implementation, I'll proceed with the rest of 
the required atomics and then make the changes required to libstdc++-v3 
all at once.

Fortran seems to copy only some of builtin-types.def into its own 
private types.def... dare I ask why?

Other changes...

Rather than duplicating the code, expand_sync_mem_exchange() now 
calls expand_sync_lock_test_and_set() if there is no sync_mem_exchange 
pattern.  This means that when libstdc++ is converted to use the new 
__sync_mem_exchange directly, all existing architectures will still work 
as they do today, even if they don't provide the new pattern.  That 
should ease the transition if the current behaviour is retained.

I'm also now inserting a __sync_synchronize() before expanding the 
lock_test_and_set() iff the memory model is 'acq_rel' or 'seq_cst'.  My 
understanding is that lock_test_and_set() is defined to be an acquire 
barrier only, and the results may not be correct without the extra 
synchronization (the processor is free to delay earlier stores past the 
instruction if it is only an acquire barrier).  I assume the compiler's 
lock_test_and_set builtin is considered to have the same characteristics 
as the Intel instruction.  For the i386 port, I turned mem_exchange into 
a define_expand that issues the barrier if need be and then follows it 
with the current lock_test_and_set insn.
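
A user-level sketch of that fallback, using only the existing __sync
builtins, might look like the following.  The enum and the function name
are hypothetical stand-ins for illustration; the real logic lives in
expand_sync_mem_exchange() in the patch below and works on RTL, not on C
calls.

```c
#include <assert.h>

/* Hypothetical mirror of the memory models; names are for this sketch
   only.  */
enum model { M_RELAXED, M_CONSUME, M_ACQUIRE, M_RELEASE, M_ACQ_REL, M_SEQ_CST };

/* Emulate an exchange with the legacy builtins.  __sync_lock_test_and_set
   is documented as an acquire barrier only, so a full barrier is issued
   first when the caller asks for acq_rel or seq_cst, as described in the
   message above.  */
static int
exchange_via_test_and_set (int *ptr, int val, enum model m)
{
  /* The exchange builtin rejects the consume model.  */
  assert (m != M_CONSUME);
  if (m == M_ACQ_REL || m == M_SEQ_CST)
    __sync_synchronize ();
  return __sync_lock_test_and_set (ptr, val);
}
```

Note that on targets where __sync_lock_test_and_set only supports storing
the constant 1, this emulation is not a general exchange.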

Finally, I moved the definition of the various memmodel modes to 
machmode.h from tree.h.  This allows rtl pattern code to check memory 
order values during the expansion of patterns/insns.
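
To make the ordering requirement concrete, here is a small sketch of the
enum together with the kind of check a caller could perform.  It uses C11
<stdatomic.h> as a stand-in for the libstdc++ header (an assumption: with
GCC and Clang the memory_order enumerators have the values 0 to 5 in this
order).  The SK_ prefix and exchange_model_ok() are hypothetical names,
not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Copy of the enum from the machmode.h hunk, with a hypothetical SK_
   prefix so it cannot clash with real headers.  */
enum memmodel_sketch
{
  SK_RELAXED = 0,
  SK_CONSUME = 1,
  SK_ACQUIRE = 2,
  SK_RELEASE = 3,
  SK_ACQ_REL = 4,
  SK_SEQ_CST = 5,
  SK_LAST = 6
};

/* Assumption: C11 memory_order stands in for the libstdc++ enum.  */
static_assert ((int) SK_RELAXED == (int) memory_order_relaxed, "relaxed");
static_assert ((int) SK_CONSUME == (int) memory_order_consume, "consume");
static_assert ((int) SK_ACQUIRE == (int) memory_order_acquire, "acquire");
static_assert ((int) SK_RELEASE == (int) memory_order_release, "release");
static_assert ((int) SK_ACQ_REL == (int) memory_order_acq_rel, "acq_rel");
static_assert ((int) SK_SEQ_CST == (int) memory_order_seq_cst, "seq_cst");

/* Range check in the style of get_memmodel, plus the rejection of
   consume done in expand_builtin_mem_exchange.  Returns nonzero if
   MODEL is usable for an exchange.  */
static int
exchange_model_ok (long model)
{
  if (model < 0 || model >= SK_LAST)
    return 0;
  return model != SK_CONSUME;
}
```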

So my rusty hands screwed around with the patch quite a bit... have a 
look.  Thoughts?

Bootstraps with no regressions on x86_64-unknown-linux.  Do we apply this 
to mainline?  Or to cxx-mem-model, and then bring it all over later when 
they are all done and "perfected"?

Andrew



[-- Attachment #2: exchange.patch --]
[-- Type: text/plain, Size: 18400 bytes --]


	* doc/extend.texi (__sync_mem_exchange): Document.
	* cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*.
	* machmode.h (enum memmodel): New.
	* c-family/c-common.c (BUILT_IN_MEM_EXCHANGE_N): Add case.
	* optabs.c (expand_sync_mem_exchange): New.
	* optabs.h (enum direct_optab_index): Add DOI_sync_mem_exchange entry.
	(sync_mem_exchange_optab): Define.
	* genopinit.c: Add entry for sync_mem_exchange.
	* builtins.c (get_memmodel): New.
	(expand_builtin_mem_exchange): New.
	(expand_builtin_synchronize): Remove static.
	(expand_builtin): Add cases for BUILT_IN_MEM_EXCHANGE_*.
	* sync-builtins.def: Add entries for BUILT_IN_MEM_EXCHANGE_*.
	* testsuite/gcc.dg/x86-sync-1.c: New test.
	* builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
	* expr.h (expand_sync_mem_exchange): Declare.
	(expand_builtin_synchronize): Declare.
	* fortran/types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
	* Makefile.in (cppbuiltin.o): Add missing dependency on $(TREE_H).
	* config/i386/i386.md (UNSPECV_MEM_XCHG): New.
	* config/i386/sync.md (sync_mem_exchange<mode>): New pattern.


Index: doc/extend.texi
===================================================================
*** doc/extend.texi	(revision 174933)
--- doc/extend.texi	(working copy)
*************** This builtin is not a full barrier, but 
*** 6728,6733 ****
--- 6728,6751 ----
  This means that all previous memory stores are globally visible, and all
  previous memory loads have been satisfied, but following memory reads
  are not prevented from being speculated to before the barrier.
+ 
+ @item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...)
+ @findex __sync_mem_exchange
+ This builtin implements an atomic exchange operation within the
+ constraints of a memory model.  It writes @var{value} into
+ @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ The valid memory model variants for this builtin are
+ __SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE,
+ __SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL.  The target pattern is responsible
+ for issuing the synchronization instructions each model requires.  It should
+ default to the most restrictive memory model, the sequentially consistent
+ model.  If the target does not implement the pattern, the compiler falls back
+ to the __sync_lock_test_and_set builtin.  If the memory model is more
+ restrictive than __SYNC_MEM_ACQUIRE, a memory barrier is emitted before
+ the instruction.
+ 
  @end table
  
  @node Object Size Checking
Index: cppbuiltin.c
===================================================================
*** cppbuiltin.c	(revision 174933)
--- cppbuiltin.c	(working copy)
*************** define__GNUC__ (cpp_reader *pfile)
*** 66,71 ****
--- 66,77 ----
    cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
    cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
    cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELAXED=%d", MEMMODEL_RELAXED);
+   cpp_define_formatted (pfile, "__SYNC_MEM_SEQ_CST=%d", MEMMODEL_SEQ_CST);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQUIRE=%d", MEMMODEL_ACQUIRE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELEASE=%d", MEMMODEL_RELEASE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQ_REL=%d", MEMMODEL_ACQ_REL);
+   cpp_define_formatted (pfile, "__SYNC_MEM_CONSUME=%d", MEMMODEL_CONSUME);
  }
  
  
Index: machmode.h
===================================================================
*** machmode.h	(revision 174933)
--- machmode.h	(working copy)
*************** extern enum machine_mode ptr_mode;
*** 275,278 ****
--- 275,291 ----
  /* Target-dependent machine mode initialization - in insn-modes.c.  */
  extern void init_adjust_machine_modes (void);
  
+ /* Memory model types for the __sync_mem* builtins. 
+    This must match the order in libstdc++-v3/include/bits/atomic_base.h.  */
+ enum memmodel
+ {
+   MEMMODEL_RELAXED = 0,
+   MEMMODEL_CONSUME = 1,
+   MEMMODEL_ACQUIRE = 2,
+   MEMMODEL_RELEASE = 3,
+   MEMMODEL_ACQ_REL = 4,
+   MEMMODEL_SEQ_CST = 5,
+   MEMMODEL_LAST = 6
+ };
+ 
  #endif /* not HAVE_MACHINE_MODES */
Index: c-family/c-common.c
===================================================================
*** c-family/c-common.c	(revision 174933)
--- c-family/c-common.c	(working copy)
*************** resolve_overloaded_builtin (location_t l
*** 9059,9064 ****
--- 9059,9065 ----
      case BUILT_IN_VAL_COMPARE_AND_SWAP_N:
      case BUILT_IN_LOCK_TEST_AND_SET_N:
      case BUILT_IN_LOCK_RELEASE_N:
+     case BUILT_IN_MEM_EXCHANGE_N:
        {
  	int n = sync_resolve_size (function, params);
  	tree new_function, first_param, result;
Index: optabs.c
===================================================================
*** optabs.c	(revision 174933)
--- optabs.c	(working copy)
*************** expand_sync_lock_test_and_set (rtx mem, 
*** 7037,7042 ****
--- 7037,7082 ----
  
    return NULL_RTX;
  }
+ 
+ /* This function expands the atomic exchange operation:
+    atomically store VAL in MEM and return the previous value in MEM.
+ 
+    MODEL is the memory model variant to use.
+    TARGET is an optional place to stick the return value.  */
+ 
+ rtx
+ expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
+ {
+   enum machine_mode mode = GET_MODE (mem);
+   enum insn_code icode;
+ 
+   /* If the target supports the exchange directly, great.  */
+   icode = direct_optab_handler (sync_mem_exchange_optab, mode);
+   if (icode != CODE_FOR_nothing)
+     {
+       struct expand_operand ops[4];
+ 
+       create_output_operand (&ops[0], target, mode);
+       create_fixed_operand (&ops[1], mem);
+       /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
+       create_convert_operand_to (&ops[2], val, mode, true);
+       create_integer_operand (&ops[3], model);
+       if (maybe_expand_insn (icode, 4, ops))
+ 	return ops[0].value;
+     }
+ 
+   /* Legacy sync_lock_test_and_set works the same, but is only defined as an 
+      acquire barrier.  If the pattern exists, and the memory model is stronger
+      than acquire, add a release barrier before the instruction.
+      The barrier is not needed if sync_lock_test_and_set doesn't exist since
+      it will expand into a compare-and-swap loop.  */
+   icode = direct_optab_handler (sync_lock_test_and_set_optab, mode);
+   if ((icode != CODE_FOR_nothing) && (model == MEMMODEL_SEQ_CST || 
+ 				     model == MEMMODEL_ACQ_REL))
+     expand_builtin_synchronize ();
+ 
+   return expand_sync_lock_test_and_set (mem, val, target);
+ }
  \f
  /* Return true if OPERAND is suitable for operand number OPNO of
     instruction ICODE.  */
Index: optabs.h
===================================================================
*** optabs.h	(revision 174933)
--- optabs.h	(working copy)
*************** enum direct_optab_index
*** 675,680 ****
--- 675,683 ----
    /* Atomic clear with release semantics.  */
    DOI_sync_lock_release,
  
+   /* Atomic operations with C++0x memory model parameters. */
+   DOI_sync_mem_exchange,
+ 
    DOI_MAX
  };
  
*************** typedef struct direct_optab_d *direct_op
*** 722,727 ****
--- 725,733 ----
    (&direct_optab_table[(int) DOI_sync_lock_test_and_set])
  #define sync_lock_release_optab \
    (&direct_optab_table[(int) DOI_sync_lock_release])
+ 
+ #define sync_mem_exchange_optab \
+   (&direct_optab_table[(int) DOI_sync_mem_exchange])
  \f
  /* Target-dependent globals.  */
  struct target_optabs {
Index: genopinit.c
===================================================================
*** genopinit.c	(revision 174933)
--- genopinit.c	(working copy)
*************** static const char * const optabs[] =
*** 240,245 ****
--- 240,246 ----
    "set_direct_optab_handler (sync_compare_and_swap_optab, $A, CODE_FOR_$(sync_compare_and_swap$I$a$))",
    "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))",
    "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))",
+   "set_direct_optab_handler (sync_mem_exchange_optab, $A, CODE_FOR_$(sync_mem_exchange$I$a$))",
    "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))",
    "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
    "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))",
Index: builtins.c
===================================================================
*** builtins.c	(revision 174933)
--- builtins.c	(working copy)
*************** expand_builtin_lock_test_and_set (enum m
*** 5192,5200 ****
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
  /* Expand the __sync_synchronize intrinsic.  */
  
! static void
  expand_builtin_synchronize (void)
  {
    gimple x;
--- 5192,5260 ----
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
+ /* Given an integer representing an ``enum memmodel'', verify its
+    correctness and return the memory model enum.  */
+ 
+ static enum memmodel
+ get_memmodel (tree exp)
+ {
+   rtx op;
+ 
+   if (TREE_CODE (exp) != INTEGER_CST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   op = expand_normal (exp);
+   if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   return (enum memmodel) INTVAL (op);
+ }
+ 
+ /* Expand the __sync_mem_exchange intrinsic:
+ 
+    	TYPE __sync_mem_exchange (TYPE *to, TYPE from, enum memmodel)
+ 
+    EXP is the CALL_EXPR.
+    TARGET is an optional place for us to store the results.  */
+ 
+ static rtx
+ expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target)
+ {
+   rtx val, mem;
+   enum machine_mode old_mode;
+   enum memmodel model;
+ 
+   model = get_memmodel (CALL_EXPR_ARG (exp, 2));
+   if (model != MEMMODEL_RELAXED
+       && model != MEMMODEL_SEQ_CST
+       && model != MEMMODEL_ACQ_REL
+       && model != MEMMODEL_RELEASE
+       && model != MEMMODEL_ACQUIRE)
+     {
+       error ("invalid memory model for %<__sync_mem_exchange%>");
+       return NULL_RTX;
+     }
+ 
+   /* Expand the operands.  */
+   mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
+   val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL);
+   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+      of CONST_INTs, where we know the old_mode only from the call argument.  */
+   old_mode = GET_MODE (val);
+   if (old_mode == VOIDmode)
+     old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1)));
+   val = convert_modes (mode, old_mode, val, 1);
+ 
+   return expand_sync_mem_exchange (model, mem, val, target);
+ }
+ 
  /* Expand the __sync_synchronize intrinsic.  */
  
! void
  expand_builtin_synchronize (void)
  {
    gimple x;
*************** expand_builtin (tree exp, rtx target, rt
*** 6000,6005 ****
--- 6060,6076 ----
  	return target;
        break;
  
+     case BUILT_IN_MEM_EXCHANGE_1:
+     case BUILT_IN_MEM_EXCHANGE_2:
+     case BUILT_IN_MEM_EXCHANGE_4:
+     case BUILT_IN_MEM_EXCHANGE_8:
+     case BUILT_IN_MEM_EXCHANGE_16:
+       mode = get_builtin_sync_mode (fcode - BUILT_IN_MEM_EXCHANGE_1);
+       target = expand_builtin_mem_exchange (mode, exp, target);
+       if (target)
+ 	return target;
+       break;
+ 
      case BUILT_IN_LOCK_TEST_AND_SET_1:
      case BUILT_IN_LOCK_TEST_AND_SET_2:
      case BUILT_IN_LOCK_TEST_AND_SET_4:
Index: sync-builtins.def
===================================================================
*** sync-builtins.def	(revision 174933)
--- sync-builtins.def	(working copy)
*************** DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_
*** 250,252 ****
--- 250,273 ----
  
  DEF_SYNC_BUILTIN (BUILT_IN_SYNCHRONIZE, "__sync_synchronize",
  		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+ 
+ /* __sync* builtins for the C++ memory model.  */
+ 
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_N,
+ 		  "__sync_mem_exchange",
+ 		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_1,
+ 		  "__sync_mem_exchange_1",
+ 		  BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_2,
+ 		  "__sync_mem_exchange_2",
+ 		  BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_4,
+ 		  "__sync_mem_exchange_4",
+ 		  BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_8,
+ 		  "__sync_mem_exchange_8",
+ 		  BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_16,
+ 		  "__sync_mem_exchange_16",
+ 		  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
Index: testsuite/gcc.dg/x86-sync-1.c
===================================================================
*** testsuite/gcc.dg/x86-sync-1.c	(revision 0)
--- testsuite/gcc.dg/x86-sync-1.c	(revision 0)
***************
*** 0 ****
--- 1,9 ----
+ /* { dg-do compile } */
+ /* { dg-options "-dap" } */
+ 
+ int i;
+ 
+ void foo()
+ {
+   __sync_mem_exchange (&i, 555, __SYNC_MEM_SEQ_CST);
+ }
Index: builtin-types.def
===================================================================
*** builtin-types.def	(revision 174933)
--- builtin-types.def	(working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PT
*** 383,388 ****
--- 383,393 ----
  		     BT_PTR, BT_UINT)
  DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR,
  		     BT_CONST_PTR, BT_INT, BT_SIZE)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
  
  DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
  		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
Index: expr.h
===================================================================
*** expr.h	(revision 174933)
--- expr.h	(working copy)
*************** rtx expand_bool_compare_and_swap (rtx, r
*** 217,222 ****
--- 217,223 ----
  rtx expand_sync_operation (rtx, rtx, enum rtx_code);
  rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
  rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
+ rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx);
  \f
  /* Functions from expmed.c:  */
  
*************** extern void expand_builtin_setjmp_receiv
*** 248,253 ****
--- 249,255 ----
  extern rtx expand_builtin_saveregs (void);
  extern void expand_builtin_trap (void);
  extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
+ extern void expand_builtin_synchronize (void);
  \f
  /* Functions from expr.c:  */
  
Index: fortran/types.def
===================================================================
*** fortran/types.def	(revision 174933)
--- fortran/types.def	(working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_
*** 120,125 ****
--- 120,132 ----
  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PTR_UINT, BT_VOID, BT_PTR_FN_VOID_PTR,
                       BT_PTR, BT_UINT)
  
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
+ 
+ 
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
                       BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_PTR_WORD_WORD_PTR,
Index: Makefile.in
===================================================================
*** Makefile.in	(revision 174933)
--- Makefile.in	(working copy)
*************** PREPROCESSOR_DEFINES = \
*** 4048,4054 ****
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
! 	cppbuiltin.h Makefile
  	$(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
  	  $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
  	  -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
--- 4048,4054 ----
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
! 	$(TREE_H) cppbuiltin.h Makefile
  	$(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
  	  $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
  	  -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
Index: config/i386/i386.md
===================================================================
*** config/i386/i386.md	(revision 174933)
--- config/i386/i386.md	(working copy)
***************
*** 252,257 ****
--- 252,258 ----
    UNSPECV_MWAIT
    UNSPECV_CMPXCHG
    UNSPECV_XCHG
+   UNSPECV_MEM_XCHG
    UNSPECV_LOCK
    UNSPECV_PROLOGUE_USE
    UNSPECV_CLD
Index: config/i386/sync.md
===================================================================
*** config/i386/sync.md	(revision 174933)
--- config/i386/sync.md	(working copy)
***************
*** 232,237 ****
--- 232,257 ----
    return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
  })
  
+ (define_expand "sync_mem_exchange<mode>"
+   [(set (match_operand:SWI 0 "register_operand" "=<r>")
+ 	(unspec_volatile:SWI
+          [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_MEM_XCHG))
+    (set (match_dup 1)
+ 	(match_operand:SWI 2 "register_operand" "0"))
+    (match_operand:SI 3 "const_int_operand" "n")]
+   ""
+ {
+   /* lock_test_and_set is only an acquire barrier. If a stronger barrier is
+      required, issue a release barrier before the insn.  */
+   if (INTVAL (operands[3]) == MEMMODEL_ACQ_REL ||
+       INTVAL (operands[3]) == MEMMODEL_SEQ_CST)
+     emit_insn (gen_memory_barrier ());
+   emit_insn (gen_sync_lock_test_and_set<mode> (operands[0], 
+ 					       operands[1],
+ 					       operands[2]));
+   DONE;
+ })
+ 
  ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
  (define_insn "sync_lock_test_and_set<mode>"
    [(set (match_operand:SWI 0 "register_operand" "=<r>")

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-17 22:21             ` Andrew MacLeod
@ 2011-06-18 19:49               ` Richard Henderson
  2011-06-20 16:39                 ` Andrew MacLeod
  2011-06-18 23:49               ` Richard Henderson
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2011-06-18 19:49 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/17/2011 02:12 PM, Andrew MacLeod wrote:
> --- machmode.h	(working copy)
> *************** extern enum machine_mode ptr_mode;
> *** 275,278 ****
> --- 275,291 ----
>   /* Target-dependent machine mode initialization - in insn-modes.c.  */
>   extern void init_adjust_machine_modes (void);
>   
> + /* Memory model types for the __sync_mem* builtins. 
> +    This must match the order in libstdc++-v3/include/bits/atomic_base.h.  */
> + enum memmodel
> + {
> +   MEMMODEL_RELAXED = 0,
> +   MEMMODEL_CONSUME = 1,
> +   MEMMODEL_ACQUIRE = 2,
> +   MEMMODEL_RELEASE = 3,
> +   MEMMODEL_ACQ_REL = 4,
> +   MEMMODEL_SEQ_CST = 5,
> +   MEMMODEL_LAST = 6
> + };

This isn't a very machine mode sort of define.
I think coretypes.h is a better choice.
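
As an aside on the "must match the order" comment in the quoted hunk, here is a minimal sketch of how that correspondence could be checked at compile time. It assumes a later GCC (4.7 or newer) or Clang, which predefine the __ATOMIC_* macros; those macros are not part of this patch, and memmodel_is_valid is a hypothetical helper that only mirrors the range check in get_memmodel.

```c
#include <assert.h>

/* Copy of the enum from the quoted patch; the values are taken from it.  */
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_LAST = 6
};

/* Assumption: the compiler predefines __ATOMIC_* (GCC >= 4.7, Clang).
   These fail the build if the two numberings ever drift apart.  */
_Static_assert (MEMMODEL_RELAXED == __ATOMIC_RELAXED, "relaxed mismatch");
_Static_assert (MEMMODEL_CONSUME == __ATOMIC_CONSUME, "consume mismatch");
_Static_assert (MEMMODEL_ACQUIRE == __ATOMIC_ACQUIRE, "acquire mismatch");
_Static_assert (MEMMODEL_RELEASE == __ATOMIC_RELEASE, "release mismatch");
_Static_assert (MEMMODEL_ACQ_REL == __ATOMIC_ACQ_REL, "acq_rel mismatch");
_Static_assert (MEMMODEL_SEQ_CST == __ATOMIC_SEQ_CST, "seq_cst mismatch");

/* Hypothetical helper: the same range test get_memmodel applies to the
   integer constant passed as the third builtin argument.  */
static inline int
memmodel_is_valid (long v)
{
  return v >= 0 && v < MEMMODEL_LAST;
}
```

Keeping the checks next to the enum means the definition can live in coretypes.h or anywhere else common without the numbering silently diverging.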

> + static rtx
> + expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target)

Some names include "sync" and some don't?

> + DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_N,
> + 		  "__sync_mem_exchange",
> + 		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)

Similarly...

> + (define_expand "sync_mem_exchange<mode>"
> +   [(set (match_operand:SWI 0 "register_operand" "=<r>")
> + 	(unspec_volatile:SWI
> +          [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_MEM_XCHG))
> +    (set (match_dup 1)
> + 	(match_operand:SWI 2 "register_operand" "0"))
> +    (match_operand:SI 3 "const_int_operand" "n")]
> +   ""
> + {
> +   /* lock_test_and_set is only an acquire barrier. If a stronger barrier is
> +      required, issue a release barrier before the insn.  */
> +   if (INTVAL (operands[3]) == MEMMODEL_ACQ_REL ||
> +       INTVAL (operands[3]) == MEMMODEL_SEQ_CST)
> +     emit_insn (gen_memory_barrier ());
> +   emit_insn (gen_sync_lock_test_and_set<mode> (operands[0], 
> + 					       operands[1],
> + 					       operands[2]));
> +   DONE;

The xchg instruction is a full barrier; no need for anything extra here.
Indeed, you needn't define UNSPECV_MEM_XCHG either.  This could be as
simple as

(define_expand "sync_mem_exchange<mode>"
  [(match_operand:SWI 0 "register_operand" "")		;; output
   (match_operand:SWI 1 "memory_operand" "")		;; memory
   (match_operand:SWI 2 "register_operand" "")		;; input
   (match_operand:SI  3 "const_int_operand" "")]	;; memory model
  ""
{
  /* On i386 the xchg instruction is a full barrier.  Thus we
     can completely ignore the memory model operand.  */
  emit_insn (gen_sync_lock_test_and_set<mode>
		(operands[0], operands[1], operands[2]));
  DONE;
})
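
For illustration only, a small user-level sketch of the same point. It assumes the name the builtin took in released GCC (4.7 and later), __atomic_exchange_n, rather than the __sync_mem_exchange spelling used in this thread; the two wrapper functions are hypothetical. Since xchg with a memory operand is a full barrier on i386, both wrappers would be expected to expand to the same instruction there.

```c
#include <assert.h>

/* Hypothetical wrapper: exchange with sequentially consistent ordering.
   __atomic_exchange_n is the released-GCC builtin, assumed here in place
   of the __sync_mem_exchange name proposed in this thread.  */
static inline int
exchange_seq_cst (int *p, int v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

/* Hypothetical wrapper: exchange with relaxed ordering.  On i386 the
   memory model operand can be ignored, so this should generate the same
   xchg as the wrapper above.  */
static inline int
exchange_relaxed (int *p, int v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_RELAXED);
}
```

Both wrappers return the value previously stored at *p, which is the behaviour the define_expand above gets from sync_lock_test_and_set.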


r~

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-17 22:21             ` Andrew MacLeod
  2011-06-18 19:49               ` Richard Henderson
@ 2011-06-18 23:49               ` Richard Henderson
  1 sibling, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2011-06-18 23:49 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/17/2011 02:12 PM, Andrew MacLeod wrote:
> Do we apply this to mainline?  or cxx-mem-model and then bring it all
> over later when they are all done and "perfected" ?

Let's put it over in cxx-mem-model, at least until we settle on
the interface.


r~

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-18 19:49               ` Richard Henderson
@ 2011-06-20 16:39                 ` Andrew MacLeod
  2011-06-20 22:50                   ` Richard Henderson
  2011-07-08 17:00                   ` __sync_swap* with acq/rel/full memory barrier semantics Aldy Hernandez
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-20 16:39 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

> On 06/17/2011 02:12 PM, Andrew MacLeod wrote:
>> --- machmode.h	(working copy)
>> *************** extern enum machine_mode ptr_mode;
>> *** 275,278 ****
>> --- 275,291 ----
>>    /* Target-dependent machine mode initialization - in insn-modes.c.  */
>>    extern void init_adjust_machine_modes (void);
>>
>> + /* Memory model types for the __sync_mem* builtins.
>> +    This must match the order in libstdc++-v3/include/bits/atomic_base.h.  */
>> + enum memmodel
>> + {
>> +   MEMMODEL_RELAXED = 0,
>> +   MEMMODEL_CONSUME = 1,
>> +   MEMMODEL_ACQUIRE = 2,
>> +   MEMMODEL_RELEASE = 3,
>> +   MEMMODEL_ACQ_REL = 4,
>> +   MEMMODEL_SEQ_CST = 5,
>> +   MEMMODEL_LAST = 6
>> + };
> This isn't a very machine mode sort of define.
> I think coretypes.h is a better choice.

Cool, that seems to work fine.  As long as it's somewhere common.

>> + static rtx
>> + expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target)
> Some names include "sync" and some don't?

Well, I was going to blame Aldy :-)  but then I went to look at this, 
and that's the same way *all* the other __sync builtins seem to be.

ie:

builtins.c:expand_builtin_lock_test_and_set (enum machine_mode mode, 
tree exp,
builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_1:
builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_2:
builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_4:

whereas everything else is 'sync_lock_test_and_set'..

So I guess it falls to prior art... I assume Aldy just cut-and-pasted 
for his new routine and just changed the names in the same format.


>> + 
> The xchg instruction is a full barrier; no need for anything extra here.
> Indeed, you needn't define UNSPECV_MEM_XCHG either.  This could be as
> simple as
>
>
Ah, even better.  For some reason I thought I saw somewhere that it 
wasn't a full barrier.  Might have just been the documentation for 
lock_test_and_set.
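
To make the documented difference concrete, here is a minimal spinlock sketch built on the existing __sync_lock_test_and_set and __sync_lock_release builtins, which the GCC manual describes as an acquire barrier and a release barrier respectively. The function names are hypothetical, and the stored value is kept at 1 because the manual allows targets to support only that.

```c
#include <assert.h>

/* Hypothetical helper: try to take the lock once.  Returns nonzero on
   success.  __sync_lock_test_and_set is documented as an acquire
   barrier only, which is all a lock acquisition needs.  */
static inline int
spin_try_lock (volatile int *lock)
{
  return __sync_lock_test_and_set (lock, 1) == 0;
}

/* Hypothetical helper: spin until the lock is taken.  */
static inline void
spin_lock (volatile int *lock)
{
  while (!spin_try_lock (lock))
    ;
}

/* Hypothetical helper: release the lock.  __sync_lock_release writes 0
   and is documented as a release barrier.  */
static inline void
spin_unlock (volatile int *lock)
{
  __sync_lock_release (lock);
}
```

The acquire-only wording describes what the builtin guarantees on every target; an i386 xchg happens to provide more than that.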

Andrew

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-20 16:39                 ` Andrew MacLeod
@ 2011-06-20 22:50                   ` Richard Henderson
  2011-06-20 23:02                     ` Andrew MacLeod
  2011-07-08 17:00                   ` __sync_swap* with acq/rel/full memory barrier semantics Aldy Hernandez
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2011-06-20 22:50 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/20/2011 09:22 AM, Andrew MacLeod wrote:
> builtins.c:expand_builtin_lock_test_and_set (enum machine_mode mode, tree exp,
> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_1:
> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_2:
> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_4:
> 
> whereas everything else is 'sync_lock_test_and_set'..
> 
> So i guess it falls to prior art... I assume Aldy just cut-and-pasted
> for his new routine and just changed the names in the same format.

Heh.  So this is really my fault.  Ah well.

> Ah, even better. For some reason I thought I saw somewhere that it
> wasn't a full barrier. Might have just been the documentation for
> lock_test_and_set.

Very likely.


r~

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-20 22:50                   ` Richard Henderson
@ 2011-06-20 23:02                     ` Andrew MacLeod
  2011-06-20 23:29                       ` Richard Henderson
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-20 23:02 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/20/2011 06:33 PM, Richard Henderson wrote:
> On 06/20/2011 09:22 AM, Andrew MacLeod wrote:
>> builtins.c:expand_builtin_lock_test_and_set (enum machine_mode mode, tree exp,
>> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_1:
>> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_2:
>> builtins.c:    case BUILT_IN_LOCK_TEST_AND_SET_4:
>>
>> whereas everything else is 'sync_lock_test_and_set'..
>>
>> So i guess it falls to prior art... I assume Aldy just cut-and-pasted
>> for his new routine and just changed the names in the same format.
> Heh.  So this is really my fault.  Ah well.

Yeah, from 2005 :-)  It seems to be localized to builtins.c, 
sync-builtins.def, and c-family/c-common.c.

If you want to standardize it with SYNC_  for all cases, I will create 
all the new ones that way.

I'm trying to avoid unnecessary noise on the branch.  I'm bringing the 
cxx-mem-model branch up to mainline revision now, so I can go and submit 
a patch to fix the existing ones right now on mainline if you want:
turn them all into BUILT_IN_SYNC_LOCK_TEST_AND_SET or whatever they 
need to be to match.  Then it will be right and won't affect me at all :-)

Andrew

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-20 23:02                     ` Andrew MacLeod
@ 2011-06-20 23:29                       ` Richard Henderson
  2011-06-21 18:56                         ` __sync_swap* [ rename sync builtins ] Andrew MacLeod
  2011-06-21 23:03                         ` [cxx-mem-model] sync_mem_exchange implementation with memory model parameters Andrew MacLeod
  0 siblings, 2 replies; 26+ messages in thread
From: Richard Henderson @ 2011-06-20 23:29 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/20/2011 03:53 PM, Andrew MacLeod wrote:
> On 06/20/2011 06:33 PM, Richard Henderson wrote:
> If you want to standardize it with SYNC_  for all cases, I will create all the new ones that way.

I do think the name of all the bits related to handling a builtin
function should match the builtin function itself.  It's less
confusing that way.

> I'm trying to avoid unnecessary noise on the branch. I'm bringing the
> cxx-mem-model branch up to mainline revision now, so I can go and
> submit a patch to fix the existing ones right now on mainline if you
> want... turn them all into BUILT_IN_SYNC_LOCK_TEST_AND_SET or
> whatever they need to be to match... then it will be right and wont
> affect me at all :-)

If you'd like to rename the existing stuff on mainline while you're
waiting for something else to run, please do.


r~

* Re: __sync_swap*  [ rename sync builtins ]
  2011-06-20 23:29                       ` Richard Henderson
@ 2011-06-21 18:56                         ` Andrew MacLeod
  2011-06-21 19:03                           ` Richard Henderson
  2011-06-21 23:03                         ` [cxx-mem-model] sync_mem_exchange implementation with memory model parameters Andrew MacLeod
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-21 18:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]

On 06/20/2011 06:58 PM, Richard Henderson wrote:
>
> If you'd like to rename the existing stuff on mainline while you're
> waiting for something else to run, please do.
>
OK, I generated a script to fix all the places where the built-in 
expansion name is missing the sync_ part of the built-in's real name.  
It looked in all .c .h and .def files in gcc and subdirectories.

I had to manually modify 3 different areas where the longer names 
caused line wrapping.

I did a full bootstrap/testsuite regression check on a fresh checkout 
today on x86_64-unknown-linux-gnu.  The change should be purely cosmetic.

I presume this is what you were looking for... OK to check into mainline?

The rename script (I think I got everything):

     s/BUILT_IN_FETCH_AND_ADD/BUILT_IN_SYNC_FETCH_AND_ADD/g
     s/BUILT_IN_FETCH_AND_SUB/BUILT_IN_SYNC_FETCH_AND_SUB/g
     s/BUILT_IN_FETCH_AND_OR/BUILT_IN_SYNC_FETCH_AND_OR/g
     s/BUILT_IN_FETCH_AND_AND/BUILT_IN_SYNC_FETCH_AND_AND/g
     s/BUILT_IN_FETCH_AND_XOR/BUILT_IN_SYNC_FETCH_AND_XOR/g
     s/BUILT_IN_FETCH_AND_NAND/BUILT_IN_SYNC_FETCH_AND_NAND/g
     s/BUILT_IN_ADD_AND_FETCH/BUILT_IN_SYNC_ADD_AND_FETCH/g
     s/BUILT_IN_SUB_AND_FETCH/BUILT_IN_SYNC_SUB_AND_FETCH/g
     s/BUILT_IN_OR_AND_FETCH/BUILT_IN_SYNC_OR_AND_FETCH/g
     s/BUILT_IN_AND_AND_FETCH/BUILT_IN_SYNC_AND_AND_FETCH/g
     s/BUILT_IN_XOR_AND_FETCH/BUILT_IN_SYNC_XOR_AND_FETCH/g
     s/BUILT_IN_NAND_AND_FETCH/BUILT_IN_SYNC_NAND_AND_FETCH/g
     s/BUILT_IN_BOOL_COMPARE_AND_SWAP/BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP/g
     s/BUILT_IN_VAL_COMPARE_AND_SWAP/BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP/g
     s/BUILT_IN_LOCK_TEST_AND_SET/BUILT_IN_SYNC_LOCK_TEST_AND_SET/g
     s/BUILT_IN_LOCK_RELEASE/BUILT_IN_SYNC_LOCK_RELEASE/g
     s/BUILT_IN_SYNCHRONIZE/BUILT_IN_SYNC_SYNCHRONIZE/g
     s/builtin_fetch_and_add/builtin_sync_fetch_and_add/g
     s/builtin_fetch_and_sub/builtin_sync_fetch_and_sub/g
     s/builtin_fetch_and_or/builtin_sync_fetch_and_or/g
     s/builtin_fetch_and_and/builtin_sync_fetch_and_and/g
     s/builtin_fetch_and_xor/builtin_sync_fetch_and_xor/g
     s/builtin_fetch_and_nand/builtin_sync_fetch_and_nand/g
     s/builtin_add_and_fetch/builtin_sync_add_and_fetch/g
     s/builtin_sub_and_fetch/builtin_sync_sub_and_fetch/g
     s/builtin_or_and_fetch/builtin_sync_or_and_fetch/g
     s/builtin_and_and_fetch/builtin_sync_and_and_fetch/g
     s/builtin_xor_and_fetch/builtin_sync_xor_and_fetch/g
     s/builtin_nand_and_fetch/builtin_sync_nand_and_fetch/g
     s/builtin_bool_compare_and_swap/builtin_sync_bool_compare_and_swap/g
     s/builtin_val_compare_and_swap/builtin_sync_val_compare_and_swap/g
     s/builtin_lock_test_and_set/builtin_sync_lock_test_and_set/g
     s/builtin_lock_release/builtin_sync_lock_release/g
     s/builtin_synchronize/builtin_sync_synchronize/g



[-- Attachment #2: rename.patch --]
[-- Type: text/plain, Size: 59441 bytes --]

	* c-family/c-common.c: Add sync_ or SYNC_ to builtin names.
	* c-family/c-omp.c: Add sync_ or SYNC_ to builtin names.
	* java/builtins.c: Add sync_ or SYNC_ to builtin names.
	* java/expr.c: Add sync_ or SYNC_ to builtin names.
	* builtins.c: Add sync_ or SYNC_ to builtin names.
	* sync-builtins.def: Add sync_ or SYNC_ to builtin names.
	* omp-low.c: Add sync_ or SYNC_ to builtin names.
	* cp/semantics.c: Add sync_ or SYNC_ to builtin names.
	* fortran/trans-openmp.c: Add sync_ or SYNC_ to builtin names.
	* fortran/trans-stmt.c: Add sync_ or SYNC_ to builtin names.
	* fortran/trans-decl.c: Add sync_ or SYNC_ to builtin names.

Index: c-family/c-common.c
===================================================================
*** c-family/c-common.c	(revision 175226)
--- c-family/c-common.c	(working copy)
*************** resolve_overloaded_builtin (location_t l
*** 9044,9065 ****
    /* Handle BUILT_IN_NORMAL here.  */
    switch (orig_code)
      {
!     case BUILT_IN_FETCH_AND_ADD_N:
!     case BUILT_IN_FETCH_AND_SUB_N:
!     case BUILT_IN_FETCH_AND_OR_N:
!     case BUILT_IN_FETCH_AND_AND_N:
!     case BUILT_IN_FETCH_AND_XOR_N:
!     case BUILT_IN_FETCH_AND_NAND_N:
!     case BUILT_IN_ADD_AND_FETCH_N:
!     case BUILT_IN_SUB_AND_FETCH_N:
!     case BUILT_IN_OR_AND_FETCH_N:
!     case BUILT_IN_AND_AND_FETCH_N:
!     case BUILT_IN_XOR_AND_FETCH_N:
!     case BUILT_IN_NAND_AND_FETCH_N:
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_N:
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_N:
!     case BUILT_IN_LOCK_TEST_AND_SET_N:
!     case BUILT_IN_LOCK_RELEASE_N:
        {
  	int n = sync_resolve_size (function, params);
  	tree new_function, first_param, result;
--- 9044,9065 ----
    /* Handle BUILT_IN_NORMAL here.  */
    switch (orig_code)
      {
!     case BUILT_IN_SYNC_FETCH_AND_ADD_N:
!     case BUILT_IN_SYNC_FETCH_AND_SUB_N:
!     case BUILT_IN_SYNC_FETCH_AND_OR_N:
!     case BUILT_IN_SYNC_FETCH_AND_AND_N:
!     case BUILT_IN_SYNC_FETCH_AND_XOR_N:
!     case BUILT_IN_SYNC_FETCH_AND_NAND_N:
!     case BUILT_IN_SYNC_ADD_AND_FETCH_N:
!     case BUILT_IN_SYNC_SUB_AND_FETCH_N:
!     case BUILT_IN_SYNC_OR_AND_FETCH_N:
!     case BUILT_IN_SYNC_AND_AND_FETCH_N:
!     case BUILT_IN_SYNC_XOR_AND_FETCH_N:
!     case BUILT_IN_SYNC_NAND_AND_FETCH_N:
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_N:
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N:
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_N:
!     case BUILT_IN_SYNC_LOCK_RELEASE_N:
        {
  	int n = sync_resolve_size (function, params);
  	tree new_function, first_param, result;
*************** resolve_overloaded_builtin (location_t l
*** 9073,9080 ****
  
  	first_param = VEC_index (tree, params, 0);
  	result = build_function_call_vec (loc, new_function, params, NULL);
! 	if (orig_code != BUILT_IN_BOOL_COMPARE_AND_SWAP_N
! 	    && orig_code != BUILT_IN_LOCK_RELEASE_N)
  	  result = sync_resolve_return (first_param, result);
  
  	return result;
--- 9073,9080 ----
  
  	first_param = VEC_index (tree, params, 0);
  	result = build_function_call_vec (loc, new_function, params, NULL);
! 	if (orig_code != BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_N
! 	    && orig_code != BUILT_IN_SYNC_LOCK_RELEASE_N)
  	  result = sync_resolve_return (first_param, result);
  
  	return result;
Index: c-family/c-omp.c
===================================================================
*** c-family/c-omp.c	(revision 175226)
--- c-family/c-omp.c	(working copy)
*************** c_finish_omp_flush (location_t loc)
*** 169,175 ****
  {
    tree x;
  
!   x = built_in_decls[BUILT_IN_SYNCHRONIZE];
    x = build_call_expr_loc (loc, x, 0);
    add_stmt (x);
  }
--- 169,175 ----
  {
    tree x;
  
!   x = built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE];
    x = build_call_expr_loc (loc, x, 0);
    add_stmt (x);
  }
Index: java/builtins.c
===================================================================
*** java/builtins.c	(revision 175226)
--- java/builtins.c	(working copy)
*************** compareAndSwapInt_builtin (tree method_r
*** 331,338 ****
        (void) value_type; /* Avoid set but not used warning.  */
  
        addr = build_addr_sum (int_type_node, obj_arg, offset_arg);
!       stmt = build_call_expr (built_in_decls[BUILT_IN_BOOL_COMPARE_AND_SWAP_4],
! 			      3, addr, expected_arg, value_arg);
  
        return build_check_this (stmt, this_arg);
      }
--- 331,339 ----
        (void) value_type; /* Avoid set but not used warning.  */
  
        addr = build_addr_sum (int_type_node, obj_arg, offset_arg);
!       stmt = build_call_expr 
! 			(built_in_decls[BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4],
! 			 3, addr, expected_arg, value_arg);
  
        return build_check_this (stmt, this_arg);
      }
*************** compareAndSwapLong_builtin (tree method_
*** 357,364 ****
        (void) value_type; /* Avoid set but not used warning.  */
  
        addr = build_addr_sum (long_type_node, obj_arg, offset_arg);
!       stmt = build_call_expr (built_in_decls[BUILT_IN_BOOL_COMPARE_AND_SWAP_8],
! 			      3, addr, expected_arg, value_arg);
  
        return build_check_this (stmt, this_arg);
      }
--- 358,366 ----
        (void) value_type; /* Avoid set but not used warning.  */
  
        addr = build_addr_sum (long_type_node, obj_arg, offset_arg);
!       stmt = build_call_expr 
! 			(built_in_decls[BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8],
! 			 3, addr, expected_arg, value_arg);
  
        return build_check_this (stmt, this_arg);
      }
*************** compareAndSwapObject_builtin (tree metho
*** 378,385 ****
  
      UNMARSHAL5 (orig_call);
      builtin = (POINTER_SIZE == 32 
! 	       ? BUILT_IN_BOOL_COMPARE_AND_SWAP_4 
! 	       : BUILT_IN_BOOL_COMPARE_AND_SWAP_8);
  
      addr = build_addr_sum (value_type, obj_arg, offset_arg);
      stmt = build_call_expr (built_in_decls[builtin],
--- 380,387 ----
  
      UNMARSHAL5 (orig_call);
      builtin = (POINTER_SIZE == 32 
! 	       ? BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4 
! 	       : BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8);
  
      addr = build_addr_sum (value_type, obj_arg, offset_arg);
      stmt = build_call_expr (built_in_decls[builtin],
*************** putVolatile_builtin (tree method_return_
*** 402,408 ****
      = fold_convert (build_pointer_type (build_type_variant (value_type, 0, 1)),
  		    addr);
    
!   stmt = build_call_expr (built_in_decls[BUILT_IN_SYNCHRONIZE], 0);
    modify_stmt = fold_build2 (MODIFY_EXPR, value_type,
  			     build_java_indirect_ref (value_type, addr,
  						      flag_check_references),
--- 404,410 ----
      = fold_convert (build_pointer_type (build_type_variant (value_type, 0, 1)),
  		    addr);
    
!   stmt = build_call_expr (built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE], 0);
    modify_stmt = fold_build2 (MODIFY_EXPR, value_type,
  			     build_java_indirect_ref (value_type, addr,
  						      flag_check_references),
*************** getVolatile_builtin (tree method_return_
*** 426,432 ****
      = fold_convert (build_pointer_type (build_type_variant 
  					(method_return_type, 0, 1)), addr);
    
!   stmt = build_call_expr (built_in_decls[BUILT_IN_SYNCHRONIZE], 0);
    
    tmp = build_decl (BUILTINS_LOCATION, VAR_DECL, NULL, method_return_type);
    DECL_IGNORED_P (tmp) = 1;
--- 428,434 ----
      = fold_convert (build_pointer_type (build_type_variant 
  					(method_return_type, 0, 1)), addr);
    
!   stmt = build_call_expr (built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE], 0);
    
    tmp = build_decl (BUILTINS_LOCATION, VAR_DECL, NULL, method_return_type);
    DECL_IGNORED_P (tmp) = 1;
*************** initialize_builtins (void)
*** 573,593 ****
  		  boolean_ftype_boolean_boolean,
  		  "__builtin_expect",
  		  BUILTIN_CONST | BUILTIN_NOTHROW);
!   define_builtin (BUILT_IN_BOOL_COMPARE_AND_SWAP_4, 
  		  "__sync_bool_compare_and_swap_4",
  		  build_function_type_list (boolean_type_node,
  					    int_type_node, 
  					    build_pointer_type (int_type_node),
  					    int_type_node, NULL_TREE), 
  		  "__sync_bool_compare_and_swap_4", 0);
!   define_builtin (BUILT_IN_BOOL_COMPARE_AND_SWAP_8, 
  		  "__sync_bool_compare_and_swap_8",
  		  build_function_type_list (boolean_type_node,
  					    long_type_node, 
  					    build_pointer_type (long_type_node),
  					    int_type_node, NULL_TREE), 
  		  "__sync_bool_compare_and_swap_8", 0);
!   define_builtin (BUILT_IN_SYNCHRONIZE, "__sync_synchronize",
  		  build_function_type_list (void_type_node, NULL_TREE),
  		  "__sync_synchronize", BUILTIN_NOTHROW);
    
--- 575,595 ----
  		  boolean_ftype_boolean_boolean,
  		  "__builtin_expect",
  		  BUILTIN_CONST | BUILTIN_NOTHROW);
!   define_builtin (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4, 
  		  "__sync_bool_compare_and_swap_4",
  		  build_function_type_list (boolean_type_node,
  					    int_type_node, 
  					    build_pointer_type (int_type_node),
  					    int_type_node, NULL_TREE), 
  		  "__sync_bool_compare_and_swap_4", 0);
!   define_builtin (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8, 
  		  "__sync_bool_compare_and_swap_8",
  		  build_function_type_list (boolean_type_node,
  					    long_type_node, 
  					    build_pointer_type (long_type_node),
  					    int_type_node, NULL_TREE), 
  		  "__sync_bool_compare_and_swap_8", 0);
!   define_builtin (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
  		  build_function_type_list (void_type_node, NULL_TREE),
  		  "__sync_synchronize", BUILTIN_NOTHROW);
    
Index: java/expr.c
===================================================================
*** java/expr.c	(revision 175226)
--- java/expr.c	(working copy)
*************** expand_java_field_op (int is_static, int
*** 2940,2946 ****
  
        if (TREE_THIS_VOLATILE (field_decl))
  	java_add_stmt
! 	  (build_call_expr (built_in_decls[BUILT_IN_SYNCHRONIZE], 0));
        	  
        java_add_stmt (modify_expr);
      }
--- 2940,2946 ----
  
        if (TREE_THIS_VOLATILE (field_decl))
  	java_add_stmt
! 	  (build_call_expr (built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE], 0));
        	  
        java_add_stmt (modify_expr);
      }
*************** expand_java_field_op (int is_static, int
*** 2959,2965 ****
  
        if (TREE_THIS_VOLATILE (field_decl))
  	java_add_stmt 
! 	  (build_call_expr (built_in_decls[BUILT_IN_SYNCHRONIZE], 0));
  
        push_value (temp);
      }      
--- 2959,2965 ----
  
        if (TREE_THIS_VOLATILE (field_decl))
  	java_add_stmt 
! 	  (build_call_expr (built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE], 0));
  
        push_value (temp);
      }      
Index: builtins.c
===================================================================
*** builtins.c	(revision 175226)
--- builtins.c	(working copy)
*************** expand_builtin_sync_operation (enum mach
*** 5083,5112 ****
  
        switch (fcode)
  	{
! 	case BUILT_IN_FETCH_AND_NAND_1:
! 	case BUILT_IN_FETCH_AND_NAND_2:
! 	case BUILT_IN_FETCH_AND_NAND_4:
! 	case BUILT_IN_FETCH_AND_NAND_8:
! 	case BUILT_IN_FETCH_AND_NAND_16:
  
  	  if (warned_f_a_n)
  	    break;
  
! 	  fndecl = implicit_built_in_decls[BUILT_IN_FETCH_AND_NAND_N];
  	  inform (loc, "%qD changed semantics in GCC 4.4", fndecl);
  	  warned_f_a_n = true;
  	  break;
  
! 	case BUILT_IN_NAND_AND_FETCH_1:
! 	case BUILT_IN_NAND_AND_FETCH_2:
! 	case BUILT_IN_NAND_AND_FETCH_4:
! 	case BUILT_IN_NAND_AND_FETCH_8:
! 	case BUILT_IN_NAND_AND_FETCH_16:
  
  	  if (warned_n_a_f)
  	    break;
  
! 	  fndecl = implicit_built_in_decls[BUILT_IN_NAND_AND_FETCH_N];
  	  inform (loc, "%qD changed semantics in GCC 4.4", fndecl);
  	  warned_n_a_f = true;
  	  break;
--- 5083,5112 ----
  
        switch (fcode)
  	{
! 	case BUILT_IN_SYNC_FETCH_AND_NAND_1:
! 	case BUILT_IN_SYNC_FETCH_AND_NAND_2:
! 	case BUILT_IN_SYNC_FETCH_AND_NAND_4:
! 	case BUILT_IN_SYNC_FETCH_AND_NAND_8:
! 	case BUILT_IN_SYNC_FETCH_AND_NAND_16:
  
  	  if (warned_f_a_n)
  	    break;
  
! 	  fndecl = implicit_built_in_decls[BUILT_IN_SYNC_FETCH_AND_NAND_N];
  	  inform (loc, "%qD changed semantics in GCC 4.4", fndecl);
  	  warned_f_a_n = true;
  	  break;
  
! 	case BUILT_IN_SYNC_NAND_AND_FETCH_1:
! 	case BUILT_IN_SYNC_NAND_AND_FETCH_2:
! 	case BUILT_IN_SYNC_NAND_AND_FETCH_4:
! 	case BUILT_IN_SYNC_NAND_AND_FETCH_8:
! 	case BUILT_IN_SYNC_NAND_AND_FETCH_16:
  
  	  if (warned_n_a_f)
  	    break;
  
! 	  fndecl = implicit_built_in_decls[BUILT_IN_SYNC_NAND_AND_FETCH_N];
  	  inform (loc, "%qD changed semantics in GCC 4.4", fndecl);
  	  warned_n_a_f = true;
  	  break;
*************** expand_builtin_compare_and_swap (enum ma
*** 5180,5186 ****
     the results.  */
  
  static rtx
! expand_builtin_lock_test_and_set (enum machine_mode mode, tree exp,
  				  rtx target)
  {
    rtx val, mem;
--- 5180,5186 ----
     the results.  */
  
  static rtx
! expand_builtin_sync_lock_test_and_set (enum machine_mode mode, tree exp,
  				  rtx target)
  {
    rtx val, mem;
*************** expand_builtin_lock_test_and_set (enum m
*** 5202,5208 ****
  /* Expand the __sync_synchronize intrinsic.  */
  
  static void
! expand_builtin_synchronize (void)
  {
    gimple x;
    VEC (tree, gc) *v_clobbers;
--- 5202,5208 ----
  /* Expand the __sync_synchronize intrinsic.  */
  
  static void
! expand_builtin_sync_synchronize (void)
  {
    gimple x;
    VEC (tree, gc) *v_clobbers;
*************** expand_builtin_synchronize (void)
*** 5234,5240 ****
  /* Expand the __sync_lock_release intrinsic.  EXP is the CALL_EXPR.  */
  
  static void
! expand_builtin_lock_release (enum machine_mode mode, tree exp)
  {
    struct expand_operand ops[2];
    enum insn_code icode;
--- 5234,5240 ----
  /* Expand the __sync_lock_release intrinsic.  EXP is the CALL_EXPR.  */
  
  static void
! expand_builtin_sync_lock_release (enum machine_mode mode, tree exp)
  {
    struct expand_operand ops[2];
    enum insn_code icode;
*************** expand_builtin_lock_release (enum machin
*** 5255,5261 ****
  
    /* Otherwise we can implement this operation by emitting a barrier
       followed by a store of zero.  */
!   expand_builtin_synchronize ();
    emit_move_insn (mem, const0_rtx);
  }
  \f
--- 5255,5261 ----
  
    /* Otherwise we can implement this operation by emitting a barrier
       followed by a store of zero.  */
!   expand_builtin_sync_synchronize ();
    emit_move_insn (mem, const0_rtx);
  }
  \f
*************** expand_builtin (tree exp, rtx target, rt
*** 5836,6034 ****
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_ADD_1:
!     case BUILT_IN_FETCH_AND_ADD_2:
!     case BUILT_IN_FETCH_AND_ADD_4:
!     case BUILT_IN_FETCH_AND_ADD_8:
!     case BUILT_IN_FETCH_AND_ADD_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_ADD_1);
        target = expand_builtin_sync_operation (mode, exp, PLUS,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_SUB_1:
!     case BUILT_IN_FETCH_AND_SUB_2:
!     case BUILT_IN_FETCH_AND_SUB_4:
!     case BUILT_IN_FETCH_AND_SUB_8:
!     case BUILT_IN_FETCH_AND_SUB_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_SUB_1);
        target = expand_builtin_sync_operation (mode, exp, MINUS,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_OR_1:
!     case BUILT_IN_FETCH_AND_OR_2:
!     case BUILT_IN_FETCH_AND_OR_4:
!     case BUILT_IN_FETCH_AND_OR_8:
!     case BUILT_IN_FETCH_AND_OR_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_OR_1);
        target = expand_builtin_sync_operation (mode, exp, IOR,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_AND_1:
!     case BUILT_IN_FETCH_AND_AND_2:
!     case BUILT_IN_FETCH_AND_AND_4:
!     case BUILT_IN_FETCH_AND_AND_8:
!     case BUILT_IN_FETCH_AND_AND_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_AND_1);
        target = expand_builtin_sync_operation (mode, exp, AND,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_XOR_1:
!     case BUILT_IN_FETCH_AND_XOR_2:
!     case BUILT_IN_FETCH_AND_XOR_4:
!     case BUILT_IN_FETCH_AND_XOR_8:
!     case BUILT_IN_FETCH_AND_XOR_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_XOR_1);
        target = expand_builtin_sync_operation (mode, exp, XOR,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_FETCH_AND_NAND_1:
!     case BUILT_IN_FETCH_AND_NAND_2:
!     case BUILT_IN_FETCH_AND_NAND_4:
!     case BUILT_IN_FETCH_AND_NAND_8:
!     case BUILT_IN_FETCH_AND_NAND_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_FETCH_AND_NAND_1);
        target = expand_builtin_sync_operation (mode, exp, NOT,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_ADD_AND_FETCH_1:
!     case BUILT_IN_ADD_AND_FETCH_2:
!     case BUILT_IN_ADD_AND_FETCH_4:
!     case BUILT_IN_ADD_AND_FETCH_8:
!     case BUILT_IN_ADD_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_ADD_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, PLUS,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SUB_AND_FETCH_1:
!     case BUILT_IN_SUB_AND_FETCH_2:
!     case BUILT_IN_SUB_AND_FETCH_4:
!     case BUILT_IN_SUB_AND_FETCH_8:
!     case BUILT_IN_SUB_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SUB_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, MINUS,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_OR_AND_FETCH_1:
!     case BUILT_IN_OR_AND_FETCH_2:
!     case BUILT_IN_OR_AND_FETCH_4:
!     case BUILT_IN_OR_AND_FETCH_8:
!     case BUILT_IN_OR_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_OR_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, IOR,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_AND_AND_FETCH_1:
!     case BUILT_IN_AND_AND_FETCH_2:
!     case BUILT_IN_AND_AND_FETCH_4:
!     case BUILT_IN_AND_AND_FETCH_8:
!     case BUILT_IN_AND_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_AND_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, AND,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_XOR_AND_FETCH_1:
!     case BUILT_IN_XOR_AND_FETCH_2:
!     case BUILT_IN_XOR_AND_FETCH_4:
!     case BUILT_IN_XOR_AND_FETCH_8:
!     case BUILT_IN_XOR_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_XOR_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, XOR,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_NAND_AND_FETCH_1:
!     case BUILT_IN_NAND_AND_FETCH_2:
!     case BUILT_IN_NAND_AND_FETCH_4:
!     case BUILT_IN_NAND_AND_FETCH_8:
!     case BUILT_IN_NAND_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_NAND_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, NOT,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_1:
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_2:
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_4:
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_8:
!     case BUILT_IN_BOOL_COMPARE_AND_SWAP_16:
        if (mode == VOIDmode)
  	mode = TYPE_MODE (boolean_type_node);
        if (!target || !register_operand (target, mode))
  	target = gen_reg_rtx (mode);
  
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_BOOL_COMPARE_AND_SWAP_1);
        target = expand_builtin_compare_and_swap (mode, exp, true, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_1:
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_2:
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_4:
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_8:
!     case BUILT_IN_VAL_COMPARE_AND_SWAP_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_VAL_COMPARE_AND_SWAP_1);
        target = expand_builtin_compare_and_swap (mode, exp, false, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_LOCK_TEST_AND_SET_1:
!     case BUILT_IN_LOCK_TEST_AND_SET_2:
!     case BUILT_IN_LOCK_TEST_AND_SET_4:
!     case BUILT_IN_LOCK_TEST_AND_SET_8:
!     case BUILT_IN_LOCK_TEST_AND_SET_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_LOCK_TEST_AND_SET_1);
!       target = expand_builtin_lock_test_and_set (mode, exp, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_LOCK_RELEASE_1:
!     case BUILT_IN_LOCK_RELEASE_2:
!     case BUILT_IN_LOCK_RELEASE_4:
!     case BUILT_IN_LOCK_RELEASE_8:
!     case BUILT_IN_LOCK_RELEASE_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_LOCK_RELEASE_1);
!       expand_builtin_lock_release (mode, exp);
        return const0_rtx;
  
!     case BUILT_IN_SYNCHRONIZE:
!       expand_builtin_synchronize ();
        return const0_rtx;
  
      case BUILT_IN_OBJECT_SIZE:
--- 5836,6036 ----
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_ADD_1:
!     case BUILT_IN_SYNC_FETCH_AND_ADD_2:
!     case BUILT_IN_SYNC_FETCH_AND_ADD_4:
!     case BUILT_IN_SYNC_FETCH_AND_ADD_8:
!     case BUILT_IN_SYNC_FETCH_AND_ADD_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_ADD_1);
        target = expand_builtin_sync_operation (mode, exp, PLUS,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_SUB_1:
!     case BUILT_IN_SYNC_FETCH_AND_SUB_2:
!     case BUILT_IN_SYNC_FETCH_AND_SUB_4:
!     case BUILT_IN_SYNC_FETCH_AND_SUB_8:
!     case BUILT_IN_SYNC_FETCH_AND_SUB_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_SUB_1);
        target = expand_builtin_sync_operation (mode, exp, MINUS,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_OR_1:
!     case BUILT_IN_SYNC_FETCH_AND_OR_2:
!     case BUILT_IN_SYNC_FETCH_AND_OR_4:
!     case BUILT_IN_SYNC_FETCH_AND_OR_8:
!     case BUILT_IN_SYNC_FETCH_AND_OR_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_OR_1);
        target = expand_builtin_sync_operation (mode, exp, IOR,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_AND_1:
!     case BUILT_IN_SYNC_FETCH_AND_AND_2:
!     case BUILT_IN_SYNC_FETCH_AND_AND_4:
!     case BUILT_IN_SYNC_FETCH_AND_AND_8:
!     case BUILT_IN_SYNC_FETCH_AND_AND_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_AND_1);
        target = expand_builtin_sync_operation (mode, exp, AND,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_XOR_1:
!     case BUILT_IN_SYNC_FETCH_AND_XOR_2:
!     case BUILT_IN_SYNC_FETCH_AND_XOR_4:
!     case BUILT_IN_SYNC_FETCH_AND_XOR_8:
!     case BUILT_IN_SYNC_FETCH_AND_XOR_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_XOR_1);
        target = expand_builtin_sync_operation (mode, exp, XOR,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_FETCH_AND_NAND_1:
!     case BUILT_IN_SYNC_FETCH_AND_NAND_2:
!     case BUILT_IN_SYNC_FETCH_AND_NAND_4:
!     case BUILT_IN_SYNC_FETCH_AND_NAND_8:
!     case BUILT_IN_SYNC_FETCH_AND_NAND_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_FETCH_AND_NAND_1);
        target = expand_builtin_sync_operation (mode, exp, NOT,
  					      false, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_ADD_AND_FETCH_1:
!     case BUILT_IN_SYNC_ADD_AND_FETCH_2:
!     case BUILT_IN_SYNC_ADD_AND_FETCH_4:
!     case BUILT_IN_SYNC_ADD_AND_FETCH_8:
!     case BUILT_IN_SYNC_ADD_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_ADD_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, PLUS,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_SUB_AND_FETCH_1:
!     case BUILT_IN_SYNC_SUB_AND_FETCH_2:
!     case BUILT_IN_SYNC_SUB_AND_FETCH_4:
!     case BUILT_IN_SYNC_SUB_AND_FETCH_8:
!     case BUILT_IN_SYNC_SUB_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_SUB_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, MINUS,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_OR_AND_FETCH_1:
!     case BUILT_IN_SYNC_OR_AND_FETCH_2:
!     case BUILT_IN_SYNC_OR_AND_FETCH_4:
!     case BUILT_IN_SYNC_OR_AND_FETCH_8:
!     case BUILT_IN_SYNC_OR_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_OR_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, IOR,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_AND_AND_FETCH_1:
!     case BUILT_IN_SYNC_AND_AND_FETCH_2:
!     case BUILT_IN_SYNC_AND_AND_FETCH_4:
!     case BUILT_IN_SYNC_AND_AND_FETCH_8:
!     case BUILT_IN_SYNC_AND_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_AND_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, AND,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_XOR_AND_FETCH_1:
!     case BUILT_IN_SYNC_XOR_AND_FETCH_2:
!     case BUILT_IN_SYNC_XOR_AND_FETCH_4:
!     case BUILT_IN_SYNC_XOR_AND_FETCH_8:
!     case BUILT_IN_SYNC_XOR_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_XOR_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, XOR,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_NAND_AND_FETCH_1:
!     case BUILT_IN_SYNC_NAND_AND_FETCH_2:
!     case BUILT_IN_SYNC_NAND_AND_FETCH_4:
!     case BUILT_IN_SYNC_NAND_AND_FETCH_8:
!     case BUILT_IN_SYNC_NAND_AND_FETCH_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_NAND_AND_FETCH_1);
        target = expand_builtin_sync_operation (mode, exp, NOT,
  					      true, target, ignore);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1:
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_2:
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4:
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8:
!     case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_16:
        if (mode == VOIDmode)
  	mode = TYPE_MODE (boolean_type_node);
        if (!target || !register_operand (target, mode))
  	target = gen_reg_rtx (mode);
  
!       mode = get_builtin_sync_mode 
! 				(fcode - BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1);
        target = expand_builtin_compare_and_swap (mode, exp, true, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1:
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2:
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4:
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8:
!     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16:
!       mode = get_builtin_sync_mode 
! 				(fcode - BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1);
        target = expand_builtin_compare_and_swap (mode, exp, false, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_1:
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_2:
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_4:
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_8:
!     case BUILT_IN_SYNC_LOCK_TEST_AND_SET_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_LOCK_TEST_AND_SET_1);
!       target = expand_builtin_sync_lock_test_and_set (mode, exp, target);
        if (target)
  	return target;
        break;
  
!     case BUILT_IN_SYNC_LOCK_RELEASE_1:
!     case BUILT_IN_SYNC_LOCK_RELEASE_2:
!     case BUILT_IN_SYNC_LOCK_RELEASE_4:
!     case BUILT_IN_SYNC_LOCK_RELEASE_8:
!     case BUILT_IN_SYNC_LOCK_RELEASE_16:
!       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_LOCK_RELEASE_1);
!       expand_builtin_sync_lock_release (mode, exp);
        return const0_rtx;
  
!     case BUILT_IN_SYNC_SYNCHRONIZE:
!       expand_builtin_sync_synchronize ();
        return const0_rtx;
  
      case BUILT_IN_OBJECT_SIZE:
Index: sync-builtins.def
===================================================================
*** sync-builtins.def	(revision 175226)
--- sync-builtins.def	(working copy)
*************** along with GCC; see the file COPYING3.  
*** 28,252 ****
     is supposed to be using.  It's overloaded, and is resolved to one of the
     "_1" through "_16" versions, plus some extra casts.  */
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_N, "__sync_fetch_and_add",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_1, "__sync_fetch_and_add_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_2, "__sync_fetch_and_add_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_4, "__sync_fetch_and_add_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_8, "__sync_fetch_and_add_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_ADD_16, "__sync_fetch_and_add_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_N, "__sync_fetch_and_sub",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_1, "__sync_fetch_and_sub_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_2, "__sync_fetch_and_sub_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_4, "__sync_fetch_and_sub_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_8, "__sync_fetch_and_sub_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_SUB_16, "__sync_fetch_and_sub_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_N, "__sync_fetch_and_or",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_1, "__sync_fetch_and_or_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_2, "__sync_fetch_and_or_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_4, "__sync_fetch_and_or_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_8, "__sync_fetch_and_or_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_OR_16, "__sync_fetch_and_or_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_N, "__sync_fetch_and_and",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_1, "__sync_fetch_and_and_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_2, "__sync_fetch_and_and_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_4, "__sync_fetch_and_and_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_8, "__sync_fetch_and_and_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_AND_16, "__sync_fetch_and_and_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_N, "__sync_fetch_and_xor",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_1, "__sync_fetch_and_xor_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_2, "__sync_fetch_and_xor_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_4, "__sync_fetch_and_xor_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_8, "__sync_fetch_and_xor_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_XOR_16, "__sync_fetch_and_xor_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_N, "__sync_fetch_and_nand",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_1, "__sync_fetch_and_nand_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_2, "__sync_fetch_and_nand_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_4, "__sync_fetch_and_nand_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_8, "__sync_fetch_and_nand_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_FETCH_AND_NAND_16, "__sync_fetch_and_nand_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_N, "__sync_add_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_1, "__sync_add_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_2, "__sync_add_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_4, "__sync_add_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_8, "__sync_add_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_ADD_AND_FETCH_16, "__sync_add_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_N, "__sync_sub_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_1, "__sync_sub_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_2, "__sync_sub_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_4, "__sync_sub_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_8, "__sync_sub_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SUB_AND_FETCH_16, "__sync_sub_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_N, "__sync_or_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_1, "__sync_or_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_2, "__sync_or_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_4, "__sync_or_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_8, "__sync_or_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_OR_AND_FETCH_16, "__sync_or_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_N, "__sync_and_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_1, "__sync_and_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_2, "__sync_and_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_4, "__sync_and_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_8, "__sync_and_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_AND_AND_FETCH_16, "__sync_and_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_N, "__sync_xor_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_1, "__sync_xor_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_2, "__sync_xor_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_4, "__sync_xor_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_8, "__sync_xor_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_XOR_AND_FETCH_16, "__sync_xor_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_N, "__sync_nand_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_1, "__sync_nand_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_2, "__sync_nand_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_4, "__sync_nand_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_8, "__sync_nand_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_NAND_AND_FETCH_16, "__sync_nand_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_N,
  		  "__sync_bool_compare_and_swap",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_1,
  		  "__sync_bool_compare_and_swap_1",
  		  BT_FN_BOOL_VPTR_I1_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_2,
  		  "__sync_bool_compare_and_swap_2",
  		  BT_FN_BOOL_VPTR_I2_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_4,
  		  "__sync_bool_compare_and_swap_4",
  		  BT_FN_BOOL_VPTR_I4_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_8,
  		  "__sync_bool_compare_and_swap_8",
  		  BT_FN_BOOL_VPTR_I8_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_BOOL_COMPARE_AND_SWAP_16,
  		  "__sync_bool_compare_and_swap_16",
  		  BT_FN_BOOL_VPTR_I16_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_N,
  		  "__sync_val_compare_and_swap",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_1,
  		  "__sync_val_compare_and_swap_1",
  		  BT_FN_I1_VPTR_I1_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_2,
  		  "__sync_val_compare_and_swap_2",
  		  BT_FN_I2_VPTR_I2_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_4,
  		  "__sync_val_compare_and_swap_4",
  		  BT_FN_I4_VPTR_I4_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_8,
  		  "__sync_val_compare_and_swap_8",
  		  BT_FN_I8_VPTR_I8_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_VAL_COMPARE_AND_SWAP_16,
  		  "__sync_val_compare_and_swap_16",
  		  BT_FN_I16_VPTR_I16_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_N, "__sync_lock_test_and_set",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_1, "__sync_lock_test_and_set_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_2, "__sync_lock_test_and_set_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_4, "__sync_lock_test_and_set_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_8, "__sync_lock_test_and_set_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_TEST_AND_SET_16, "__sync_lock_test_and_set_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_N, "__sync_lock_release",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_1, "__sync_lock_release_1",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_2, "__sync_lock_release_2",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_4, "__sync_lock_release_4",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_8, "__sync_lock_release_8",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_16, "__sync_lock_release_16",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNCHRONIZE, "__sync_synchronize",
  		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
--- 28,258 ----
     is supposed to be using.  It's overloaded, and is resolved to one of the
     "_1" through "_16" versions, plus some extra casts.  */
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_N, "__sync_fetch_and_add",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_1, "__sync_fetch_and_add_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_2, "__sync_fetch_and_add_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_4, "__sync_fetch_and_add_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_8, "__sync_fetch_and_add_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_ADD_16, "__sync_fetch_and_add_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_N, "__sync_fetch_and_sub",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_1, "__sync_fetch_and_sub_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_2, "__sync_fetch_and_sub_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_4, "__sync_fetch_and_sub_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_8, "__sync_fetch_and_sub_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_SUB_16, "__sync_fetch_and_sub_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_N, "__sync_fetch_and_or",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_1, "__sync_fetch_and_or_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_2, "__sync_fetch_and_or_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_4, "__sync_fetch_and_or_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_8, "__sync_fetch_and_or_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_OR_16, "__sync_fetch_and_or_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_N, "__sync_fetch_and_and",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_1, "__sync_fetch_and_and_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_2, "__sync_fetch_and_and_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_4, "__sync_fetch_and_and_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_8, "__sync_fetch_and_and_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_AND_16, "__sync_fetch_and_and_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_N, "__sync_fetch_and_xor",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_1, "__sync_fetch_and_xor_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_2, "__sync_fetch_and_xor_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_4, "__sync_fetch_and_xor_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_8, "__sync_fetch_and_xor_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_XOR_16, "__sync_fetch_and_xor_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_N, "__sync_fetch_and_nand",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_1, "__sync_fetch_and_nand_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_2, "__sync_fetch_and_nand_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_4, "__sync_fetch_and_nand_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_8, "__sync_fetch_and_nand_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_FETCH_AND_NAND_16, "__sync_fetch_and_nand_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_N, "__sync_add_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_1, "__sync_add_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_2, "__sync_add_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_4, "__sync_add_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_8, "__sync_add_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_ADD_AND_FETCH_16, "__sync_add_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_N, "__sync_sub_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_1, "__sync_sub_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_2, "__sync_sub_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_4, "__sync_sub_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_8, "__sync_sub_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SUB_AND_FETCH_16, "__sync_sub_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_N, "__sync_or_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_1, "__sync_or_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_2, "__sync_or_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_4, "__sync_or_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_8, "__sync_or_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_OR_AND_FETCH_16, "__sync_or_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_N, "__sync_and_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_1, "__sync_and_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_2, "__sync_and_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_4, "__sync_and_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_8, "__sync_and_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_AND_AND_FETCH_16, "__sync_and_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_N, "__sync_xor_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_1, "__sync_xor_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_2, "__sync_xor_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_4, "__sync_xor_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_8, "__sync_xor_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_XOR_AND_FETCH_16, "__sync_xor_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_N, "__sync_nand_and_fetch",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_1, "__sync_nand_and_fetch_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_2, "__sync_nand_and_fetch_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_4, "__sync_nand_and_fetch_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_8, "__sync_nand_and_fetch_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_NAND_AND_FETCH_16, "__sync_nand_and_fetch_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_N,
  		  "__sync_bool_compare_and_swap",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1,
  		  "__sync_bool_compare_and_swap_1",
  		  BT_FN_BOOL_VPTR_I1_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_2,
  		  "__sync_bool_compare_and_swap_2",
  		  BT_FN_BOOL_VPTR_I2_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4,
  		  "__sync_bool_compare_and_swap_4",
  		  BT_FN_BOOL_VPTR_I4_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8,
  		  "__sync_bool_compare_and_swap_8",
  		  BT_FN_BOOL_VPTR_I8_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_16,
  		  "__sync_bool_compare_and_swap_16",
  		  BT_FN_BOOL_VPTR_I16_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N,
  		  "__sync_val_compare_and_swap",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1,
  		  "__sync_val_compare_and_swap_1",
  		  BT_FN_I1_VPTR_I1_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2,
  		  "__sync_val_compare_and_swap_2",
  		  BT_FN_I2_VPTR_I2_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4,
  		  "__sync_val_compare_and_swap_4",
  		  BT_FN_I4_VPTR_I4_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8,
  		  "__sync_val_compare_and_swap_8",
  		  BT_FN_I8_VPTR_I8_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16,
  		  "__sync_val_compare_and_swap_16",
  		  BT_FN_I16_VPTR_I16_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_N,
! 		  "__sync_lock_test_and_set",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_1,
! 		  "__sync_lock_test_and_set_1",
  		  BT_FN_I1_VPTR_I1, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_2,
! 		  "__sync_lock_test_and_set_2",
  		  BT_FN_I2_VPTR_I2, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_4,
! 		  "__sync_lock_test_and_set_4",
  		  BT_FN_I4_VPTR_I4, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_8,
! 		  "__sync_lock_test_and_set_8",
  		  BT_FN_I8_VPTR_I8, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_TEST_AND_SET_16,
! 		  "__sync_lock_test_and_set_16",
  		  BT_FN_I16_VPTR_I16, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_N, "__sync_lock_release",
  		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_1, "__sync_lock_release_1",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_2, "__sync_lock_release_2",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_4, "__sync_lock_release_4",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_8, "__sync_lock_release_8",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_RELEASE_16, "__sync_lock_release_16",
  		  BT_FN_VOID_VPTR, ATTR_NOTHROW_LEAF_LIST)
  
! DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
  		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
Index: omp-low.c
===================================================================
*** omp-low.c	(revision 175226)
--- omp-low.c	(working copy)
*************** expand_omp_atomic_fetch_op (basic_block 
*** 4973,4995 ****
      {
      case PLUS_EXPR:
      case POINTER_PLUS_EXPR:
!       base = BUILT_IN_FETCH_AND_ADD_N;
        optab = sync_add_optab;
        break;
      case MINUS_EXPR:
!       base = BUILT_IN_FETCH_AND_SUB_N;
        optab = sync_add_optab;
        break;
      case BIT_AND_EXPR:
!       base = BUILT_IN_FETCH_AND_AND_N;
        optab = sync_and_optab;
        break;
      case BIT_IOR_EXPR:
!       base = BUILT_IN_FETCH_AND_OR_N;
        optab = sync_ior_optab;
        break;
      case BIT_XOR_EXPR:
!       base = BUILT_IN_FETCH_AND_XOR_N;
        optab = sync_xor_optab;
        break;
      default:
--- 4973,4995 ----
      {
      case PLUS_EXPR:
      case POINTER_PLUS_EXPR:
!       base = BUILT_IN_SYNC_FETCH_AND_ADD_N;
        optab = sync_add_optab;
        break;
      case MINUS_EXPR:
!       base = BUILT_IN_SYNC_FETCH_AND_SUB_N;
        optab = sync_add_optab;
        break;
      case BIT_AND_EXPR:
!       base = BUILT_IN_SYNC_FETCH_AND_AND_N;
        optab = sync_and_optab;
        break;
      case BIT_IOR_EXPR:
!       base = BUILT_IN_SYNC_FETCH_AND_OR_N;
        optab = sync_ior_optab;
        break;
      case BIT_XOR_EXPR:
!       base = BUILT_IN_SYNC_FETCH_AND_XOR_N;
        optab = sync_xor_optab;
        break;
      default:
*************** expand_omp_atomic_pipeline (basic_block 
*** 5057,5063 ****
    gimple phi, stmt;
    edge e;
  
!   cmpxchg = built_in_decls[BUILT_IN_VAL_COMPARE_AND_SWAP_N + index + 1];
    if (cmpxchg == NULL_TREE)
      return false;
    type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (addr)));
--- 5057,5063 ----
    gimple phi, stmt;
    edge e;
  
!   cmpxchg = built_in_decls[BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N + index + 1];
    if (cmpxchg == NULL_TREE)
      return false;
    type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (addr)));
Index: cp/semantics.c
===================================================================
*** cp/semantics.c	(revision 175226)
--- cp/semantics.c	(working copy)
*************** finish_omp_barrier (void)
*** 4722,4728 ****
  void
  finish_omp_flush (void)
  {
!   tree fn = built_in_decls[BUILT_IN_SYNCHRONIZE];
    VEC(tree,gc) *vec = make_tree_vector ();
    tree stmt = finish_call_expr (fn, &vec, false, false, tf_warning_or_error);
    release_tree_vector (vec);
--- 4722,4728 ----
  void
  finish_omp_flush (void)
  {
!   tree fn = built_in_decls[BUILT_IN_SYNC_SYNCHRONIZE];
    VEC(tree,gc) *vec = make_tree_vector ();
    tree stmt = finish_call_expr (fn, &vec, false, false, tf_warning_or_error);
    release_tree_vector (vec);
Index: fortran/trans-openmp.c
===================================================================
*** fortran/trans-openmp.c	(revision 175226)
--- fortran/trans-openmp.c	(working copy)
*************** gfc_trans_omp_do (gfc_code *code, stmtbl
*** 1430,1436 ****
  static tree
  gfc_trans_omp_flush (void)
  {
!   tree decl = built_in_decls [BUILT_IN_SYNCHRONIZE];
    return build_call_expr_loc (input_location, decl, 0);
  }
  
--- 1430,1436 ----
  static tree
  gfc_trans_omp_flush (void)
  {
!   tree decl = built_in_decls [BUILT_IN_SYNC_SYNCHRONIZE];
    return build_call_expr_loc (input_location, decl, 0);
  }
  
Index: fortran/trans-stmt.c
===================================================================
*** fortran/trans-stmt.c	(revision 175226)
--- fortran/trans-stmt.c	(working copy)
*************** gfc_trans_stop (gfc_code *code, bool err
*** 602,608 ****
    if (gfc_option.coarray == GFC_FCOARRAY_LIB && !error_stop)
      {
        /* Per F2008, 8.5.1 STOP implies a SYNC MEMORY.  */
!       tmp = built_in_decls [BUILT_IN_SYNCHRONIZE];
        tmp = build_call_expr_loc (input_location, tmp, 0);
        gfc_add_expr_to_block (&se.pre, tmp);
  
--- 602,608 ----
    if (gfc_option.coarray == GFC_FCOARRAY_LIB && !error_stop)
      {
        /* Per F2008, 8.5.1 STOP implies a SYNC MEMORY.  */
!       tmp = built_in_decls [BUILT_IN_SYNC_SYNCHRONIZE];
        tmp = build_call_expr_loc (input_location, tmp, 0);
        gfc_add_expr_to_block (&se.pre, tmp);
  
*************** gfc_trans_sync (gfc_code *code, gfc_exec
*** 732,738 ****
        image control statements SYNC IMAGES and SYNC ALL.  */
     if (gfc_option.coarray == GFC_FCOARRAY_LIB)
       {
! 	tmp = built_in_decls [BUILT_IN_SYNCHRONIZE];
  	tmp = build_call_expr_loc (input_location, tmp, 0);
  	gfc_add_expr_to_block (&se.pre, tmp);
       }
--- 732,738 ----
        image control statements SYNC IMAGES and SYNC ALL.  */
     if (gfc_option.coarray == GFC_FCOARRAY_LIB)
       {
! 	tmp = built_in_decls [BUILT_IN_SYNC_SYNCHRONIZE];
  	tmp = build_call_expr_loc (input_location, tmp, 0);
  	gfc_add_expr_to_block (&se.pre, tmp);
       }
Index: fortran/trans-decl.c
===================================================================
*** fortran/trans-decl.c	(revision 175226)
--- fortran/trans-decl.c	(working copy)
*************** create_main_function (tree fndecl)
*** 4904,4910 ****
      { 
        /* Per F2008, 8.5.1 END of the main program implies a
  	 SYNC MEMORY.  */ 
!       tmp = built_in_decls [BUILT_IN_SYNCHRONIZE];
        tmp = build_call_expr_loc (input_location, tmp, 0);
        gfc_add_expr_to_block (&body, tmp);
  
--- 4904,4910 ----
      { 
        /* Per F2008, 8.5.1 END of the main program implies a
  	 SYNC MEMORY.  */ 
!       tmp = built_in_decls [BUILT_IN_SYNC_SYNCHRONIZE];
        tmp = build_call_expr_loc (input_location, tmp, 0);
        gfc_add_expr_to_block (&body, tmp);
  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: __sync_swap*  [ rename sync builtins ]
  2011-06-21 18:56                         ` __sync_swap* [ rename sync builtins ] Andrew MacLeod
@ 2011-06-21 19:03                           ` Richard Henderson
  2011-06-21 23:03                             ` Graham Stott
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2011-06-21 19:03 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/21/2011 11:46 AM, Andrew MacLeod wrote:
> 	* c-family/c-common.c: Add sync_ or SYNC__ to builtin names.
> 	* c-family/c-omp.c: Add sync_ or SYNC__ to builtin names.
> 	* java/builtins.c: Add sync_ or SYNC__ to builtin names.
> 	* java/expr.c: Add sync_ or SYNC__ to builtin names.
> 	* builtins.c: Add sync_ or SYNC__ to builtin names.
> 	* sync-builtins.def: Add sync_ or SYNC__ to builtin names.
> 	* omp-low.c: Add sync_ or SYNC__ to builtin names.
> 	* cp/semantics.c: Add sync_ or SYNC__ to builtin names.
> 	* fortran/trans-openmp.c: Add sync_ or SYNC__ to builtin names.
> 	* fortran/trans-stmt.c: Add sync_ or SYNC__ to builtin names.
> 	* fortran/trans-decl.c: Add sync_ or SYNC__ to builtin names.

Ok.


r~


* Re: __sync_swap*  [ rename sync builtins ]
  2011-06-21 19:03                           ` Richard Henderson
@ 2011-06-21 23:03                             ` Graham Stott
  2011-06-21 23:26                               ` Andrew MacLeod
  0 siblings, 1 reply; 26+ messages in thread
From: Graham Stott @ 2011-06-21 23:03 UTC (permalink / raw)
  To: Andrew MacLeod, Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

All

--- On Tue, 21/6/11, Richard Henderson <rth@redhat.com> wrote:

> From: Richard Henderson <rth@redhat.com>
> Subject: Re: __sync_swap*  [ rename sync builtins ]
> To: "Andrew MacLeod" <amacleod@redhat.com>
> Cc: "Aldy Hernandez" <aldyh@redhat.com>, "Jakub Jelinek" <jakub@redhat.com>, "Joseph S. Myers" <joseph@codesourcery.com>, "gcc-patches" <gcc-patches@gcc.gnu.org>
> Date: Tuesday, 21 June, 2011, 19:50
> On 06/21/2011 11:46 AM, Andrew MacLeod wrote:
> >     * c-family/c-common.c: Add sync_ or SYNC__ to builtin names.
> >     * c-family/c-omp.c: Add sync_ or SYNC__ to builtin names.
> >     * java/builtins.c: Add sync_ or SYNC__ to builtin names.
> >     * java/expr.c: Add sync_ or SYNC__ to builtin names.
> >     * builtins.c: Add sync_ or SYNC__ to builtin names.
> >     * sync-builtins.def: Add sync_ or SYNC__ to builtin names.
> >     * omp-low.c: Add sync_ or SYNC__ to builtin names.
> >     * cp/semantics.c: Add sync_ or SYNC__ to builtin names.
> >     * fortran/trans-openmp.c: Add sync_ or SYNC__ to builtin names.
> >     * fortran/trans-stmt.c: Add sync_ or SYNC__ to builtin names.
> >     * fortran/trans-decl.c: Add sync_ or SYNC__ to builtin names.
> 
> Ok.
> 
> 
> r~
> 

This looks to have broken the Go frontend:

gcc/gcc/go/gofrontend/gogo-tree.cc: In member function ‘void Gogo::define_builtin_function_trees()’:
/usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:94:18: error: ‘BUILT_IN_ADD_AND_FETCH_1’ was not declared in this scope
/usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:99:19: error: ‘BUILT_IN_ADD_AND_FETCH_2’ was not declared in this scope
/usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:104:18: error: ‘BUILT_IN_ADD_AND_FETCH_4’ was not declared in this scope
/usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:109:18: error: ‘BUILT_IN_ADD_AND_FETCH_8’ was not declared in this scope

Graham


* [cxx-mem-model]  sync_mem_exchange implementation with memory model parameters
  2011-06-20 23:29                       ` Richard Henderson
  2011-06-21 18:56                         ` __sync_swap* [ rename sync builtins ] Andrew MacLeod
@ 2011-06-21 23:03                         ` Andrew MacLeod
  2011-06-22 20:36                           ` Richard Henderson
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-21 23:03 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

OK, I've brought the cxx-mem-model branch up to mainline as of this 
afternoon.  (After I applied the patch which renamed all the mis-named 
_sync expanders)

So this is the marginally reworked patch we had before which I think 
satisfies all the requests I've seen.

Bootstraps and no regressions on x86_64-unknown-linux-gnu.

Assuming everyone is OK with this, I'll check it in and use it as a 
template for all the other __sync builtins which need a memory model 
parameter, as specified at the end of http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen

Then I'll move on to the libstdc++ changes.

Andrew



[-- Attachment #2: exchange.patch --]
[-- Type: text/plain, Size: 17816 bytes --]


	* doc/extend.texi (__sync_mem_exchange): Document.
	* cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*.
	* c-family/c-common.c (BUILT_IN_SYNC_MEM_EXCHANGE_N): Add case.
	* optabs.c (expand_sync_mem_exchange): New.
	* optabs.h (enum direct_optab_index): Add DOI_sync_mem_exchange entry.
	(sync_mem_exchange_optab): Define.
	* genopinit.c: Add entry for sync_mem_exchange.
	* builtins.c (get_memmodel): New.
	(expand_builtin_sync_mem_exchange): New.
	(expand_builtin_sync_synchronize): Remove static.
	(expand_builtin): Add cases for BUILT_IN_SYNC_MEM_EXCHANGE_*.
	* sync-builtins.def: Add entries for BUILT_IN_SYNC_MEM_EXCHANGE_*.
	* testsuite/gcc.dg/x86-sync-1.c: New test.
	* builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
	* expr.h (expand_sync_mem_exchange): Declare.
	(expand_builtin_sync_synchronize): Declare.
	* fortran/types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
	* coretypes.h (enum memmodel): New.
	* Makefile.in (cppbuiltin.o): Add missing dependency on $(TREE_H).
	* config/i386/sync.md (sync_mem_exchange<mode>): New pattern.


Index: doc/extend.texi
===================================================================
*** doc/extend.texi	(revision 175275)
--- doc/extend.texi	(working copy)
*************** This builtin is not a full barrier, but 
*** 6728,6733 ****
--- 6728,6751 ----
  This means that all previous memory stores are globally visible, and all
  previous memory loads have been satisfied, but following memory reads
  are not prevented from being speculated to before the barrier.
+ 
+ @item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...)
+ @findex __sync_mem_exchange
+ This builtin implements an atomic exchange operation within the
+ constraints of a memory model.  It writes @var{value} into
+ @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ The valid memory model variants for this builtin are
+ __SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE,
+ __SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL.  The target pattern is responsible
+ for issuing the different synchronization instructions. It should default to 
+ the more restrictive memory model, the sequentially consistent model.  If 
+ nothing is implemented for the target, the compiler will implement it by
+ calling the __sync_lock_test_and_set builtin.  If the memory model is more
+ restrictive than memory_order_acquire, a memory barrier is emitted before
+ the instruction.
+ 
  @end table
  
  @node Object Size Checking
Index: cppbuiltin.c
===================================================================
*** cppbuiltin.c	(revision 175226)
--- cppbuiltin.c	(working copy)
*************** define__GNUC__ (cpp_reader *pfile)
*** 66,71 ****
--- 66,77 ----
    cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
    cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
    cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELAXED=%d", MEMMODEL_RELAXED);
+   cpp_define_formatted (pfile, "__SYNC_MEM_SEQ_CST=%d", MEMMODEL_SEQ_CST);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQUIRE=%d", MEMMODEL_ACQUIRE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELEASE=%d", MEMMODEL_RELEASE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQ_REL=%d", MEMMODEL_ACQ_REL);
+   cpp_define_formatted (pfile, "__SYNC_MEM_CONSUME=%d", MEMMODEL_CONSUME);
  }
  
  
Index: c-family/c-common.c
===================================================================
*** c-family/c-common.c	(revision 175275)
--- c-family/c-common.c	(working copy)
*************** resolve_overloaded_builtin (location_t l
*** 9060,9065 ****
--- 9060,9066 ----
      case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N:
      case BUILT_IN_SYNC_LOCK_TEST_AND_SET_N:
      case BUILT_IN_SYNC_LOCK_RELEASE_N:
+     case BUILT_IN_SYNC_MEM_EXCHANGE_N:
        {
  	int n = sync_resolve_size (function, params);
  	tree new_function, first_param, result;
Index: optabs.c
===================================================================
*** optabs.c	(revision 175275)
--- optabs.c	(working copy)
*************** expand_sync_lock_test_and_set (rtx mem, 
*** 7056,7061 ****
--- 7056,7101 ----
  
    return NULL_RTX;
  }
+ 
+ /* This function expands the atomic exchange operation:
+    atomically store VAL in MEM and return the previous value in MEM.
+ 
+    MEMMODEL is the memory model variant to use.
+    TARGET is an optional place to stick the return value.  */
+ 
+ rtx
+ expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
+ {
+   enum machine_mode mode = GET_MODE (mem);
+   enum insn_code icode;
+ 
+   /* If the target supports the exchange directly, great.  */
+   icode = direct_optab_handler (sync_mem_exchange_optab, mode);
+   if (icode != CODE_FOR_nothing)
+     {
+       struct expand_operand ops[4];
+ 
+       create_output_operand (&ops[0], target, mode);
+       create_fixed_operand (&ops[1], mem);
+       /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
+       create_convert_operand_to (&ops[2], val, mode, true);
+       create_integer_operand (&ops[3], model);
+       if (maybe_expand_insn (icode, 4, ops))
+ 	return ops[0].value;
+     }
+ 
+   /* Legacy sync_lock_test_and_set works the same, but is only defined as an 
+      acquire barrier.  If the pattern exists, and the memory model is stronger
+      than acquire, add a release barrier before the instruction.
+      The barrier is not needed if sync_lock_test_and_set doesn't exist since
+      it will expand into a compare-and-swap loop.  */
+   icode = direct_optab_handler (sync_lock_test_and_set_optab, mode);
+   if ((icode != CODE_FOR_nothing) && (model == MEMMODEL_SEQ_CST || 
+ 				     model == MEMMODEL_ACQ_REL))
+     expand_builtin_sync_synchronize ();
+ 
+   return expand_sync_lock_test_and_set (mem, val, target);
+ }
  \f
  /* Return true if OPERAND is suitable for operand number OPNO of
     instruction ICODE.  */
Index: optabs.h
===================================================================
*** optabs.h	(revision 175275)
--- optabs.h	(working copy)
*************** enum direct_optab_index
*** 677,682 ****
--- 677,685 ----
    /* Atomic clear with release semantics.  */
    DOI_sync_lock_release,
  
+   /* Atomic operations with C++0x memory model parameters. */
+   DOI_sync_mem_exchange,
+ 
    DOI_MAX
  };
  
*************** typedef struct direct_optab_d *direct_op
*** 724,729 ****
--- 727,735 ----
    (&direct_optab_table[(int) DOI_sync_lock_test_and_set])
  #define sync_lock_release_optab \
    (&direct_optab_table[(int) DOI_sync_lock_release])
+ 
+ #define sync_mem_exchange_optab \
+   (&direct_optab_table[(int) DOI_sync_mem_exchange])
  \f
  /* Target-dependent globals.  */
  struct target_optabs {
Index: genopinit.c
===================================================================
*** genopinit.c	(revision 175275)
--- genopinit.c	(working copy)
*************** static const char * const optabs[] =
*** 241,246 ****
--- 241,247 ----
    "set_direct_optab_handler (sync_compare_and_swap_optab, $A, CODE_FOR_$(sync_compare_and_swap$I$a$))",
    "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))",
    "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))",
+   "set_direct_optab_handler (sync_mem_exchange_optab, $A, CODE_FOR_$(sync_mem_exchange$I$a$))",
    "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))",
    "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
    "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))",
Index: builtins.c
===================================================================
*** builtins.c	(revision 175275)
--- builtins.c	(working copy)
*************** expand_builtin_sync_lock_test_and_set (e
*** 5199,5207 ****
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
  /* Expand the __sync_synchronize intrinsic.  */
  
! static void
  expand_builtin_sync_synchronize (void)
  {
    gimple x;
--- 5199,5267 ----
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
+ /* Given an integer representing an ``enum memmodel'', verify its
+    correctness and return the memory model enum.  */
+ 
+ static enum memmodel
+ get_memmodel (tree exp)
+ {
+   rtx op;
+ 
+   if (TREE_CODE (exp) != INTEGER_CST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   op = expand_normal (exp);
+   if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   return (enum memmodel) INTVAL (op);
+ }
+ 
+ /* Expand the __sync_mem_exchange intrinsic:
+ 
+    	TYPE __sync_mem_exchange (TYPE *to, TYPE from, enum memmodel)
+ 
+    EXP is the CALL_EXPR.
+    TARGET is an optional place for us to store the results.  */
+ 
+ static rtx
+ expand_builtin_sync_mem_exchange (enum machine_mode mode, tree exp, rtx target)
+ {
+   rtx val, mem;
+   enum machine_mode old_mode;
+   enum memmodel model;
+ 
+   model = get_memmodel (CALL_EXPR_ARG (exp, 2));
+   if (model != MEMMODEL_RELAXED
+       && model != MEMMODEL_SEQ_CST
+       && model != MEMMODEL_ACQ_REL
+       && model != MEMMODEL_RELEASE
+       && model != MEMMODEL_ACQUIRE)
+     {
+       error ("invalid memory model for %<__sync_mem_exchange%>");
+       return NULL_RTX;
+     }
+ 
+   /* Expand the operands.  */
+   mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
+   val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL);
+   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+      of CONST_INTs, where we know the old_mode only from the call argument.  */
+   old_mode = GET_MODE (val);
+   if (old_mode == VOIDmode)
+     old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1)));
+   val = convert_modes (mode, old_mode, val, 1);
+ 
+   return expand_sync_mem_exchange (model, mem, val, target);
+ }
+ 
  /* Expand the __sync_synchronize intrinsic.  */
  
! void
  expand_builtin_sync_synchronize (void)
  {
    gimple x;
*************** expand_builtin (tree exp, rtx target, rt
*** 6017,6022 ****
--- 6077,6093 ----
  	return target;
        break;
  
+     case BUILT_IN_SYNC_MEM_EXCHANGE_1:
+     case BUILT_IN_SYNC_MEM_EXCHANGE_2:
+     case BUILT_IN_SYNC_MEM_EXCHANGE_4:
+     case BUILT_IN_SYNC_MEM_EXCHANGE_8:
+     case BUILT_IN_SYNC_MEM_EXCHANGE_16:
+       mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_MEM_EXCHANGE_1);
+       target = expand_builtin_sync_mem_exchange (mode, exp, target);
+       if (target)
+ 	return target;
+       break;
+ 
      case BUILT_IN_SYNC_LOCK_TEST_AND_SET_1:
      case BUILT_IN_SYNC_LOCK_TEST_AND_SET_2:
      case BUILT_IN_SYNC_LOCK_TEST_AND_SET_4:
Index: sync-builtins.def
===================================================================
*** sync-builtins.def	(revision 175275)
--- sync-builtins.def	(working copy)
*************** DEF_SYNC_BUILTIN (BUILT_IN_SYNC_LOCK_REL
*** 256,258 ****
--- 256,279 ----
  
  DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
  		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+ 
+ /* __sync* builtins for the C++ memory model.  */
+ 
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_N,
+ 		  "__sync_mem_exchange",
+ 		  BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_1,
+ 		  "__sync_mem_exchange_1",
+ 		  BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_2,
+ 		  "__sync_mem_exchange_2",
+ 		  BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_4,
+ 		  "__sync_mem_exchange_4",
+ 		  BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_8,
+ 		  "__sync_mem_exchange_8",
+ 		  BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_MEM_EXCHANGE_16,
+ 		  "__sync_mem_exchange_16",
+ 		  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
Index: testsuite/gcc.dg/x86-sync-1.c
===================================================================
*** testsuite/gcc.dg/x86-sync-1.c	(revision 175275)
--- testsuite/gcc.dg/x86-sync-1.c	(working copy)
***************
*** 0 ****
--- 1,9 ----
+ /* { dg-do compile } */
+ /* { dg-options "-dap" } */
+ 
+ int i;
+ 
+ void foo()
+ {
+   __sync_mem_exchange (&i, 555, __SYNC_MEM_SEQ_CST);
+ }
Index: builtin-types.def
===================================================================
*** builtin-types.def	(revision 175226)
--- builtin-types.def	(working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PT
*** 383,388 ****
--- 383,393 ----
  		     BT_PTR, BT_UINT)
  DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR,
  		     BT_CONST_PTR, BT_INT, BT_SIZE)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
  
  DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
  		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
Index: expr.h
===================================================================
*** expr.h	(revision 175229)
--- expr.h	(working copy)
*************** rtx expand_bool_compare_and_swap (rtx, r
*** 217,222 ****
--- 217,223 ----
  rtx expand_sync_operation (rtx, rtx, enum rtx_code);
  rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
  rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
+ rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx);
  \f
  /* Functions from expmed.c:  */
  
*************** extern void expand_builtin_setjmp_receiv
*** 248,253 ****
--- 249,255 ----
  extern rtx expand_builtin_saveregs (void);
  extern void expand_builtin_trap (void);
  extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
+ extern void expand_builtin_sync_synchronize (void);
  \f
  /* Functions from expr.c:  */
  
Index: fortran/types.def
===================================================================
*** fortran/types.def	(revision 175226)
--- fortran/types.def	(working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_
*** 120,125 ****
--- 120,132 ----
  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PTR_UINT, BT_VOID, BT_PTR_FN_VOID_PTR,
                       BT_PTR, BT_UINT)
  
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
+ 
+ 
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
                       BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_PTR_WORD_WORD_PTR,
Index: coretypes.h
===================================================================
*** coretypes.h	(revision 175226)
--- coretypes.h	(working copy)
*************** union _dont_use_tree_here_;
*** 171,175 ****
--- 171,188 ----
  
  #endif
  
+ /* Memory model types for the __sync_mem* builtins. 
+    This must match the order in libstdc++-v3/include/bits/atomic_base.h.  */
+ enum memmodel
+ {
+   MEMMODEL_RELAXED = 0,
+   MEMMODEL_CONSUME = 1,
+   MEMMODEL_ACQUIRE = 2,
+   MEMMODEL_RELEASE = 3,
+   MEMMODEL_ACQ_REL = 4,
+   MEMMODEL_SEQ_CST = 5,
+   MEMMODEL_LAST = 6
+ };
+ 
  #endif /* coretypes.h */
  
Index: Makefile.in
===================================================================
*** Makefile.in	(revision 175275)
--- Makefile.in	(working copy)
*************** PREPROCESSOR_DEFINES = \
*** 4083,4089 ****
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
! 	cppbuiltin.h Makefile
  	$(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
  	  $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
  	  -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
--- 4083,4089 ----
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
! 	$(TREE_H) cppbuiltin.h Makefile
  	$(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
  	  $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
  	  -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
Index: config/i386/sync.md
===================================================================
*** config/i386/sync.md	(revision 175229)
--- config/i386/sync.md	(working copy)
***************
*** 232,237 ****
--- 232,251 ----
    return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
  })
  
+ (define_expand "sync_mem_exchange<mode>"
+   [(match_operand:SWI 0 "register_operand" "")		;; output
+    (match_operand:SWI 1 "memory_operand" "")		;; memory
+    (match_operand:SWI 2 "register_operand" "")		;; input
+    (match_operand:SI  3 "const_int_operand" "")]	;; memory model
+   ""
+ {
+   /* On i386 the xchg instruction is a full barrier.  Thus we
+      can completely ignore the memory model operand.  */
+   emit_insn (gen_sync_lock_test_and_set<mode>
+ 		(operands[0], operands[1], operands[2]));
+   DONE;
+ })
+ 
  ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
  (define_insn "sync_lock_test_and_set<mode>"
    [(set (match_operand:SWI 0 "register_operand" "=<r>")
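
As an illustration of the interface this patch adds, here is a minimal C sketch. It is not part of the patch. Since `__sync_mem_exchange` as posted here may not exist in a released compiler, the sketch assumes a compiler that provides `__atomic_exchange_n` and the `__ATOMIC_*` macros (GCC 4.7 or later, or Clang) as a stand-in. `check_memmodel` and `exchange_int` are hypothetical helper names. The enum mirrors the one added to coretypes.h, and the range check mirrors get_memmodel, which falls back to MEMMODEL_SEQ_CST after reporting an error.

```c
/* Sketch only, under the assumptions stated above.  */

/* Same values as the enum memmodel added to coretypes.h.  */
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_LAST = 6
};

/* Assumption: the compiler's __ATOMIC_* macros use the same numbering.  */
_Static_assert (MEMMODEL_RELAXED == __ATOMIC_RELAXED, "relaxed mismatch");
_Static_assert (MEMMODEL_ACQUIRE == __ATOMIC_ACQUIRE, "acquire mismatch");
_Static_assert (MEMMODEL_RELEASE == __ATOMIC_RELEASE, "release mismatch");
_Static_assert (MEMMODEL_SEQ_CST == __ATOMIC_SEQ_CST, "seq_cst mismatch");

/* Hypothetical helper: accept MODEL only if it is in range, otherwise
   fall back to the strongest model, as get_memmodel does.  */
static enum memmodel
check_memmodel (long model)
{
  if (model < 0 || model >= MEMMODEL_LAST)
    return MEMMODEL_SEQ_CST;
  return (enum memmodel) model;
}

/* Hypothetical helper: store FROM into *TO and return the previous
   contents of *TO, using the requested memory model.  */
static int
exchange_int (int *to, int from, enum memmodel model)
{
  return __atomic_exchange_n (to, from, (int) model);
}
```

With `int i = 1;`, a call such as `exchange_int (&i, 555, MEMMODEL_SEQ_CST)` returns 1 and leaves 555 in `i`. On x86 the memory model argument makes no difference to the generated code, because, as the expander comment above notes, xchg is already a full barrier.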

* Re: __sync_swap*  [ rename sync builtins ]
  2011-06-21 23:03                             ` Graham Stott
@ 2011-06-21 23:26                               ` Andrew MacLeod
  2011-06-22  0:59                                 ` Andrew MacLeod
  2011-06-24  0:35                                 ` Ian Lance Taylor
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-21 23:26 UTC (permalink / raw)
  To: Graham Stott
  Cc: Richard Henderson, Aldy Hernandez, Jakub Jelinek,
	Joseph S. Myers, gcc-patches

On 06/21/2011 06:26 PM, Graham Stott wrote:
> All
>
> --- On Tue, 21/6/11, Richard Henderson<rth@redhat.com>  wrote:
>
> This looks to have broken the go frontend
>
> gcc/gcc/go/gofrontend/gogo-tree.cc: In member function ‘void Gogo::define_builtin_function_trees()’:
> /usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:94:18: error: ‘BUILT_IN_ADD_AND_FETCH_1’ was not declared in this scope
> /usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:99:19: error: ‘BUILT_IN_ADD_AND_FETCH_2’ was not declared in this scope
> /usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:104:18: error: ‘BUILT_IN_ADD_AND_FETCH_4’ was not declared in this scope
> /usr/local/src/gcc4.7/gcc/gcc/go/gofrontend/gogo-tree.cc:109:18: error: ‘BUILT_IN_ADD_AND_FETCH_8’ was not declared in this scope
>
> Graham
ah, missed it's .cc file, and I guess it doesn't build by default  :-P

This ought to fix it, checking in as obvious...

Andrew

* Re: __sync_swap*  [ rename sync builtins ]
  2011-06-21 23:26                               ` Andrew MacLeod
@ 2011-06-22  0:59                                 ` Andrew MacLeod
  2011-06-24  0:35                                 ` Ian Lance Taylor
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-22  0:59 UTC (permalink / raw)
  To: Graham Stott
  Cc: Richard Henderson, Aldy Hernandez, Jakub Jelinek,
	Joseph S. Myers, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 399 bytes --]

On 06/21/2011 07:03 PM, Andrew MacLeod wrote:
> On 06/21/2011 06:26 PM, Graham Stott wrote:
>> All
>>
>> --- On Tue, 21/6/11, Richard Henderson<rth@redhat.com>  wrote:
>>
>> This looks to have broken the go frontend
>>
>>
>> Graham
> ah, missed it's .cc file, and I guess it doesn't build by default  :-P
>
> This ought to fix it, checking in as obvious...
Not sure where the patch went... sigh...


[-- Attachment #2: go.patch --]
[-- Type: text/plain, Size: 2690 bytes --]


	* gogo-tree.cc (Gogo::define_builtin_function_trees): Change 
	BUILT_IN_ADD_AND_FETCH to BUILT_IN_SYNC_ADD_AND_FETCH.

Index: go/gofrontend/gogo-tree.cc
===================================================================
*** go/gofrontend/gogo-tree.cc	(revision 175272)
--- go/gofrontend/gogo-tree.cc	(working copy)
*************** Gogo::define_builtin_function_trees()
*** 91,112 ****
       for ++ and --.  */
    tree t = go_type_for_size(BITS_PER_UNIT, 1);
    tree p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_ADD_AND_FETCH_1, "__sync_fetch_and_add_1", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 2, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin (BUILT_IN_ADD_AND_FETCH_2, "__sync_fetch_and_add_2", NULL,
  		  build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 4, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_ADD_AND_FETCH_4, "__sync_fetch_and_add_4", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 8, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_ADD_AND_FETCH_8, "__sync_fetch_and_add_8", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    // We use __builtin_expect for magic import functions.
--- 91,112 ----
       for ++ and --.  */
    tree t = go_type_for_size(BITS_PER_UNIT, 1);
    tree p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_1, "__sync_fetch_and_add_1", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 2, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin (BUILT_IN_SYNC_ADD_AND_FETCH_2, "__sync_fetch_and_add_2", NULL,
  		  build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 4, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_4, "__sync_fetch_and_add_4", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    t = go_type_for_size(BITS_PER_UNIT * 8, 1);
    p = build_pointer_type(build_qualified_type(t, TYPE_QUAL_VOLATILE));
!   define_builtin(BUILT_IN_SYNC_ADD_AND_FETCH_8, "__sync_fetch_and_add_8", NULL,
  		 build_function_type_list(t, p, t, NULL_TREE), false);
  
    // We use __builtin_expect for magic import functions.
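
The builtin names touched above differ only in word order, so a short C sketch of the two documented GCC behaviours may help when reading the patch. It is an illustration, not part of the patch, and `post_increment` and `pre_increment` are hypothetical wrapper names. `__sync_fetch_and_add` returns the value held before the addition, while `__sync_add_and_fetch` returns the value after it.

```c
/* Sketch only: contrasts the two __sync add builtins.  */

/* Hypothetical wrapper: behaves like "(*p)++", returning the old value.  */
static int
post_increment (volatile int *p)
{
  return __sync_fetch_and_add (p, 1);
}

/* Hypothetical wrapper: behaves like "++(*p)", returning the new value.  */
static int
pre_increment (volatile int *p)
{
  return __sync_add_and_fetch (p, 1);
}
```

Starting from a counter holding 10, `post_increment` returns 10 and leaves 11 in the counter; a following `pre_increment` returns 12.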

* Re: [cxx-mem-model]  sync_mem_exchange implementation with memory model parameters
  2011-06-21 23:03                         ` [cxx-mem-model] sync_mem_exchange implementation with memory model parameters Andrew MacLeod
@ 2011-06-22 20:36                           ` Richard Henderson
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2011-06-22 20:36 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jakub Jelinek, Joseph S. Myers, gcc-patches

On 06/21/2011 03:27 PM, Andrew MacLeod wrote:
> 	* doc/extend.texi (__sync_mem_exchange): Document.
> 	* cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*.
> 	* c-family/c-common.c (BUILT_IN_SYNC_MEM_EXCHANGE_N): Add case.
> 	* optabs.c (expand_sync_mem_exchange): New.
> 	* optabs.h (enum direct_optab_index): Add DOI_sync_mem_exchange entry.
> 	(sync_mem_exchange_optab): Define.
> 	* genopinit.c: Add entry for sync_mem_exchange.
> 	* builtins.c (get_memmodel): New.
> 	(expand_builtin_sync_mem_exchange): New.
> 	(expand_builtin_sync_synchronize): Remove static.
> 	(expand_builtin): Add cases for BUILT_IN_SYNC_MEM_EXCHANGE_*.
> 	* sync-builtins.def: Add entries for BUILT_IN_SYNC_MEM_EXCHANGE_*.
> 	* testsuite/gcc.dg/x86-sync-1.c: New test.
> 	* builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
> 	* expr.h (expand_sync_mem_exchange): Declare.
> 	(expand_builtin_sync_synchronize): Declare.
> 	* fortran/types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
> 	* coretypes.h (enum memmodel): New.
> 	* Makefile.in (cppbuiltin.o): Add missing dependency on $(TREE_H).
> 	* config/i386/sync.md (sync_mem_exchange<mode>): New pattern.
> 

Looks good.


r~

* Re: __sync_swap* [ rename sync builtins ]
  2011-06-21 23:26                               ` Andrew MacLeod
  2011-06-22  0:59                                 ` Andrew MacLeod
@ 2011-06-24  0:35                                 ` Ian Lance Taylor
  2011-06-24  0:38                                   ` Andrew MacLeod
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Lance Taylor @ 2011-06-24  0:35 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Graham Stott, Richard Henderson, Aldy Hernandez, Jakub Jelinek,
	Joseph S. Myers, gcc-patches

On Tue, Jun 21, 2011 at 4:03 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 06/21/2011 06:26 PM, Graham Stott wrote:
>> This looks to have broken the go frontend
>
> ah, missed it's .cc file, and I guess it doesn't build by default  :-P
>
> This ought to fix it, checking in as obvious...

Note that the files in gcc/go/gofrontend are mirrored from a different
repository, and should be changed there before changing the files in the
gcc repository.  I am slowly fixing the cases where this causes generic
gcc problems like this one.  I will take care of moving this trivial
patch over.
Ian

* Re: __sync_swap* [ rename sync builtins ]
  2011-06-24  0:35                                 ` Ian Lance Taylor
@ 2011-06-24  0:38                                   ` Andrew MacLeod
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew MacLeod @ 2011-06-24  0:38 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Graham Stott, Richard Henderson, Aldy Hernandez, Jakub Jelinek,
	Joseph S. Myers, gcc-patches

On 06/23/2011 08:14 PM, Ian Lance Taylor wrote:
> On Tue, Jun 21, 2011 at 4:03 PM, Andrew MacLeod<amacleod@redhat.com>  wrote:
>> On 06/21/2011 06:26 PM, Graham Stott wrote:
>>> This looks to have broken the go frontend
>> ah, missed it's .cc file, and I guess it doesn't build by default  :-P
>>
>> This ought to fix it, checking in as obvious...
> Note that the files in gcc/go/gofrontend are mirrored from a different
> repository,
> and should be changed there before changing the files in the gcc repository.
> I am slowly fixing the cases where this causes generic gcc problems
> like this one.
> I will take care of moving this trivial patch over.

oops, sorry I had no idea...  and thanks :-)

Andrew

* Re: __sync_swap* with acq/rel/full memory barrier semantics
  2011-06-20 16:39                 ` Andrew MacLeod
  2011-06-20 22:50                   ` Richard Henderson
@ 2011-07-08 17:00                   ` Aldy Hernandez
  1 sibling, 0 replies; 26+ messages in thread
From: Aldy Hernandez @ 2011-07-08 17:00 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Richard Henderson, Jakub Jelinek, Joseph S. Myers, gcc-patches


>> Some names include "sync" and some don't?
>
> Well, I was going to blame Aldy :-) but then I went to look at this, and
> thats the same way *all* the other __sync instructions seem to be.
>
> ie:
>
> builtins.c:expand_builtin_lock_test_and_set (enum machine_mode mode,
> tree exp,
> builtins.c: case BUILT_IN_LOCK_TEST_AND_SET_1:
> builtins.c: case BUILT_IN_LOCK_TEST_AND_SET_2:
> builtins.c: case BUILT_IN_LOCK_TEST_AND_SET_4:
>
> whereas everything else is 'sync_lock_test_and_set'..
>
> So i guess it falls to prior art... I assume Aldy just cut-and-pasted
> for his new routine and just changed the names in the same format.

Correct, this was the way all the other sync builtins were implemented.
I found it odd as well, but wanted to keep my changes to a minimum.

Thread overview: 26+ messages
2011-05-24  8:27 __sync_swap* with acq/rel/full memory barrier semantics Aldy Hernandez
2011-05-24  9:25 ` Joseph S. Myers
2011-05-30 22:53   ` Andrew MacLeod
2011-05-31 13:12     ` Jakub Jelinek
2011-05-31 15:23       ` Andrew MacLeod
2011-06-02 19:13     ` Aldy Hernandez
2011-06-02 19:25       ` Jakub Jelinek
2011-06-02 19:53         ` Aldy Hernandez
2011-06-03 14:27           ` Richard Henderson
2011-06-17 22:21             ` Andrew MacLeod
2011-06-18 19:49               ` Richard Henderson
2011-06-20 16:39                 ` Andrew MacLeod
2011-06-20 22:50                   ` Richard Henderson
2011-06-20 23:02                     ` Andrew MacLeod
2011-06-20 23:29                       ` Richard Henderson
2011-06-21 18:56                         ` __sync_swap* [ rename sync builtins ] Andrew MacLeod
2011-06-21 19:03                           ` Richard Henderson
2011-06-21 23:03                             ` Graham Stott
2011-06-21 23:26                               ` Andrew MacLeod
2011-06-22  0:59                                 ` Andrew MacLeod
2011-06-24  0:35                                 ` Ian Lance Taylor
2011-06-24  0:38                                   ` Andrew MacLeod
2011-06-21 23:03                         ` [cxx-mem-model] sync_mem_exchange implementation with memory model parameters Andrew MacLeod
2011-06-22 20:36                           ` Richard Henderson
2011-07-08 17:00                   ` __sync_swap* with acq/rel/full memory barrier semantics Aldy Hernandez
2011-06-18 23:49               ` Richard Henderson
