public inbox for gcc-patches@gcc.gnu.org
* [Patch 0/3] ARM 64 bit atomic operations
@ 2011-07-01 15:53 Dr. David Alan Gilbert
  2011-07-01 15:55 ` [Patch 1/3] " Dr. David Alan Gilbert
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-01 15:53 UTC (permalink / raw)
  To: gcc-patches

Hi,
  This is a series of 3 patches relating to ARM atomic operations.

  1) Provide 64-bit atomic operations using the new ldrexd/strexd instructions
     available on ARMv6k and above (sketched below).
  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
     assist is called (as per 32-bit and smaller ops).
  3) Fix PR48126, which is a misplaced barrier in the atomic generation.
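
For reference, the shape of the sequence patch 1 enables the compiler to
emit for a 64-bit atomic add is roughly the following (a hand-written
sketch of the ldrexd/strexd retry loop, not actual compiler output, and
the register choices are illustrative):

	1:	ldrexd	r0, r1, [r2]	@ load 64-bit old value into an
					@ even/odd register pair
		adds	r4, r0, #1	@ add to the low word, setting carry
		adc	r5, r1, #0	@ propagate the carry to the high word
		strexd	r3, r4, r5, [r2] @ try to store; r3 is 0 on success
		teq	r3, #0
		bne	1b		@ lost the exclusive reservation; retry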

Many thanks to Richard Sandiford for pointing me in the right direction
and reviewing it.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

The patches were generated against the GCC git tree from about 2 weeks
back, but apply cleanly on top of the current head.

It has been tested with an x86-to-ARM cross build and also with a native
x86 build and test run.

Dave


* [Patch 1/3] ARM 64 bit atomic operations
  2011-07-01 15:53 [Patch 0/3] ARM 64 bit atomic operations Dr. David Alan Gilbert
@ 2011-07-01 15:55 ` Dr. David Alan Gilbert
  2011-07-12 21:24   ` Ramana Radhakrishnan
  2011-07-01 15:56 ` [Patch 2/3] " Dr. David Alan Gilbert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-01 15:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches


  Provide 64-bit atomic support for ARMv6k and above using the new ldrexd
  and strexd instructions.
  Add test cases to cover the case where the 64-bit type is long long (not
  long) and where the additions etc. are done as 32-bit operations.
  Add a multithreaded test.
  Add test predicates for:
   architectures with long long 64-bit atomics (ARM only so far)
   ARM architecture variants
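
  As a user-level illustration (my own example, not part of the patch),
  the point is that a 64-bit __sync builtin such as the one below can now
  be expanded inline on ARMv6k and above rather than via a helper call:

	long long counter;

	long long
	bump_counter (void)
	{
	  /* On ARMv6k+ this becomes an inline ldrexd/adds/adc/strexd loop;
	     the addend crosses the 32-bit boundary, so the carry matters.  */
	  return __sync_add_and_fetch (&counter, 0x100000002ll);
	}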

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 057f9ba..39057d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23356,12 +23356,26 @@ arm_output_ldrex (emit_f emit,
 		  rtx target,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[1] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+    }
+  else 
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (target) & 1) == 0);
+      operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+    }
 }
 
 /* Emit a strex{b,h,d, } instruction appropriate for the specified
@@ -23374,14 +23388,29 @@ arm_output_strex (emit_f emit,
 		  rtx value,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-		       cc);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+			  suffix, cc);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (value) & 1) == 0);
+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+      operands[3] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+			   cc);
+    }
 }
 
 /* Helper to emit a two operand instruction.  */
@@ -23423,7 +23452,7 @@ arm_output_op3 (emit_f emit, const char *mnemonic, rtx d, rtx a, rtx b)
 
    required_value:
 
-   RTX register or const_int representing the required old_value for
+   RTX register representing the required old_value for
    the modify to continue, if NULL no comparsion is performed.  */
 static void
 arm_output_sync_loop (emit_f emit,
@@ -23437,7 +23466,13 @@ arm_output_sync_loop (emit_f emit,
 		      enum attr_sync_op sync_op,
 		      int early_barrier_required)
 {
-  rtx operands[1];
+  rtx operands[2];
+  /* We'll use the lo for the normal rtx in the non-DI case
+     as well as the least-significant word in the DI case.  */
+  rtx old_value_lo, required_value_lo, new_value_lo, t1_lo;
+  rtx old_value_hi, required_value_hi, new_value_hi, t1_hi;
+
+  bool is_di = mode == DImode;
 
   gcc_assert (t1 != t2);
 
@@ -23448,82 +23483,142 @@ arm_output_sync_loop (emit_f emit,
 
   arm_output_ldrex (emit, mode, old_value, memory);
 
+  if (is_di)
+    {
+      old_value_lo = gen_lowpart (SImode, old_value);
+      old_value_hi = gen_highpart (SImode, old_value);
+      if (required_value)
+	{
+	  required_value_lo = gen_lowpart (SImode, required_value);
+	  required_value_hi = gen_highpart (SImode, required_value);
+	}
+      else
+	{
+	  /* Silence a false potentially-unused warning.  */
+	  required_value_lo = NULL;
+	  required_value_hi = NULL;
+	}
+      new_value_lo = gen_lowpart (SImode, new_value);
+      new_value_hi = gen_highpart (SImode, new_value);
+      t1_lo = gen_lowpart (SImode, t1);
+      t1_hi = gen_highpart (SImode, t1);
+    }
+  else
+    {
+      old_value_lo = old_value;
+      new_value_lo = new_value;
+      required_value_lo = required_value;
+      t1_lo = t1;
+
+      /* Silence a false potentially-unused warning.  */
+      t1_hi = NULL;
+      new_value_hi = NULL;
+      required_value_hi = NULL;
+      old_value_hi = NULL;
+    }
+
   if (required_value)
     {
-      rtx operands[2];
+      operands[0] = old_value_lo;
+      operands[1] = required_value_lo;
 
-      operands[0] = old_value;
-      operands[1] = required_value;
       arm_output_asm_insn (emit, 0, operands, "cmp\t%%0, %%1");
+      if (is_di)
+        {
+          arm_output_asm_insn (emit, 0, operands, "it\teq");
+          arm_output_op2 (emit, "cmpeq", old_value_hi, required_value_hi);
+        }
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYB%%=", LOCAL_LABEL_PREFIX);
     }
 
   switch (sync_op)
     {
     case SYNC_OP_ADD:
-      arm_output_op3 (emit, "add", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "adds" : "add",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "adc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_SUB:
-      arm_output_op3 (emit, "sub", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "subs" : "sub",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "sbc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_IOR:
-      arm_output_op3 (emit, "orr", t1, old_value, new_value);
+      arm_output_op3 (emit, "orr", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "orr", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_XOR:
-      arm_output_op3 (emit, "eor", t1, old_value, new_value);
+      arm_output_op3 (emit, "eor", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "eor", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_AND:
-      arm_output_op3 (emit,"and", t1, old_value, new_value);
+      arm_output_op3 (emit,"and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_NAND:
-      arm_output_op3 (emit, "and", t1, old_value, new_value);
-      arm_output_op2 (emit, "mvn", t1, t1);
+      arm_output_op3 (emit, "and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
+      arm_output_op2 (emit, "mvn", t1_lo, t1_lo);
+      if (is_di)
+	arm_output_op2 (emit, "mvn", t1_hi, t1_hi);
       break;
 
     case SYNC_OP_NONE:
       t1 = new_value;
+      t1_lo = new_value_lo;
+      if (is_di)
+	t1_hi = new_value_hi;
       break;
     }
 
+  /* Note that the result of strex is a 0/1 flag and always a single register.  */
   if (t2)
     {
-       arm_output_strex (emit, mode, "", t2, t1, memory);
-       operands[0] = t2;
-       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
-       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
-			    LOCAL_LABEL_PREFIX);
+      arm_output_strex (emit, mode, "", t2, t1, memory);
+      operands[0] = t2;
+      arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
+      arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
+			   LOCAL_LABEL_PREFIX);
     }
   else
     {
       /* Use old_value for the return value because for some operations
 	 the old_value can easily be restored.  This saves one register.  */
-      arm_output_strex (emit, mode, "", old_value, t1, memory);
-      operands[0] = old_value;
+      arm_output_strex (emit, mode, "", old_value_lo, t1, memory);
+      operands[0] = old_value_lo;
       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
 			   LOCAL_LABEL_PREFIX);
 
+      /* Note that we used only the _lo half of old_value as a temporary,
+	 so in the DI case we don't have to restore the _hi part.  */
       switch (sync_op)
 	{
 	case SYNC_OP_ADD:
-	  arm_output_op3 (emit, "sub", old_value, t1, new_value);
+	  arm_output_op3 (emit, "sub", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_SUB:
-	  arm_output_op3 (emit, "add", old_value, t1, new_value);
+	  arm_output_op3 (emit, "add", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_XOR:
-	  arm_output_op3 (emit, "eor", old_value, t1, new_value);
+	  arm_output_op3 (emit, "eor", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_NONE:
-	  arm_output_op2 (emit, "mov", old_value, required_value);
+	  arm_output_op2 (emit, "mov", old_value_lo, required_value_lo);
 	  break;
 
 	default:
@@ -23626,7 +23721,7 @@ arm_expand_sync (enum machine_mode mode,
     target = gen_reg_rtx (mode);
 
   memory = arm_legitimize_sync_memory (memory);
-  if (mode != SImode)
+  if (mode != SImode && mode != DImode)
     {
       rtx load_temp = gen_reg_rtx (SImode);
 
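
[Editorial aside: the DImode compare added to arm_output_sync_loop above
checks the two 64-bit values a word at a time, roughly:

	cmp	r0, r2		@ compare the low words
	it	eq		@ needed for Thumb-2; a no-op in ARM state
	cmpeq	r1, r3		@ compare high words only if the lows matched
	bne	1f		@ some half differed: skip the store

so the branch is taken unless both halves match.  Register names and the
label are illustrative.]
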
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index c32ef1a..3fdd22f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -282,7 +282,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
@@ -290,8 +291,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip supports ldrex and strex */
 #define TARGET_HAVE_LDREX	((arm_arch6 && TARGET_ARM) || arm_arch7)
 
-/* Nonzero if this chip supports ldrex{bhd} and strex{bhd}.  */
-#define TARGET_HAVE_LDREXBHD	((arm_arch6k && TARGET_ARM) || arm_arch7)
+/* Nonzero if this chip supports ldrex{bh} and strex{bh}.  */
+#define TARGET_HAVE_LDREXBH	((arm_arch6k && TARGET_ARM) || arm_arch7)
+
+/* Nonzero if this chip supports ldrexd and strexd.  */
+#define TARGET_HAVE_LDREXD	(((arm_arch6k && TARGET_ARM) || arm_arch7) \
+				 && arm_arch_notm)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 689a235..406dc6c 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -1,6 +1,7 @@
 ;; Machine description for ARM processor synchronization primitives.
 ;; Copyright (C) 2010 Free Software Foundation, Inc.
 ;; Written by Marcus Shawcroft (marcus.shawcroft@arm.com)
+;; 64bit Atomics by Dave Gilbert (david.gilbert@linaro.org)
 ;;
 ;; This file is part of GCC.
 ;;
@@ -33,31 +34,28 @@
   MEM_VOLATILE_P (operands[0]) = 1;
 })
 
-(define_expand "sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand")
-        (unspec_volatile:SI [(match_operand:SI 1 "memory_operand")
-			     (match_operand:SI 2 "s_register_operand")
-			     (match_operand:SI 3 "s_register_operand")]
-			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omrn;
-    generator.u.omrn = gen_arm_sync_compare_and_swapsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], operands[2],
-                     operands[3]);
-    DONE;
-  })
 
 (define_mode_iterator NARROW [QI HI])
+(define_mode_iterator QHSD [QI HI SI DI])
+(define_mode_iterator SIDI [SI DI])
+
+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
+				(QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+				(HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+				(DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
+
+(define_mode_attr sync_atleastsi [(SI "SI")
+				  (DI "DI")
+				  (HI "SI")
+				  (QI "SI")])
 
 (define_expand "sync_compare_and_swap<mode>"
-  [(set (match_operand:NARROW 0 "s_register_operand")
-        (unspec_volatile:NARROW [(match_operand:NARROW 1 "memory_operand")
-			     (match_operand:NARROW 2 "s_register_operand")
-			     (match_operand:NARROW 3 "s_register_operand")]
+  [(set (match_operand:QHSD 0 "s_register_operand")
+        (unspec_volatile:QHSD [(match_operand:QHSD 1 "memory_operand")
+			     (match_operand:QHSD 2 "s_register_operand")
+			     (match_operand:QHSD 3 "s_register_operand")]
 			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omrn;
@@ -67,25 +65,11 @@
     DONE;
   })
 
-(define_expand "sync_lock_test_and_setsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_lock_test_and_setsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_lock_test_and_set<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -115,51 +99,25 @@
 				(plus "*")
 				(minus "*")])
 
-(define_expand "sync_<sync_optab>si"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (syncop:SI (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
-(define_expand "sync_nandsi"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (not:SI (and:SI (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
 (define_expand "sync_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (syncop:NARROW (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (syncop:QHSD (match_dup 0) (match_dup 1))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, NULL, operands[0], NULL,
-    		     operands[1]);
+		     operands[1]);
     DONE;
   })
 
 (define_expand "sync_nand<mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 0) (match_dup 1)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -169,57 +127,27 @@
     DONE;
   })
 
-(define_expand "sync_new_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_new_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-    		     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_new_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_new_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -229,57 +157,27 @@
     DONE;
   });
 
-(define_expand "sync_old_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_old_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_old_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_old_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_old_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -289,22 +187,22 @@
     DONE;
   })
 
-(define_insn "arm_sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI
-	  [(match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-   	   (match_operand:SI 2 "s_register_operand" "r")
-	   (match_operand:SI 3 "s_register_operand" "r")]
-	  VUNSPEC_SYNC_COMPARE_AND_SWAP))
-   (set (match_dup 1) (unspec_volatile:SI [(match_dup 2)]
+(define_insn "arm_sync_compare_and_swap<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI
+	 [(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+	  (match_operand:SIDI 2 "s_register_operand" "r")
+	  (match_operand:SIDI 3 "s_register_operand" "r")]
+	 VUNSPEC_SYNC_COMPARE_AND_SWAP))
+   (set (match_dup 1) (unspec_volatile:SIDI [(match_dup 2)]
                                           VUNSPEC_SYNC_COMPARE_AND_SWAP))
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -318,7 +216,7 @@
         (zero_extend:SI
 	  (unspec_volatile:NARROW
 	    [(match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")
-   	     (match_operand:SI 2 "s_register_operand" "r")
+	     (match_operand:SI 2 "s_register_operand" "r")
 	     (match_operand:SI 3 "s_register_operand" "r")]
 	    VUNSPEC_SYNC_COMPARE_AND_SWAP)))
    (set (match_dup 1) (unspec_volatile:NARROW [(match_dup 2)]
@@ -326,10 +224,10 @@
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -338,18 +236,18 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_lock_test_and_setsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (match_operand:SI 1 "arm_sync_memory_operand" "+Q"))
+(define_insn "arm_sync_lock_test_and_set<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q"))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_operand:SI 2 "s_register_operand" "r")]
-	                    VUNSPEC_SYNC_LOCK))
+	(unspec_volatile:SIDI [(match_operand:SIDI 2 "s_register_operand" "r")]
+	VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_release_barrier" "no")
    (set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
@@ -364,10 +262,10 @@
         (zero_extend:SI (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_operand:SI 2 "s_register_operand" "r")]
-	                        VUNSPEC_SYNC_LOCK))
+				VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -380,22 +278,22 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -405,54 +303,54 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_nandsi"
+(define_insn "arm_sync_new_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
@@ -461,19 +359,19 @@
         (unspec_volatile:SI
 	  [(not:SI
 	     (and:SI
-               (zero_extend:SI	  
-	         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-               (match_operand:SI 2 "s_register_operand" "r")))
+	       (zero_extend:SI	  
+		 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+	       (match_operand:SI 2 "s_register_operand" "r")))
 	  ] VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -483,20 +381,20 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			      VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
+   (clobber (match_scratch:SIDI 3 "=&r"))
    (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -509,20 +407,21 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_nandsi"
+(define_insn "arm_sync_old_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -530,26 +429,25 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "4")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_t2"              "<sync_t2_reqd>")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
 	                    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SIDI 3 "=&r"))
+   (clobber (match_scratch:SI 4 "=&r"))]
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -557,26 +455,26 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "<sync_t2_reqd>")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_t2"              "4")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
 (define_insn "arm_sync_old_nand<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:SI [(not:SI (and:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
    (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
   {
     return arm_output_sync_insn (insn, operands);
   } 
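
[Editorial aside on the sync.md rework above: a mode iterator lets one
pattern stand for several modes, with the per-mode condition supplied by
a mode attribute.  A minimal sketch of the mechanism:

	(define_mode_iterator QHSD [QI HI SI DI])

	;; <mode> and <sync_predtab> are substituted per mode, so this one
	;; expander stands for the qi, hi, si and di variants, each gated
	;; by the matching condition from sync_predtab.
	(define_expand "sync_lock_test_and_set<mode>"
	  [(match_operand:QHSD 0 "s_register_operand")
	   (match_operand:QHSD 1 "memory_operand")
	   (match_operand:QHSD 2 "s_register_operand")]
	  "<sync_predtab>"
	  { ... })

This is why the separate ...si and NARROW expanders could be deleted.]
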
diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
new file mode 100644
index 0000000..5d91dfc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
@@ -0,0 +1,161 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32-bit so use long long
+   (an explicit 64-bit type may be a better bet) and 2) use values that cross
+   the 32-bit boundary and cause carries, since the actual maths are done as
+   pairs of 32-bit instructions.  */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
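
[Editorial note: the test values are chosen so that the two 32-bit halves
interact.  For instance the AL[9] entry exercises a carry:

	    0x1000e0de0000
	  + 0xb000e0000000
	  = 0xc001c0de0000

The low words 0xe0de0000 + 0xe0000000 wrap to 0xc0de0000 with a carry of
1 into the high words (0x1000 + 0xb000 + 1 = 0xc001), so only a correct
adds/adc pairing produces the expected test_di value.]
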
diff --git a/gcc/testsuite/gcc.dg/di-sync-multithread.c b/gcc/testsuite/gcc.dg/di-sync-multithread.c
new file mode 100644
index 0000000..cfa556f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-sync-multithread.c
@@ -0,0 +1,191 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-require-effective-target pthread_h } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options "-pthread -std=gnu99" } */
+
+/* Test of long long atomic ops performed in parallel in 3 pthreads.
+   david.gilbert@linaro.org */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/*#define DEBUGIT 1 */
+
+#ifdef DEBUGIT
+#include <stdio.h>
+
+#define DOABORT(x,...) { fprintf(stderr, x, __VA_ARGS__); fflush(stderr); abort(); }
+
+#else
+
+#define DOABORT(x,...) abort();
+
+#endif
+
+/* Passed to each thread to describe which bits it is going to work on. */
+struct threadwork {
+  unsigned long long count; /* incremented each time the worker loops */
+  unsigned int thread;    /* ID */
+  unsigned int addlsb;    /* 8 bit */
+  unsigned int logic1lsb; /* 5 bit */
+  unsigned int logic2lsb; /* 8 bit */
+};
+
+/* The shared word where all the atomic work is done */
+static volatile long long workspace;
+
+/* A shared word to tell the workers to quit when non-0 */
+static long long doquit;
+
+extern void abort (void);
+
+/* Note this test doesn't check the return values much.  */
+void*
+worker (void* data)
+{
+  struct threadwork *tw=(struct threadwork*)data;
+  long long add1bit=1ll << tw->addlsb;
+  long long logic1bit=1ll << tw->logic1lsb;
+  long long logic2bit=1ll << tw->logic2lsb;
+
+  /* Clear the bits we use */
+  __sync_and_and_fetch (&workspace, ~(0xffll * add1bit));
+  __sync_fetch_and_and (&workspace, ~(0x1fll * logic1bit));
+  __sync_fetch_and_and (&workspace, ~(0xffll * logic2bit));
+
+  do
+    {
+      long long tmp1, tmp2, tmp3;
+      /* OK, let's try to do some stuff to the workspace - by the end
+         of the main loop our area should be the same as it is now - i.e. 0.  */
+
+      /* Push the arithmetic section up to 128 - one of the threads will
+         cause this to carry across the 32-bit boundary.  */
+      for(tmp2=0; tmp2<64; tmp2++)
+	{
+	  /* Add 2 using the two different adds */
+	  tmp1=__sync_add_and_fetch (&workspace, add1bit);
+	  tmp3=__sync_fetch_and_add (&workspace, add1bit);
+
+	  /* The value should be the intermediate add value in both cases */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT("Mismatch of add intermediates on thread %d workspace=0x%llx tmp1=0x%llx tmp2=0x%llx tmp3=0x%llx\n",
+			 tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+      /* Set the logic bits */
+      tmp2=__sync_or_and_fetch (&workspace,
+			  0x1fll * logic1bit | 0xffll * logic2bit);
+
+      /* Check the logic bits are set and the arithmetic value is correct */
+      if ((tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit))
+	  != (0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit))
+	DOABORT("Midloop check failed on thread %d workspace=0x%llx tmp2=0x%llx masktmp2=0x%llx expected=0x%llx\n",
+		tw->thread, workspace, tmp2,
+		tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit),
+		(0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit));
+
+      /* Pull the arithmetic set back down to 0 - again this should cause a
+	 carry across the 32-bit boundary in one thread.  */
+
+      for(tmp2=0; tmp2<64; tmp2++)
+	{
+	  /* Subtract 2 using the two different subs */
+	  tmp1=__sync_sub_and_fetch (&workspace, add1bit);
+	  tmp3=__sync_fetch_and_sub (&workspace, add1bit);
+
+	  /* The value should be the intermediate sub value in both cases */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT("Mismatch of sub intermediates on thread %d workspace=0x%llx tmp1=0x%llx tmp2=0x%llx tmp3=0x%llx\n",
+			tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+
+      /* Clear the logic bits */
+      __sync_fetch_and_xor (&workspace, 0x1fll * logic1bit);
+      tmp3=__sync_and_and_fetch (&workspace, ~(0xffll * logic2bit));
+
+      /* And so the logic bits and the arithmetic bits should be zero again */
+      if (tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit))
+	DOABORT("End of worker loop; bits non-0 on thread %d workspace=0x%llx tmp3=0x%llx mask=0x%llx maskedtmp3=0x%llx\n",
+		tw->thread, workspace, tmp3, (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit),
+		tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit));
+
+      __sync_add_and_fetch (&tw->count, 1);
+    }
+  while (!__sync_bool_compare_and_swap (&doquit, 1, 1));
+
+  pthread_exit (0);
+}
+
+int
+main ()
+{
+  /* We have 3 threads doing three sets of operations: an 8-bit
+     arithmetic field, a 5-bit logic field and an 8-bit logic
+     field (just to pack them all in).
+
+  6      5       4       4       3       2       1                
+  3      6       8       0       2       4       6       8       0
+  |...,...|...,...|...,...|...,...|...,...|...,...|...,...|...,...
+  - T0   --  T1  -- T2   --T2 --  T0  -*- T2-- T1-- T1   -***- T0-
+   logic2  logic2  arith   log2  arith  log1 log1  arith     log1
+
+  */
+  unsigned int t;
+  long long tmp;
+  int err;
+
+  struct threadwork tw[3]={
+    { 0ll, 0, 27, 0, 56 },
+    { 0ll, 1,  8,16, 48 },
+    { 0ll, 2, 40,21, 35 }
+  };
+
+  pthread_t threads[3];
+
+  __sync_lock_release (&doquit);
+
+  /* Get the workspace into a known value - all 1s.  */
+  __sync_lock_release (&workspace); /* Now all 0 */
+  tmp = __sync_val_compare_and_swap (&workspace, 0, -1ll);
+  if (tmp!=0)
+    DOABORT("Initial __sync_val_compare_and_swap wasn't 0 workspace=0x%llx tmp=0x%llx\n", workspace,tmp);
+
+  for(t=0; t<3; t++)
+  {
+    err=pthread_create (&threads[t], NULL , worker, &tw[t]);
+    if (err) DOABORT("pthread_create failed on thread %d with error %d\n", t, err);
+  };
+
+  sleep (5);
+
+  /* Stop please */
+  __sync_lock_test_and_set (&doquit, 1ll);
+
+  for(t=0;t<3;t++)
+    {
+      err=pthread_join (threads[t], NULL);
+      if (err)
+	DOABORT("pthread_join failed on thread %d with error %d\n", t, err);
+    };
+
+  __sync_synchronize ();
+
+  /* OK, so all the workers have finished -
+     the workers should have zeroed their parts of the workspace;
+     the unused areas should still be 1.
+  */
+  if (!__sync_bool_compare_and_swap (&workspace, 0x040000e0ll, 0))
+    DOABORT("End of run workspace mismatch, got %llx\n", workspace);
+
+  /* All the workers should have done some work */
+  for(t=0; t<3; t++)
+    {
+      if (tw[t].count == 0) DOABORT("Worker %d gave 0 count\n", t);
+    };
+
+  return 0;
+}
+
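
[Editorial note on an idiom used above: the workers poll the 64-bit
doquit flag with a compare-and-swap of 1 against 1, which has no side
effect either way and doubles as an atomic 64-bit read.  A sketch of the
same pattern (do_work is hypothetical):

	do
	  do_work ();
	while (!__sync_bool_compare_and_swap (&doquit, 1, 1));
]
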
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
new file mode 100644
index 0000000..741094b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
@@ -0,0 +1,175 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-add-options arm_arch_v5 } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+/* This is a copy of di-longlong64-sync-1.c, and is here to check that the
+   generated assembly doesn't contain ldrexd's and strexd's when compiled
+   for armv5 in ARM mode.
+ */
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32-bit so I needed
+   to use long long (an explicit 64-bit type may be a better bet) and
+   2) I wanted to use values that cross the 32-bit boundary and cause
+   carries, since the actual maths are done as pairs of 32-bit instructions.
+ */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
+
+/* On an old ARM we have no ldrexd or strexd so we have to use helpers */
+/* { dg-final { scan-assembler-not "ldrexd" } } */
+/* { dg-final { scan-assembler-not "strexd" } } */
+/* { dg-final { scan-assembler "__sync_" } } */
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
new file mode 100644
index 0000000..77a224a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-marm -std=gnu99" } */
+/* { dg-require-effective-target arm_arch_v6k_ok } */
+/* { dg-add-options arm_arch_v6k } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* This is a copy of di-longlong64-sync-1.c, and is here to check that the
+   generated assembler contains ldrexd and strexd instructions when
+   compiled for armv6k in ARM mode (the earliest configuration that can
+   use them).  */
+ 
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so I needed
+   to use long long (an explicit 64-bit type may be a better bet) and
+   2) I wanted to use values that cross the 32-bit boundary and cause
+   carries, since the actual maths are done as pairs of 32-bit
+   instructions.  */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
+
+/* We should be using ldrexd and strexd, with no helper functions or narrower ldrex.  */
+/* { dg-final { scan-assembler-times "\tldrexd" 46 } } */
+/* { dg-final { scan-assembler-times "\tstrexd" 46 } } */
+/* { dg-final { scan-assembler-not "__sync_" } } */
+/* { dg-final { scan-assembler-not "ldrex\t" } } */
+/* { dg-final { scan-assembler-not "strex\t" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7d3a271..d9b3678 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1975,6 +1975,47 @@ proc check_effective_target_arm_neon_fp16_ok { } {
 		check_effective_target_arm_neon_fp16_ok_nocache]
 }
 
+# Create a series of routines that return 1 if the given architecture
+# can be selected, and a routine that gives the flags needed to select it.
+# Note: extra flags may be added to disable options from newer compilers
+# (Thumb in particular - but others may be added in the future).
+# Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
+#        /* { dg-add-options arm_arch_v5 } */
+foreach { armfunc armflag armdef } { v5 "-march=armv5 -marm" __ARM_ARCH_5__
+				     v6 "-march=armv6" __ARM_ARCH_6__
+				     v6k "-march=armv6k" __ARM_ARCH_6K__
+				     v7a "-march=armv7-a" __ARM_ARCH_7A__ } {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_arm_arch_FUNC_ok { } {
+	    if { [ string match "*-marm*" "FLAG" ] && 
+		![check_effective_target_arm_arm_ok] } {
+		return 0
+	    }
+	    return [check_no_compiler_messages arm_arch_FUNC_ok assembly {
+		#if !defined(DEF)
+		#error FOO
+		#endif
+	    } "FLAG" ]
+	}
+
+	proc add_options_for_arm_arch_FUNC { flags } {
+	    return "$flags FLAG"
+	}
+    }]
+}
+
+# Return 1 if this is an ARM target where -marm causes ARM to be
+# used (not Thumb)
+
+proc check_effective_target_arm_arm_ok { } {
+    return [check_no_compiler_messages arm_arm_ok assembly {
+	#if !defined(__arm__) || defined(__thumb__) || defined(__thumb2__)
+	#error FOO
+	#endif
+    } "-marm"]
+}
+
+
 # Return 1 is this is an ARM target where -mthumb causes Thumb-1 to be
 # used.
 
@@ -3290,6 +3331,31 @@ proc check_effective_target_sync_int_long { } {
     return $et_sync_int_long_saved
 }
 
+# Return 1 if the target supports atomic operations on "long long" and
+# can actually execute them.
+# So far the only checks are for ARM; other targets may want to add their own.
+proc check_effective_target_sync_longlong { } {
+    return [check_runtime sync_longlong_runtime {
+      #include <stdlib.h>
+      int main()
+      {
+	long long l1;
+
+	if (sizeof(long long)!=8)
+	  exit(1);
+
+      #ifdef __arm__
+	/* Just check for native; checking for kernel fallback is tricky */
+	asm volatile ("ldrexd r0,r1, [%0]" : : "r" (&l1) : "r0", "r1");
+      #else
+      # error "Add other suitable archs here"
+      #endif
+
+	exit(0);
+      }
+    } "" ]
+}
+
 # Return 1 if the target supports atomic operations on "char" and "short".
 
 proc check_effective_target_sync_char_short { } {

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 2/3] ARM 64 bit atomic operations
  2011-07-01 15:53 [Patch 0/3] ARM 64 bit atomic operations Dr. David Alan Gilbert
  2011-07-01 15:55 ` [Patch 1/3] " Dr. David Alan Gilbert
@ 2011-07-01 15:56 ` Dr. David Alan Gilbert
  2011-07-01 16:03   ` Richard Henderson
  2011-07-01 19:38   ` Joseph S. Myers
  2011-07-01 15:57 ` [Patch 3/3] " Dr. David Alan Gilbert
  2011-07-12 12:55 ` [Patch 0/3] " Ramana Radhakrishnan
  3 siblings, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-01 15:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches


  Provide fallbacks for 64bit atomics that call Linux commpage helpers
  when compiling for older machines.  The code is based on the existing
  linux-atomic.c for other sizes; however, it performs an init-time
  check that the kernel is new enough to provide the helper.

  This relies on Nicolas Pitre's kernel patch here:
  https://patchwork.kernel.org/patch/894932/
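
  As an illustration (a hypothetical example, not part of the patch),
  code along these lines ends up calling the new helpers when compiled
  for a core without ldrexd/strexd:

    static long long counter;

    long long
    bump (long long delta)
    {
      /* Becomes a call to __sync_fetch_and_add_8 when the compiler
         cannot inline an ldrexd/strexd loop.  */
      return __sync_fetch_and_add (&counter, delta);
    }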

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 0000000..140cc2f
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,165 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilbert@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those, the compiler inlines the operation instead.)
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+/* For write */
+#include <unistd.h>
+/* For abort */
+#include <stdlib.h>
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long *oldval,
+				    const long long *newval,
+				    long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number */
+
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load */
+void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c
+	 write() and abort(). */
+      write (2 /* stderr */, err, sizeof(err));
+      abort ();
+    }
+};
+
+void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp;							\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement __sync_<op>_and_fetch for 64-bit quantities (the
+   __sync_fetch_and_<op> forms are defined above).  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }
+
+OP_AND_FETCH_WORD64 (add,   , +)
+OP_AND_FETCH_WORD64 (sub,   , -)
+OP_AND_FETCH_WORD64 (or,    , |)
+OP_AND_FETCH_WORD64 (and,   , &)
+OP_AND_FETCH_WORD64 (xor,   , ^)
+OP_AND_FETCH_WORD64 (nand, ~, &)
+
+long long HIDDEN
+__sync_val_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+  int failure;
+  long long actual_oldval;
+
+  while (1)
+    {
+      actual_oldval = *ptr;
+
+      if (__builtin_expect (oldval != actual_oldval, 0))
+	return actual_oldval;
+
+      failure = __kernel_cmpxchg64 (&actual_oldval, &newval, ptr);
+
+      if (__builtin_expect (!failure, 1))
+	return oldval;
+    }
+}
+
+typedef unsigned char bool;
+
+bool HIDDEN
+__sync_bool_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+  int failure = __kernel_cmpxchg64 (&oldval, &newval, ptr);
+  return (failure == 0);
+}
+
+long long HIDDEN
+__sync_lock_test_and_set_8 (long long *ptr, long long val)
+{
+  int failure;
+  long long oldval;
+
+  do {
+    oldval = *ptr;
+    failure = __kernel_cmpxchg64 (&oldval, &val, ptr);
+  } while (failure != 0);
+
+  return oldval;
+}
diff --git a/gcc/config/arm/linux-atomic.c b/gcc/config/arm/linux-atomic.c
index 57065a6..80f161d 100644
--- a/gcc/config/arm/linux-atomic.c
+++ b/gcc/config/arm/linux-atomic.c
@@ -32,8 +32,8 @@ typedef void (__kernel_dmb_t) (void);
 #define __kernel_dmb (*(__kernel_dmb_t *) 0xffff0fa0)
 
 /* Note: we implement byte, short and int versions of atomic operations using
-   the above kernel helpers, but there is no support for "long long" (64-bit)
-   operations as yet.  */
+   the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit)
+   operations.  */
 
 #define HIDDEN __attribute__ ((visibility ("hidden")))
 
@@ -273,6 +273,7 @@ SUBWORD_TEST_AND_SET (unsigned char,  1)
     *ptr = 0;								\
   }
 
+SYNC_LOCK_RELEASE (long long,   8)
 SYNC_LOCK_RELEASE (int,   4)
 SYNC_LOCK_RELEASE (short, 2)
 SYNC_LOCK_RELEASE (char,  1)
diff --git a/gcc/config/arm/t-linux-eabi b/gcc/config/arm/t-linux-eabi
index a328a00..3814cc0 100644
--- a/gcc/config/arm/t-linux-eabi
+++ b/gcc/config/arm/t-linux-eabi
@@ -36,3 +36,4 @@ LIB1ASMFUNCS := $(filter-out _dvmd_tls,$(LIB1ASMFUNCS)) _dvmd_lnx _clear_cache
 EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
 
 LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic.c
+LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic-64bit.c

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 3/3] ARM 64 bit atomic operations
  2011-07-01 15:53 [Patch 0/3] ARM 64 bit atomic operations Dr. David Alan Gilbert
  2011-07-01 15:55 ` [Patch 1/3] " Dr. David Alan Gilbert
  2011-07-01 15:56 ` [Patch 2/3] " Dr. David Alan Gilbert
@ 2011-07-01 15:57 ` Dr. David Alan Gilbert
  2011-07-12 15:31   ` Ramana Radhakrishnan
  2011-07-12 12:55 ` [Patch 0/3] " Ramana Radhakrishnan
  3 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-01 15:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches


As per pr/48126, Michael Edwards spotted that in the case where
the compare fails in the cmpxchg, the barrier at the end wasn't
executed, theoretically allowing a following load to float up above
the load of the value compared.
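
Schematically (not the exact compiler output), the emitted sequence
for a compare-and-swap looked like:

        dmb                     @ barrier before the operation
1:      ldrex   r0, [r1]        @ load the current value
        cmp     r0, r2          @ compare with the required value
        bne     2f              @ on mismatch, skip the store...
        strex   r3, r4, [r1]
        cmp     r3, #0
        bne     1b
        dmb                     @ ...and, previously, also skip this
2:                              @ final barrier

Moving the barrier to after the label ensures it is executed on both
the success and failure paths.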

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 057f9ba..39057d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23531,8 +23626,8 @@ arm_output_sync_loop (emit_f emit,
 	}
     }
 
-  arm_process_output_memory_barrier (emit, NULL);
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/3] ARM 64 bit atomic operations
  2011-07-01 15:56 ` [Patch 2/3] " Dr. David Alan Gilbert
@ 2011-07-01 16:03   ` Richard Henderson
  2011-07-01 17:08     ` David Gilbert
  2011-07-01 19:38   ` Joseph S. Myers
  1 sibling, 1 reply; 43+ messages in thread
From: Richard Henderson @ 2011-07-01 16:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, patches

On 07/01/2011 08:55 AM, Dr. David Alan Gilbert wrote:
> +/* Check that the kernel has a new enough version at load */
> +void __check_for_sync8_kernelhelper (void)
> +{
> +  if (__kernel_helper_version < 5)
> +    {
> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
> +      /* At this point we need a way to crash with some information
> +	 for the user - I'm not sure I can rely on much else being
> +	 available at this point, so do the same as generic-morestack.c
> +	 write() and abort(). */
> +      write (2 /* stderr */, err, sizeof(err));
> +      abort ();
> +    }
> +};

Wouldn't it be better to convert the arm kernel to use a proper VDSO,
so that this error actually comes from the dynamic linker?  That's
the beauty of a true VDSO -- proper symbol resolution.


r~

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/3] ARM 64 bit atomic operations
  2011-07-01 16:03   ` Richard Henderson
@ 2011-07-01 17:08     ` David Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: David Gilbert @ 2011-07-01 17:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, patches

On 1 July 2011 17:03, Richard Henderson <rth@redhat.com> wrote:
> On 07/01/2011 08:55 AM, Dr. David Alan Gilbert wrote:
>> +/* Check that the kernel has a new enough version at load */
>> +void __check_for_sync8_kernelhelper (void)
>> +{
>> +  if (__kernel_helper_version < 5)
>> +    {
>> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
>> +      /* At this point we need a way to crash with some information
>> +      for the user - I'm not sure I can rely on much else being
>> +      available at this point, so do the same as generic-morestack.c
>> +      write() and abort(). */
>> +      write (2 /* stderr */, err, sizeof(err));
>> +      abort ();
>> +    }
>> +};
>
> Wouldn't it be better to convert the arm kernel to use a proper VDSO,
> so that this error actually comes from the dynamic linker?  That's
> the beauty of a true VDSO -- proper symbol resolution.

Well, I can't say I like the need for this check/exit - so yes, if
something else could do it for me that would be great.  Having said
that, I don't know the history of the ARM commpage/lack of VDSO, nor
do I know how big a change that would be.

Let me have a chat to some of the kernel guys who might know the history.

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/3] ARM 64 bit atomic operations
  2011-07-01 15:56 ` [Patch 2/3] " Dr. David Alan Gilbert
  2011-07-01 16:03   ` Richard Henderson
@ 2011-07-01 19:38   ` Joseph S. Myers
  2011-07-04 22:27     ` David Gilbert
  1 sibling, 1 reply; 43+ messages in thread
From: Joseph S. Myers @ 2011-07-01 19:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, patches

On Fri, 1 Jul 2011, Dr. David Alan Gilbert wrote:

> +/* For write */
> +#include <unistd.h>
> +/* For abort */
> +#include <stdlib.h>

Please don't include system headers in libgcc without appropriate 
inhibit_libc checks for bootstrap purposes.  In this case, it would seem 
better just to declare the functions you need.

> +/* Check that the kernel has a new enough version at load */
> +void __check_for_sync8_kernelhelper (void)

Shouldn't this function be static?

> +{
> +  if (__kernel_helper_version < 5)
> +    {
> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
> +      /* At this point we need a way to crash with some information
> +	 for the user - I'm not sure I can rely on much else being
> +	 available at this point, so do the same as generic-morestack.c
> +	 write() and abort(). */
> +      write (2 /* stderr */, err, sizeof(err));

"write" is in the user's namespace in ISO C so it's not ideal to have a 
call to it.  If there isn't a reserved-namespace version, using the 
syscall directly (hardcoding the syscall number) might be better.
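
For instance, an untested sketch (hypothetical helper name, assuming
__NR_write == 4 on ARM Linux):

  static void
  __gnu_write_stderr (const char *buf, unsigned int len)
  {
    register int r0 __asm__ ("r0") = 2;             /* fd: stderr */
    register const char *r1 __asm__ ("r1") = buf;   /* buffer */
    register unsigned int r2 __asm__ ("r2") = len;  /* length */
    register int r7 __asm__ ("r7") = 4;             /* __NR_write */
    __asm__ volatile ("swi\t0x0"
                      : "+r" (r0)
                      : "r" (r1), "r" (r2), "r" (r7)
                      : "memory");
  }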

> +void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((section (".init_array"))) = {
> +  &__check_for_sync8_kernelhelper
> +};

Shouldn't this also be static (marked "used" if needed)?  Though I'd have 
thought simply marking the function as a constructor would be simpler and 
better....
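
i.e. something like this untested sketch, which drops the explicit
.init_array entry:

  extern void abort (void);

  static void __attribute__ ((constructor))
  __check_for_sync8_kernelhelper (void)
  {
    if (__kernel_helper_version < 5)
      {
        /* Write the error message and abort, as in the posted patch.  */
        abort ();
      }
  }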

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/3] ARM 64 bit atomic operations
  2011-07-01 19:38   ` Joseph S. Myers
@ 2011-07-04 22:27     ` David Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: David Gilbert @ 2011-07-04 22:27 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc-patches, patches

On 1 July 2011 20:38, Joseph S. Myers <joseph@codesourcery.com> wrote:

Hi Joseph,
  Thanks for your comments.

> On Fri, 1 Jul 2011, Dr. David Alan Gilbert wrote:
>
>> +/* For write */
>> +#include <unistd.h>
>> +/* For abort */
>> +#include <stdlib.h>
>
> Please don't include system headers in libgcc without appropriate
> inhibit_libc checks for bootstrap purposes.  In this case, it would seem
> better just to declare the functions you need.

OK.

>> +/* Check that the kernel has a new enough version at load */
>> +void __check_for_sync8_kernelhelper (void)
>
> Shouldn't this function be static?

Yep.

>> +{
>> +  if (__kernel_helper_version < 5)
>> +    {
>> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
>> +      /* At this point we need a way to crash with some information
>> +      for the user - I'm not sure I can rely on much else being
>> +      available at this point, so do the same as generic-morestack.c
>> +      write() and abort(). */
>> +      write (2 /* stderr */, err, sizeof(err));
>
> "write" is in the user's namespace in ISO C so it's not ideal to have a
> call to it.  If there isn't a reserved-namespace version, using the
> syscall directly (hardcoding the syscall number) might be better.

OK, fair enough.

>> +void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((section (".init_array"))) = {
>> +  &__check_for_sync8_kernelhelper
>> +};
>
> Shouldn't this also be static (marked "used" if needed)?  Though I'd have
> thought simply marking the function as a constructor would be simpler and
> better....

OK, can do - I wasn't too sure if constructor would end up later in
the initialisation - I was worrying whether that might end up after a
C++ constructor that might actually use; (although I'm not actually
sure if that's more or less likely to happen with constructor v
init_array).

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 0/3] ARM 64 bit atomic operations
  2011-07-01 15:53 [Patch 0/3] ARM 64 bit atomic operations Dr. David Alan Gilbert
                   ` (2 preceding siblings ...)
  2011-07-01 15:57 ` [Patch 3/3] " Dr. David Alan Gilbert
@ 2011-07-12 12:55 ` Ramana Radhakrishnan
  2011-07-26  9:14   ` [Patch 0/4] ARM 64 bit sync atomic operations [V2] Dr. David Alan Gilbert
  3 siblings, 1 reply; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-07-12 12:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches

>
> It's been tested cross to ARM from x86 and also a native x86 build & test.

Cross on qemu ? You do mean a native ARM build and test :) You are
missing changelog entries in each of your patches. Could you please
reply to each of your patches with the appropriate Changelog entries ?

Thanks,
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 3/3] ARM 64 bit atomic operations
  2011-07-01 15:57 ` [Patch 3/3] " Dr. David Alan Gilbert
@ 2011-07-12 15:31   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-07-12 15:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, patches

On 1 July 2011 16:57, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>
> As per pr/48126, Michael Edwards spotted that in the case where
> the compare fails in the cmpxchg, the barrier at the end wasn't
> executed, theoretically allowing a following load to float up above
> the load of the value compared.

Please resubmit with a proper changelog entry. Can you add a comment
in the code to explain that this is to prevent speculative loads
before the barrier ?

cheers
Ramana

>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 057f9ba..39057d2 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -23531,8 +23626,8 @@ arm_output_sync_loop (emit_f emit,
>        }
>     }
>
> -  arm_process_output_memory_barrier (emit, NULL);
>   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
> +  arm_process_output_memory_barrier (emit, NULL);
>  }
>
>  static rtx
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/3] ARM 64 bit atomic operations
  2011-07-01 15:55 ` [Patch 1/3] " Dr. David Alan Gilbert
@ 2011-07-12 21:24   ` Ramana Radhakrishnan
  2011-07-13  9:18     ` David Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-07-12 21:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, patches

Hi Dave,

Could you split this further into a patch that deals with the
case for disabling MCR memory barriers for Thumb1 so that it
may be backported to the release branches ? I have commented inline
as well.

Could you also provide a proper changelog entry for this that will
also help with review of the patch ?

I've not yet managed to fully review all the bits in this patch but
here's some initial comments that should be looked at.

On 1 July 2011 16:54, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 057f9ba..39057d2 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c

>
>  /* Emit a strex{b,h,d, } instruction appropriate for the specified
> @@ -23374,14 +23388,29 @@ arm_output_strex (emit_f emit,
>                  rtx value,
>                  rtx memory)
>  {
> -  const char *suffix = arm_ldrex_suffix (mode);
> -  rtx operands[3];
> +  rtx operands[4];
>
>   operands[0] = result;
>   operands[1] = value;
> -  operands[2] = memory;
> -  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
> -                      cc);
> +  if (mode != DImode)
> +    {
> +      const char *suffix = arm_ldrex_suffix (mode);
> +      operands[2] = memory;
> +      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
> +                         suffix, cc);
> +    }
> +  else
> +    {
> +      /* The restrictions on target registers in ARM mode are that the two
> +        registers are consecutive and the first one is even; Thumb is
> +        actually more flexible, but DI should give us this anyway.
> +        Note that the 1st register always gets the lowest word in memory.  */
> +      gcc_assert ((REGNO (value) & 1) == 0);
> +      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
> +      operands[3] = memory;
> +      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
> +                          cc);
> +    }
>  }
>
>  /* Helper to emit a two operand instruction.  */
> @@ -23423,7 +23452,7 @@ arm_output_op3 (emit_f emit, const char *mnemonic, rtx d, rtx a, rtx b)
>
>    required_value:
>
> -   RTX register or const_int representing the required old_value for
> +   RTX register representing the required old_value for
>    the modify to continue, if NULL no comparsion is performed.  */
>  static void
>  arm_output_sync_loop (emit_f emit,
> @@ -23437,7 +23466,13 @@ arm_output_sync_loop (emit_f emit,
>                      enum attr_sync_op sync_op,
>                      int early_barrier_required)
>  {
> -  rtx operands[1];
> +  rtx operands[2];
> +  /* We'll use the lo for the normal rtx in the none-DI case
> +     as well as the least-sig word in the DI case.  */
> +  rtx old_value_lo, required_value_lo, new_value_lo, t1_lo;
> +  rtx old_value_hi, required_value_hi, new_value_hi, t1_hi;
> +
> +  bool is_di = mode == DImode;
>
>   gcc_assert (t1 != t2);
>
> @@ -23448,82 +23483,142 @@ arm_output_sync_loop (emit_f emit,
>
>   arm_output_ldrex (emit, mode, old_value, memory);
>
> +  if (is_di)
> +    {
> +      old_value_lo = gen_lowpart (SImode, old_value);
> +      old_value_hi = gen_highpart (SImode, old_value);
> +      if (required_value)
> +       {
> +         required_value_lo = gen_lowpart (SImode, required_value);
> +         required_value_hi = gen_highpart (SImode, required_value);
> +       }
> +      else
> +       {
> +         /* Silence false potentially unused warning */
> +         required_value_lo = NULL;
> +         required_value_hi = NULL;
> +       }
> +      new_value_lo = gen_lowpart (SImode, new_value);
> +      new_value_hi = gen_highpart (SImode, new_value);
> +      t1_lo = gen_lowpart (SImode, t1);
> +      t1_hi = gen_highpart (SImode, t1);
> +    }
> +  else
> +    {
> +      old_value_lo = old_value;
> +      new_value_lo = new_value;
> +      required_value_lo = required_value;
> +      t1_lo = t1;
> +
> +      /* Silence false potentially unused warning */
> +      t1_hi = NULL;
> +      new_value_hi = NULL;
> +      required_value_hi = NULL;
> +      old_value_hi = NULL;
> +    }
> +
>   if (required_value)
>     {
> -      rtx operands[2];
> +      operands[0] = old_value_lo;
> +      operands[1] = required_value_lo;
>
> -      operands[0] = old_value;
> -      operands[1] = required_value;
>       arm_output_asm_insn (emit, 0, operands, "cmp\t%%0, %%1");
> +      if (is_di)
> +        {
> +          arm_output_asm_insn (emit, 0, operands, "it\teq");

This should be guarded with an if (TARGET_THUMB2) - there's no point in
accounting for the length of this instruction in the compiler and then
having the assembler fold it away in ARM state.

>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index c32ef1a..3fdd22f 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -282,7 +282,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
> -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
> +#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB \
> +                                && ! TARGET_THUMB1)

This hunk (TARGET_HAVE_DMB_MCR) should probably be backported to
release branches because this is technically fixing an issue and
hence should be a separate patch that can be looked at separately.

>
>  /* Nonzero if this chip implements a memory barrier instruction.  */
>  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
> @@ -290,8 +291,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);

sync.md changes -

> (define_mode_iterator NARROW [QI HI])
>+(define_mode_iterator QHSD [QI HI SI DI])
>+(define_mode_iterator SIDI [SI DI])
>+
>+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
>+				(QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>+				(HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>+				(DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
>+

Can we move all the iterators to iterators.md and then arrange
includes to work automatically ? Minor nit - could you align the entries
for QI, HI and DI with the start of the SI ?

>+(define_mode_attr sync_atleastsi [(SI "SI")
>+				  (DI "DI")
>+				  (HI "SI")
>+				  (QI "SI")])
>

I couldn't spot where this was being used. Can this be removed if not
necessary ?

>
>-(define_insn "arm_sync_new_nandsi"
>+(define_insn "arm_sync_new_<sync_optab><mode>"
>   [(set (match_operand:SI 0 "s_register_operand" "=&r")
>-        (unspec_volatile:SI [(not:SI (and:SI
>-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
>-                               (match_operand:SI 2 "s_register_operand" "r")))
>-	                    ]
>-	                    VUNSPEC_SYNC_NEW_OP))
>+        (unspec_volatile:SI [(syncop:SI
>+			       (zero_extend:SI
>+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>+			       (match_operand:SI 2 "s_register_operand" "r"))
>+			    ]
>+			    VUNSPEC_SYNC_NEW_OP))
>    (set (match_dup 1)
>-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
>-	                    VUNSPEC_SYNC_NEW_OP))
>+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>+				VUNSPEC_SYNC_NEW_OP))
>    (clobber (reg:CC CC_REGNUM))
>    (clobber (match_scratch:SI 3 "=&r"))]
>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"

Can't this just use <sync_predtab> instead since the condition is identical
for QImode and HImode from that mode attribute and in quite a few
places below?

>@@ -461,19 +359,19 @@
>         (unspec_volatile:SI
> 	  [(not:SI
> 	     (and:SI
>-               (zero_extend:SI	
>-	         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>-               (match_operand:SI 2 "s_register_operand" "r")))
>+	       (zero_extend:SI	
>+		 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>+	       (match_operand:SI 2 "s_register_operand" "r")))
> 	  ] VUNSPEC_SYNC_NEW_OP))
>    (set (match_dup 1)
>         (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>-	                        VUNSPEC_SYNC_NEW_OP))
>+				VUNSPEC_SYNC_NEW_OP))
>    (clobber (reg:CC CC_REGNUM))
>    (clobber (match_scratch:SI 3 "=&r"))]
>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"

Again here.  Not sure which pattern this is, though, just looking at the patch.

>   {
>     return arm_output_sync_insn (insn, operands);
>-  }
>+  }
>   [(set_attr "sync_result"          "0")
>    (set_attr "sync_memory"          "1")
>    (set_attr "sync_new_value"       "2")
>@@ -483,20 +381,20 @@
>    (set_attr "conds" "clob")
>    (set_attr "predicable" "no")])
>
>-(define_insn "arm_sync_old_<sync_optab>si"
>-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
>-        (unspec_volatile:SI [(syncop:SI
>-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
>-                               (match_operand:SI 2 "s_register_operand" "r"))
>-	                    ]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+(define_insn "arm_sync_old_<sync_optab><mode>"
>+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
>+	(unspec_volatile:SIDI [(syncop:SIDI
>+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
>+			       (match_operand:SIDI 2 "s_register_operand" "r"))
>+			    ]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (set (match_dup 1)
>-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
>+			      VUNSPEC_SYNC_OLD_OP))
>    (clobber (reg:CC CC_REGNUM))
>-   (clobber (match_scratch:SI 3 "=&r"))
>+   (clobber (match_scratch:SIDI 3 "=&r"))
>    (clobber (match_scratch:SI 4 "<sync_clobber>"))]
>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>+  "<sync_predtab>"
>   {
>     return arm_output_sync_insn (insn, operands);
>   }
>@@ -509,20 +407,21 @@
>    (set_attr "conds" "clob")
>    (set_attr "predicable" "no")])
>
>-(define_insn "arm_sync_old_nandsi"
>+(define_insn "arm_sync_old_<sync_optab><mode>"
>   [(set (match_operand:SI 0 "s_register_operand" "=&r")
>-        (unspec_volatile:SI [(not:SI (and:SI
>-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
>-                               (match_operand:SI 2 "s_register_operand" "r")))
>-	                    ]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+        (unspec_volatile:SI [(syncop:SI
>+			       (zero_extend:SI
>+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>+			       (match_operand:SI 2 "s_register_operand" "r"))
>+			    ]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (set (match_dup 1)
>-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (clobber (reg:CC CC_REGNUM))
>    (clobber (match_scratch:SI 3 "=&r"))
>-   (clobber (match_scratch:SI 4 "=&r"))]
>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"

Likewise for sync_predtab.

>-(define_insn "arm_sync_old_<sync_optab><mode>"
>-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
>-        (unspec_volatile:SI [(syncop:SI
>-                               (zero_extend:SI
>-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>-                               (match_operand:SI 2 "s_register_operand" "r"))
>-	                    ]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+(define_insn "arm_sync_old_nand<mode>"
>+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
>+	(unspec_volatile:SIDI [(not:SIDI (and:SIDI
>+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
>+			       (match_operand:SIDI 2 "s_register_operand" "r")))
>+			    ]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (set (match_dup 1)
>-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
> 	                    VUNSPEC_SYNC_OLD_OP))
>    (clobber (reg:CC CC_REGNUM))
>-   (clobber (match_scratch:SI 3 "=&r"))
>-   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
>-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
>+   (clobber (match_scratch:SIDI 3 "=&r"))
>+   (clobber (match_scratch:SI 4 "=&r"))]
>+  "<sync_predtab>"
>   {
>     return arm_output_sync_insn (insn, operands);
>   }
>@@ -557,26 +455,26 @@
>    (set_attr "sync_memory"          "1")
>    (set_attr "sync_new_value"       "2")
>    (set_attr "sync_t1"              "3")
>-   (set_attr "sync_t2"              "<sync_t2_reqd>")
>-   (set_attr "sync_op"              "<sync_optab>")
>+   (set_attr "sync_t2"              "4")
>+   (set_attr "sync_op"              "nand")
>    (set_attr "conds" 		    "clob")
>    (set_attr "predicable" "no")])
>
> (define_insn "arm_sync_old_nand<mode>"
>   [(set (match_operand:SI 0 "s_register_operand" "=&r")
>-        (unspec_volatile:SI [(not:SI (and:SI
>-                               (zero_extend:SI
>-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>-                               (match_operand:SI 2 "s_register_operand" "r")))
>-	                    ]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+	(unspec_volatile:SI [(not:SI (and:SI
>+			       (zero_extend:SI
>+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>+			       (match_operand:SI 2 "s_register_operand" "r")))
>+			    ]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (set (match_dup 1)
>-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>-	                    VUNSPEC_SYNC_OLD_OP))
>+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>+			    VUNSPEC_SYNC_OLD_OP))
>    (clobber (reg:CC CC_REGNUM))
>    (clobber (match_scratch:SI 3 "=&r"))
>    (clobber (match_scratch:SI 4 "=&r"))]
>-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
>   {
>     return arm_output_sync_insn (insn, operands);
>   }


Cheers
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/3] ARM 64 bit atomic operations
  2011-07-12 21:24   ` Ramana Radhakrishnan
@ 2011-07-13  9:18     ` David Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: David Gilbert @ 2011-07-13  9:18 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, patches

On 12 July 2011 22:07, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Hi Dave,

Hi Ramana,
  Thanks for the review.

> Could you split this further into a patch that deals with the
> case for disabling MCR memory barriers for Thumb1 so that it
> may be backported to the release branches ? I have commented inline
> as well.

Sure.

> Could you also provide a proper changelog entry for this that will
> also help with review of the patch ?

Yep, no problem.

> I've not yet managed to fully review all the bits in this patch but
> here's some initial comments that should be looked at.
>
> On 1 July 2011 16:54, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
<snip>

>> +      if (is_di)
>> +        {
>> +          arm_output_asm_insn (emit, 0, operands, "it\teq");
>
> This should be guarded with an if (TARGET_THUMB2) - there's no point in
> accounting for the length of this instruction in the compiler and then
> having the assembler fold it away in ARM state.

OK; the length accounting seems pretty broken anyway; I think it assumes
all instructions are 4 bytes.

>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index c32ef1a..3fdd22f 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -282,7 +282,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
>> -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
>> +#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB \
>> +                                && ! TARGET_THUMB1)
>
> This hunk (TARGET_HAVE_DMB_MCR) should probably be backported to
> release branches because this is technically fixing an issue and
> hence should be a separate patch that can be looked at separately.

OK, will do.

>>  /* Nonzero if this chip implements a memory barrier instruction.  */
>>  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
>> @@ -290,8 +291,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
>
> sync.md changes -
>
>> (define_mode_iterator NARROW [QI HI])
>>+(define_mode_iterator QHSD [QI HI SI DI])
>>+(define_mode_iterator SIDI [SI DI])
>>+
>>+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
>>+
>
> Can we move all the iterators to iterators.md and then arrange
> includes to work automatically ? Minor nit - could you align the entries
> for QI, HI and DI with the start of the SI ?

Yes I can do - the only odd thing is I guess the sync_predtab is very
sync.md specific - does it really make sense for that
to be in iterators.md ?

>>+(define_mode_attr sync_atleastsi [(SI "SI")
>>+                                (DI "DI")
>>+                                (HI "SI")
>>+                                (QI "SI")])
>>
>
> I couldn't spot where this was being used. Can this be removed if not
> necessary ?

Ah - yes I think that's dead; it's a relic from an attempt to merge some of the
other narrow cases into the same iterator but it got way too messy.

>>-(define_insn "arm_sync_new_nandsi"
>>+(define_insn "arm_sync_new_<sync_optab><mode>"
>>   [(set (match_operand:SI 0 "s_register_operand" "=&r")
>>-        (unspec_volatile:SI [(not:SI (and:SI
>>-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
>>-                               (match_operand:SI 2 "s_register_operand" "r")))
>>-                          ]
>>-                          VUNSPEC_SYNC_NEW_OP))
>>+        (unspec_volatile:SI [(syncop:SI
>>+                             (zero_extend:SI
>>+                               (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>>+                             (match_operand:SI 2 "s_register_operand" "r"))
>>+                          ]
>>+                          VUNSPEC_SYNC_NEW_OP))
>>    (set (match_dup 1)
>>-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
>>-                          VUNSPEC_SYNC_NEW_OP))
>>+      (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>>+                              VUNSPEC_SYNC_NEW_OP))
>>    (clobber (reg:CC CC_REGNUM))
>>    (clobber (match_scratch:SI 3 "=&r"))]
>>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
>
> Can't this just use <sync_predtab> instead since the condition is identical
> for QImode and HImode from that mode attribute and in quite a few
> places below?

Hmm yes it can - I'd only been using predtab in the places where it was
varying on the mode; but as you say this can be converted as well.

>>@@ -461,19 +359,19 @@
>>         (unspec_volatile:SI
>>         [(not:SI
>>            (and:SI
>>-               (zero_extend:SI
>>-               (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>>-               (match_operand:SI 2 "s_register_operand" "r")))
>>+             (zero_extend:SI
>>+               (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>>+             (match_operand:SI 2 "s_register_operand" "r")))
>>         ] VUNSPEC_SYNC_NEW_OP))
>>    (set (match_dup 1)
>>         (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>>-                              VUNSPEC_SYNC_NEW_OP))
>>+                              VUNSPEC_SYNC_NEW_OP))
>>    (clobber (reg:CC CC_REGNUM))
>>    (clobber (match_scratch:SI 3 "=&r"))]
>>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
>
> Again here.  Not sure which pattern this is, though, just looking at the patch.

Sure.

Thanks for reviewing it.

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 0/4] ARM 64 bit sync atomic operations [V2]
  2011-07-12 12:55 ` [Patch 0/3] " Ramana Radhakrishnan
@ 2011-07-26  9:14   ` Dr. David Alan Gilbert
  2011-07-26  9:16     ` [Patch 1/4] " Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-26  9:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, rth, joseph, patches

Hi,
  This is V2 of a series of 4 patches relating to ARM atomic operations;
they incorporate most of the feedback from V1 - thanks Ramana, Richard and
Joseph for comments.

  1) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k 
     and above.
  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
     assist is called (as per 32bit and smaller ops)
  3) Fix pr48126 which is a misplaced barrier in the atomic generation
  4) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
     produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k
     as per the docs).

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific
    to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

  Note that the kernel interface has remained the same for the helper, and as
such I've not changed the way the helper call in patch 2 is structured.

This code was tested with a full bootstrap on ARM; make check results
are the same as without the patches except for extra passes due to the new
tests.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 1/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:14   ` [Patch 0/4] ARM 64 bit sync atomic operations [V2] Dr. David Alan Gilbert
@ 2011-07-26  9:16     ` Dr. David Alan Gilbert
  2011-07-26  9:18       ` [Patch 2/4] " Dr. David Alan Gilbert
  2011-09-30 14:25       ` [Patch 1/4] " Ramana Radhakrishnan
  0 siblings, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-26  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, rth, joseph, patches


    Add support for ARM 64bit sync intrinsics.
    
        gcc/
	* arm.c (arm_output_ldrex): Support ldrexd.
	  (arm_output_strex): Support strexd.
	  (arm_output_it): New helper to output it in Thumb2 mode only.
	  (arm_output_sync_loop): Support DI mode.  Change comment to
				  remove mention of const_int.
	  (arm_expand_sync): Support DI mode.
    
	* arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.
    
	* iterators.md (NARROW): move from sync.md.
	  (QHSD): New iterator for all current ARM integer modes.
	  (SIDI): New iterator for SI and DI modes only.
    
	* sync.md  (sync_predtab): New mode_attr
	  (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>
	  (sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>
	  (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>
	  (sync_nandsi): Fold into sync_nand<mode>
	  (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>
	  (sync_new_nandsi): Fold into sync_new_nand<mode>
	  (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>
	  (sync_old_nandsi): Fold into sync_old_nand<mode>
	  (sync_compare_and_swap<mode>): Support SI & DI
	  (sync_lock_test_and_set<mode>): Likewise
	  (sync_<sync_optab><mode>): Likewise
	  (sync_nand<mode>): Likewise
	  (sync_new_<sync_optab><mode>): Likewise
	  (sync_new_nand<mode>): Likewise
	  (sync_old_<sync_optab><mode>): Likewise
	  (sync_old_nand<mode>): Likewise
	  (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI
	  (arm_sync_lock_test_and_setsi): Likewise
	  (arm_sync_new_<sync_optab>si): Likewise
	  (arm_sync_new_nandsi): Likewise
	  (arm_sync_old_<sync_optab>si): Likewise
	  (arm_sync_old_nandsi): Likewise
	  (arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent.
	  (arm_sync_lock_test_and_set<mode> NARROW): Likewise
	  (arm_sync_new_<sync_optab><mode> NARROW): Likewise
	  (arm_sync_new_nand<mode> NARROW): Likewise
	  (arm_sync_old_<sync_optab><mode> NARROW): Likewise
	  (arm_sync_old_nand<mode> NARROW): Likewise
    
      gcc/testsuite/
	* gcc.dg/di-longlong64-sync-1.c: New test.
	* gcc.dg/di-sync-multithread.c: New test.
	* gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
	* gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
	* lib/target-supports.exp (arm_arch_*_ok): Series of effective-target
		tests for v5, v6, v6k, and v7-a, and add-options helpers.
	  (check_effective_target_arm_arm_ok): New helper.
	  (check_effective_target_sync_longlong): New helper.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d9763d2..28be078 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23498,12 +23498,26 @@ arm_output_ldrex (emit_f emit,
 		  rtx target,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[1] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+    }
+  else 
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (target) & 1) == 0);
+      operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+    }
 }
 
 /* Emit a strex{b,h,d, } instruction appropriate for the specified
@@ -23516,14 +23530,41 @@ arm_output_strex (emit_f emit,
 		  rtx value,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-		       cc);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+			  suffix, cc);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (value) & 1) == 0);
+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+      operands[3] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+			   cc);
+    }
+}
+
+/* Helper to emit an it instruction in Thumb2 mode only; although the
+   assembler will ignore it in ARM mode, emitting it there would mess up
+   the instruction counts we sometimes keep.  'flags' gives the extra t's
+   and e's when more than one instruction is conditional.  */
+static void
+arm_output_it (emit_f emit, const char *flags, const char *cond)
+{
+  rtx operands[1]; /* Don't actually use the operand */
+  if (TARGET_THUMB2)
+    arm_output_asm_insn (emit, 0, operands, "it%s\t%s", flags, cond);
 }
 
 /* Helper to emit a two operand instruction.  */
@@ -23565,7 +23606,7 @@ arm_output_op3 (emit_f emit, const char *mnemonic, rtx d, rtx a, rtx b)
 
    required_value:
 
-   RTX register or const_int representing the required old_value for
+   RTX register representing the required old_value for
    the modify to continue, if NULL no comparsion is performed.  */
 static void
 arm_output_sync_loop (emit_f emit,
@@ -23579,7 +23620,13 @@ arm_output_sync_loop (emit_f emit,
 		      enum attr_sync_op sync_op,
 		      int early_barrier_required)
 {
-  rtx operands[1];
+  rtx operands[2];
+  /* We'll use the _lo variables for the normal rtx in the non-DI case
+     as well as for the least-significant word in the DI case.  */
+  rtx old_value_lo, required_value_lo, new_value_lo, t1_lo;
+  rtx old_value_hi, required_value_hi, new_value_hi, t1_hi;
+
+  bool is_di = mode == DImode;
 
   gcc_assert (t1 != t2);
 
@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
 
   arm_output_ldrex (emit, mode, old_value, memory);
 
+  if (is_di)
+    {
+      old_value_lo = gen_lowpart (SImode, old_value);
+      old_value_hi = gen_highpart (SImode, old_value);
+      if (required_value)
+	{
+	  required_value_lo = gen_lowpart (SImode, required_value);
+	  required_value_hi = gen_highpart (SImode, required_value);
+	}
+      else
+	{
+	  /* Silence a false "potentially unused" warning.  */
+	  required_value_lo = NULL;
+	  required_value_hi = NULL;
+	}
+      new_value_lo = gen_lowpart (SImode, new_value);
+      new_value_hi = gen_highpart (SImode, new_value);
+      t1_lo = gen_lowpart (SImode, t1);
+      t1_hi = gen_highpart (SImode, t1);
+    }
+  else
+    {
+      old_value_lo = old_value;
+      new_value_lo = new_value;
+      required_value_lo = required_value;
+      t1_lo = t1;
+
+      /* Silence a false "potentially unused" warning.  */
+      t1_hi = NULL;
+      new_value_hi = NULL;
+      required_value_hi = NULL;
+      old_value_hi = NULL;
+    }
+
   if (required_value)
     {
-      rtx operands[2];
+      operands[0] = old_value_lo;
+      operands[1] = required_value_lo;
 
-      operands[0] = old_value;
-      operands[1] = required_value;
       arm_output_asm_insn (emit, 0, operands, "cmp\t%%0, %%1");
+      if (is_di)
+        {
+          arm_output_it (emit, "", "eq");
+          arm_output_op2 (emit, "cmpeq", old_value_hi, required_value_hi);
+        }
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYB%%=", LOCAL_LABEL_PREFIX);
     }
 
   switch (sync_op)
     {
     case SYNC_OP_ADD:
-      arm_output_op3 (emit, "add", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "adds" : "add",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "adc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_SUB:
-      arm_output_op3 (emit, "sub", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "subs" : "sub",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "sbc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_IOR:
-      arm_output_op3 (emit, "orr", t1, old_value, new_value);
+      arm_output_op3 (emit, "orr", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "orr", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_XOR:
-      arm_output_op3 (emit, "eor", t1, old_value, new_value);
+      arm_output_op3 (emit, "eor", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "eor", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_AND:
-      arm_output_op3 (emit,"and", t1, old_value, new_value);
+      arm_output_op3 (emit,"and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_NAND:
-      arm_output_op3 (emit, "and", t1, old_value, new_value);
-      arm_output_op2 (emit, "mvn", t1, t1);
+      arm_output_op3 (emit, "and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
+      arm_output_op2 (emit, "mvn", t1_lo, t1_lo);
+      if (is_di)
+	arm_output_op2 (emit, "mvn", t1_hi, t1_hi);
       break;
 
     case SYNC_OP_NONE:
       t1 = new_value;
+      t1_lo = new_value_lo;
+      if (is_di)
+	t1_hi = new_value_hi;
       break;
     }
 
+  /* Note that the strex result is a 0/1 success flag in a single register. */
   if (t2)
     {
-       arm_output_strex (emit, mode, "", t2, t1, memory);
-       operands[0] = t2;
-       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
-       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
-			    LOCAL_LABEL_PREFIX);
+      arm_output_strex (emit, mode, "", t2, t1, memory);
+      operands[0] = t2;
+      arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
+      arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
+			   LOCAL_LABEL_PREFIX);
     }
   else
     {
       /* Use old_value for the return value because for some operations
 	 the old_value can easily be restored.  This saves one register.  */
-      arm_output_strex (emit, mode, "", old_value, t1, memory);
-      operands[0] = old_value;
+      arm_output_strex (emit, mode, "", old_value_lo, t1, memory);
+      operands[0] = old_value_lo;
       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
 			   LOCAL_LABEL_PREFIX);
 
+      /* Note that we only used the _lo half of old_value as a temporary,
+	 so in the DI case we don't have to restore the _hi part.  */
       switch (sync_op)
 	{
 	case SYNC_OP_ADD:
-	  arm_output_op3 (emit, "sub", old_value, t1, new_value);
+	  arm_output_op3 (emit, "sub", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_SUB:
-	  arm_output_op3 (emit, "add", old_value, t1, new_value);
+	  arm_output_op3 (emit, "add", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_XOR:
-	  arm_output_op3 (emit, "eor", old_value, t1, new_value);
+	  arm_output_op3 (emit, "eor", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_NONE:
-	  arm_output_op2 (emit, "mov", old_value, required_value);
+	  arm_output_op2 (emit, "mov", old_value_lo, required_value_lo);
 	  break;
 
 	default:
@@ -23768,7 +23875,7 @@ arm_expand_sync (enum machine_mode mode,
     target = gen_reg_rtx (mode);
 
   memory = arm_legitimize_sync_memory (memory);
-  if (mode != SImode)
+  if (mode != SImode && mode != DImode)
     {
       rtx load_temp = gen_reg_rtx (SImode);
 
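To show why the DI add/sub paths above use adds/adc and subs/sbc, here is a
carry case (values borrowed from the new tests) that the split 32-bit
arithmetic has to get right; a standalone sketch, not part of the patch:

    extern void abort (void);

    int main (void)
    {
      long long v = 0x1000e0de0000ll;
      /* The low words alone sum past 2^32 (0xe0de0000 + 0xe0000000),
         so the carry must propagate into the high word.  */
      if (__sync_add_and_fetch (&v, 0xb000e0000000ll) != 0xc001c0de0000ll)
        abort ();
      return 0;
    }
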
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3810f9e..0d419d5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -293,8 +293,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip supports ldrex and strex */
 #define TARGET_HAVE_LDREX	((arm_arch6 && TARGET_ARM) || arm_arch7)
 
-/* Nonzero if this chip supports ldrex{bhd} and strex{bhd}.  */
-#define TARGET_HAVE_LDREXBHD	((arm_arch6k && TARGET_ARM) || arm_arch7)
+/* Nonzero if this chip supports ldrex{bh} and strex{bh}.  */
+#define TARGET_HAVE_LDREXBH	((arm_arch6k && TARGET_ARM) || arm_arch7)
+
+/* Nonzero if this chip supports ldrexd and strexd.  */
+#define TARGET_HAVE_LDREXD	(((arm_arch6k && TARGET_ARM) || arm_arch7) \
+				 && arm_arch_notm)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
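
For user code that wants a rough compile-time equivalent of
TARGET_HAVE_LDREXD, something along these lines can be used.  This is only
my reading of the conditions above in terms of the usual __ARM_ARCH_*
defines (an approximation - the macro above is authoritative, and v7
covers more than just the A profile):

    #if (defined (__ARM_ARCH_6K__) && !defined (__thumb__)) \
        || defined (__ARM_ARCH_7A__)
    #define HAVE_LDREXD 1
    #else
    #define HAVE_LDREXD 0
    #endif
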
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b11b112..0f4c692 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -33,6 +33,15 @@
 ;; A list of integer modes that are up to one word long
 (define_mode_iterator QHSI [QI HI SI])
 
+;; A list of integer modes that are narrower than a word
+(define_mode_iterator NARROW [QI HI])
+
+;; A list of all the integer modes up to 64 bits
+(define_mode_iterator QHSD [QI HI SI DI])
+
+;; A list of the 32-bit and 64-bit integer modes
+(define_mode_iterator SIDI [SI DI])
+
 ;; Integer element sizes implemented by IWMMXT.
 (define_mode_iterator VMMX [V2SI V4HI V8QI])
 
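With the QHSD iterator each sync expander below is instantiated once per
width; a quick compile-only sketch that touches all four modes (types
chosen to match QI/HI/SI/DI on ARM):

    static char c;
    static short s;
    static int i;
    static long long d;

    void
    all_widths (void)
    {
      __sync_fetch_and_add (&c, 1);
      __sync_fetch_and_add (&s, 1);
      __sync_fetch_and_add (&i, 1);
      __sync_fetch_and_add (&d, 1);
    }
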
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 689a235..7d74e20 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -1,6 +1,7 @@
 ;; Machine description for ARM processor synchronization primitives.
 ;; Copyright (C) 2010 Free Software Foundation, Inc.
 ;; Written by Marcus Shawcroft (marcus.shawcroft@arm.com)
+;; 64-bit atomics by Dave Gilbert (david.gilbert@linaro.org)
 ;;
 ;; This file is part of GCC.
 ;;
@@ -33,31 +34,19 @@
   MEM_VOLATILE_P (operands[0]) = 1;
 })
 
-(define_expand "sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand")
-        (unspec_volatile:SI [(match_operand:SI 1 "memory_operand")
-			     (match_operand:SI 2 "s_register_operand")
-			     (match_operand:SI 3 "s_register_operand")]
-			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omrn;
-    generator.u.omrn = gen_arm_sync_compare_and_swapsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], operands[2],
-                     operands[3]);
-    DONE;
-  })
 
-(define_mode_iterator NARROW [QI HI])
+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
+				(QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+				(HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+				(DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
 
 (define_expand "sync_compare_and_swap<mode>"
-  [(set (match_operand:NARROW 0 "s_register_operand")
-        (unspec_volatile:NARROW [(match_operand:NARROW 1 "memory_operand")
-			     (match_operand:NARROW 2 "s_register_operand")
-			     (match_operand:NARROW 3 "s_register_operand")]
+  [(set (match_operand:QHSD 0 "s_register_operand")
+        (unspec_volatile:QHSD [(match_operand:QHSD 1 "memory_operand")
+			     (match_operand:QHSD 2 "s_register_operand")
+			     (match_operand:QHSD 3 "s_register_operand")]
 			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omrn;
@@ -67,25 +56,11 @@
     DONE;
   })
 
-(define_expand "sync_lock_test_and_setsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_lock_test_and_setsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_lock_test_and_set<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -115,51 +90,25 @@
 				(plus "*")
 				(minus "*")])
 
-(define_expand "sync_<sync_optab>si"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (syncop:SI (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
-(define_expand "sync_nandsi"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (not:SI (and:SI (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
 (define_expand "sync_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (syncop:NARROW (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (syncop:QHSD (match_dup 0) (match_dup 1))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, NULL, operands[0], NULL,
-    		     operands[1]);
+		     operands[1]);
     DONE;
   })
 
 (define_expand "sync_nand<mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 0) (match_dup 1)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -169,57 +118,27 @@
     DONE;
   })
 
-(define_expand "sync_new_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_new_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-    		     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_new_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_new_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -229,57 +148,27 @@
     DONE;
   });
 
-(define_expand "sync_old_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_old_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_old_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_old_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_old_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -289,22 +178,22 @@
     DONE;
   })
 
-(define_insn "arm_sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI
-	  [(match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-   	   (match_operand:SI 2 "s_register_operand" "r")
-	   (match_operand:SI 3 "s_register_operand" "r")]
-	  VUNSPEC_SYNC_COMPARE_AND_SWAP))
-   (set (match_dup 1) (unspec_volatile:SI [(match_dup 2)]
+(define_insn "arm_sync_compare_and_swap<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI
+	 [(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+	  (match_operand:SIDI 2 "s_register_operand" "r")
+	  (match_operand:SIDI 3 "s_register_operand" "r")]
+	 VUNSPEC_SYNC_COMPARE_AND_SWAP))
+   (set (match_dup 1) (unspec_volatile:SIDI [(match_dup 2)]
                                           VUNSPEC_SYNC_COMPARE_AND_SWAP))
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -318,7 +207,7 @@
         (zero_extend:SI
 	  (unspec_volatile:NARROW
 	    [(match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")
-   	     (match_operand:SI 2 "s_register_operand" "r")
+	     (match_operand:SI 2 "s_register_operand" "r")
 	     (match_operand:SI 3 "s_register_operand" "r")]
 	    VUNSPEC_SYNC_COMPARE_AND_SWAP)))
    (set (match_dup 1) (unspec_volatile:NARROW [(match_dup 2)]
@@ -326,10 +215,10 @@
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -338,18 +227,18 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_lock_test_and_setsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (match_operand:SI 1 "arm_sync_memory_operand" "+Q"))
+(define_insn "arm_sync_lock_test_and_set<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q"))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_operand:SI 2 "s_register_operand" "r")]
-	                    VUNSPEC_SYNC_LOCK))
+	(unspec_volatile:SIDI [(match_operand:SIDI 2 "s_register_operand" "r")]
+	VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_release_barrier" "no")
    (set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
@@ -364,10 +253,10 @@
         (zero_extend:SI (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_operand:SI 2 "s_register_operand" "r")]
-	                        VUNSPEC_SYNC_LOCK))
+				VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -380,22 +269,22 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -405,54 +294,54 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_nandsi"
+(define_insn "arm_sync_new_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
@@ -461,19 +350,19 @@
         (unspec_volatile:SI
 	  [(not:SI
 	     (and:SI
-               (zero_extend:SI	  
-	         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-               (match_operand:SI 2 "s_register_operand" "r")))
+	       (zero_extend:SI	  
+		 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+	       (match_operand:SI 2 "s_register_operand" "r")))
 	  ] VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -483,20 +372,20 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			      VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
+   (clobber (match_scratch:SIDI 3 "=&r"))
    (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -509,20 +398,21 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_nandsi"
+(define_insn "arm_sync_old_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -530,26 +420,25 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "4")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_t2"              "<sync_t2_reqd>")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
 	                    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SIDI 3 "=&r"))
+   (clobber (match_scratch:SI 4 "=&r"))]
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -557,26 +446,26 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "<sync_t2_reqd>")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_t2"              "4")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
 (define_insn "arm_sync_old_nand<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:SI [(not:SI (and:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
    (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
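
One behavioural note on the nand patterns: the dg-message lines in the new
tests exist because the __sync_*nand* builtins changed meaning in GCC 4.4
to ~(a & b) rather than (~a) & b.  A standalone check of the 64-bit case
(values as used in the tests below):

    extern void abort (void);

    int main (void)
    {
      long long x = -1ll;
      if (__sync_nand_and_fetch (&x, 0xa00000007ll) != ~0xa00000007ll)
        abort ();
      return 0;
    }
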
diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
new file mode 100644
index 0000000..5d91dfc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
@@ -0,0 +1,161 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit, so use long long
+   (an explicit 64-bit type may be a better bet) and 2) use values that cross
+   the 32-bit boundary and cause carries, since the actual maths are done as
+   pairs of 32-bit instructions.  */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
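
As background for the __sync_lock_test_and_set/__sync_lock_release pairs
exercised above: they model an acquire-only swap and a release store, i.e.
the classic spinlock idiom.  A minimal sketch (illustrative only):

    static long long lock;

    void
    with_lock (void (*fn) (void))
    {
      while (__sync_lock_test_and_set (&lock, 1))
        ;  /* spin until we swap a 0 for our 1 */
      fn ();
      __sync_lock_release (&lock);  /* store 0 with release semantics */
    }
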
diff --git a/gcc/testsuite/gcc.dg/di-sync-multithread.c b/gcc/testsuite/gcc.dg/di-sync-multithread.c
new file mode 100644
index 0000000..cfa556f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-sync-multithread.c
@@ -0,0 +1,191 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-require-effective-target pthread_h } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options "-pthread -std=gnu99" } */
+
+/* Test of long long atomic ops performed in parallel in 3 pthreads.
+   david.gilbert@linaro.org */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/*#define DEBUGIT 1 */
+
+#ifdef DEBUGIT
+#include <stdio.h>
+
+#define DOABORT(x,...) { fprintf(stderr, x, __VA_ARGS__); fflush(stderr); abort(); }
+
+#else
+
+#define DOABORT(x,...) abort();
+
+#endif
+
+/* Passed to each thread to describe which bits it is going to work on. */
+struct threadwork {
+  unsigned long long count; /* incremented each time the worker loops */
+  unsigned int thread;    /* ID */
+  unsigned int addlsb;    /* 8 bit */
+  unsigned int logic1lsb; /* 5 bit */
+  unsigned int logic2lsb; /* 8 bit */
+};
+
+/* The shared word where all the atomic work is done */
+static volatile long long workspace;
+
+/* A shared word to tell the workers to quit when non-0 */
+static long long doquit;
+
+extern void abort (void);
+
+/* Note this test doesn't test the return values much */
+void*
+worker (void* data)
+{
+  struct threadwork *tw=(struct threadwork*)data;
+  long long add1bit=1ll << tw->addlsb;
+  long long logic1bit=1ll << tw->logic1lsb;
+  long long logic2bit=1ll << tw->logic2lsb;
+
+  /* Clear the bits we use */
+  __sync_and_and_fetch (&workspace, ~(0xffll * add1bit));
+  __sync_fetch_and_and (&workspace, ~(0x1fll * logic1bit));
+  __sync_fetch_and_and (&workspace, ~(0xffll * logic2bit));
+
+  do
+    {
+      long long tmp1, tmp2, tmp3;
+      /* OK, let's try to do some stuff to the workspace - by the end
+         of the main loop our area should be the same as it is now, i.e. 0 */
+
+      /* Push the arithmetic section up to 128 - one of the threads will
+         cause this to carry across the 32-bit boundary */
+      for(tmp2=0; tmp2<64; tmp2++)
+	{
+	  /* Add 2 using the two different adds */
+	  tmp1=__sync_add_and_fetch (&workspace, add1bit);
+	  tmp3=__sync_fetch_and_add (&workspace, add1bit);
+
+	  /* The value should be the intermediate add value in both cases */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT("Mismatch of add intermediates on thread %d workspace=0x%llx tmp1=0x%llx tmp2=0x%llx tmp3=0x%llx\n",
+			 tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+      /* Set the logic bits */
+      tmp2=__sync_or_and_fetch (&workspace,
+			  0x1fll * logic1bit | 0xffll * logic2bit);
+
+      /* Check the logic bits are set and the arithmetic value is correct */
+      if ((tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit))
+	  != (0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit))
+	DOABORT("Midloop check failed on thread %d workspace=0x%llx tmp2=0x%llx masktmp2=0x%llx expected=0x%llx\n",
+		tw->thread, workspace, tmp2,
+		tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit),
+		(0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit));
+
+      /* Pull the arithmetic set back down to 0 - again this should cause a
+	 carry across the 32-bit boundary in one thread */
+
+      for(tmp2=0; tmp2<64; tmp2++)
+	{
+	  /* Subtract 2 using the two different subs */
+	  tmp1=__sync_sub_and_fetch (&workspace, add1bit);
+	  tmp3=__sync_fetch_and_sub (&workspace, add1bit);
+
+	  /* The value should be the intermediate sub value in both cases */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT("Mismatch of sub intermediates on thread %d workspace=0x%llx tmp1=0x%llx tmp2=0x%llx tmp3=0x%llx\n",
+			tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+
+      /* Clear the logic bits */
+      __sync_fetch_and_xor (&workspace, 0x1fll * logic1bit);
+      tmp3=__sync_and_and_fetch (&workspace, ~(0xffll * logic2bit));
+
+      /* And so the logic bits and the arithmetic bits should be zero again */
+      if (tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit))
+	DOABORT("End of worker loop; bits not 0 on thread %d workspace=0x%llx tmp3=0x%llx mask=0x%llx maskedtmp3=0x%llx\n",
+		tw->thread, workspace, tmp3, (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit),
+		tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit));
+
+      __sync_add_and_fetch (&tw->count, 1);
+    }
+  while (!__sync_bool_compare_and_swap (&doquit, 1, 1));
+
+  pthread_exit (0);
+}
+
+int
+main ()
+{
+  /* We have 3 threads doing three sets of operations, an 8 bit
+     arithmetic field, a 5 bit logic field and an 8 bit logic
+     field (just to pack them all in).
+
+  6      5       4       4       3       2       1                
+  3      6       8       0       2       4       6       8       0
+  |...,...|...,...|...,...|...,...|...,...|...,...|...,...|...,...
+  - T0   --  T1  -- T2   --T2 --  T0  -*- T2-- T1-- T1   -***- T0-
+   logic2  logic2  arith   log2  arith  log1 log1  arith     log1
+
+  */
+  unsigned int t;
+  long long tmp;
+  int err;
+
+  struct threadwork tw[3]={
+    { 0ll, 0, 27, 0, 56 },
+    { 0ll, 1,  8,16, 48 },
+    { 0ll, 2, 40,21, 35 }
+  };
+
+  pthread_t threads[3];
+
+  __sync_lock_release (&doquit);
+
+  /* Get the workspace into a known value - all 1s */
+  __sync_lock_release (&workspace); /* Now all 0 */
+  tmp = __sync_val_compare_and_swap (&workspace, 0, -1ll);
+  if (tmp!=0)
+    DOABORT("Initial __sync_val_compare_and_swap wasn't 0 workspace=0x%llx tmp=0x%llx\n", workspace,tmp);
+
+  for(t=0; t<3; t++)
+  {
+    err=pthread_create (&threads[t], NULL , worker, &tw[t]);
+    if (err) DOABORT("pthread_create failed on thread %d with error %d\n", t, err);
+  };
+
+  sleep (5);
+
+  /* Stop please */
+  __sync_lock_test_and_set (&doquit, 1ll);
+
+  for(t=0;t<3;t++)
+    {
+      err=pthread_join (threads[t], NULL);
+      if (err)
+	DOABORT("pthread_join failed on thread %d with error %d\n", t, err);
+    };
+
+  __sync_synchronize ();
+
+  /* OK, so all the workers have finished -
+     the workers should have zeroed their workspace, and the unused
+     areas should still be all 1s.
+  */
+  if (!__sync_bool_compare_and_swap (&workspace, 0x040000e0ll, 0))
+    DOABORT("End of run workspace mismatch, got %llx\n", workspace);
+
+  /* All the workers should have done some work */
+  for(t=0; t<3; t++)
+    {
+      if (tw[t].count == 0) DOABORT("Worker %d gave 0 count\n", t);
+    };
+
+  return 0;
+}
+
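
In case the final compare-and-swap value looks magic: 0x040000e0 is simply
the set of bit positions (5-7 and 26) that none of the three workers'
fields cover, so those bits keep the all-1s value they were given at
startup.  A sketch that recomputes the mask from the threadwork table:

    int
    main (void)
    {
      unsigned long long used = 0;
      unsigned int lsb[3][3] = { {27, 0, 56}, {8, 16, 48}, {40, 21, 35} };
      unsigned int width[3] = {8, 5, 8};  /* add, logic1, logic2 */
      unsigned int t, f;

      for (t = 0; t < 3; t++)
        for (f = 0; f < 3; f++)
          used |= ((1ull << width[f]) - 1) << lsb[t][f];

      return ~used != 0x040000e0ull;  /* 0 if it matches the test's mask */
    }
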
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
new file mode 100644
index 0000000..741094b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
@@ -0,0 +1,175 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-add-options arm_arch_v5 } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+/* This is a copy of di-longlong64-sync-1.c, and is here to check that the
+   generated assembler doesn't contain ldrexd's and strexd's when compiled
+   for armv5 in ARM mode.
+ */
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit, so I needed
+   to use long long (an explicit 64-bit type may be a better bet) and
+   2) I wanted to use values that cross the 32-bit boundary and cause
+   carries, since the actual maths are done as pairs of 32-bit instructions.
+ */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
+
+/* On an old ARM we have no ldrexd or strexd, so we have to use helpers */
+/* { dg-final { scan-assembler-not "ldrexd" } } */
+/* { dg-final { scan-assembler-not "strexd" } } */
+/* { dg-final { scan-assembler "__sync_" } } */
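
For reference, when the ldrexd path isn't available the builtins lower to
out-of-line calls, which is what the scan-assembler-not patterns above
guard against; the names have this shape (a hypothetical prototype for
illustration only - the actual helpers are the subject of patch 2/3):

    extern long long __sync_val_compare_and_swap_8 (volatile void *,
                                                    long long, long long);
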
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
new file mode 100644
index 0000000..77a224a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-marm -std=gnu99" } */
+/* { dg-require-effective-target arm_arch_v6k_ok } */
+/* { dg-add-options arm_arch_v6k } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* This is a copy of di-longlong64-sync-1.c, and is here to check that the
+   generated assembler contains ldrexd's and strexd's when compiled for
+   armv6k in ARM mode (which is the earliest config that can use them).
+ */
+ 
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit, so I needed
+   to use long long (an explicit 64-bit type may be a better bet) and
+   2) I wanted to use values that cross the 32-bit boundary and cause
+   carries, since the actual maths are done as pairs of 32-bit instructions.
+ */
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done */
+static long long AL[24];
+/* Values copied into AL before we start */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set(AL+2, 1);
+  __sync_lock_release(AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match
+   */
+  __sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add(AL+8, 1);
+  __sync_fetch_and_add(AL+9, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_fetch_and_sub(AL+10, 22);
+  __sync_fetch_and_sub(AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and(AL+12, 0x300000007ll);
+  __sync_fetch_and_or(AL+13, 0x500000009ll);
+  __sync_fetch_and_xor(AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand(AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value
+   */
+  __sync_add_and_fetch(AL+16, 1);
+  __sync_add_and_fetch(AL+17, 0xb000e0000000ll); /* add to both halves & carry */
+  __sync_sub_and_fetch(AL+18, 22);
+  __sync_sub_and_fetch(AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch(AL+20, 0x300000007ll);
+  __sync_or_and_fetch(AL+21, 0x500000009ll);
+  __sync_xor_and_fetch(AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch(AL+23, 0xa00000007ll);
+}
+
+/* Now check return values */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort();
+  if (__sync_lock_test_and_set(AL+2, 1) != 0) abort();
+  __sync_lock_release(AL+3); /* no return value, but keep to match results */
+
+  /* The following tests should not change the value since the
+     original does NOT match.  */
+  if (__sync_val_compare_and_swap(AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_val_compare_and_swap(AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort();
+  if (__sync_bool_compare_and_swap(AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort();
+  if (__sync_bool_compare_and_swap(AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort();
+
+  if (__sync_fetch_and_add(AL+8, 1) != 0) abort();
+  if (__sync_fetch_and_add(AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort();
+  if (__sync_fetch_and_sub(AL+10, 22) != 42) abort();
+  if (__sync_fetch_and_sub(AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+
+  if (__sync_fetch_and_and(AL+12, 0x300000007ll) != -1ll) abort();
+  if (__sync_fetch_and_or(AL+13, 0x500000009ll) != 0) abort();
+  if (__sync_fetch_and_xor(AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort();
+  if (__sync_fetch_and_nand(AL+15, 0xa00000007ll) != -1ll) abort();
+
+  /* These should be the same as the fetch_and_* cases except for
+     the return value.  */
+  if (__sync_add_and_fetch(AL+16, 1) != 1) abort();
+  if (__sync_add_and_fetch(AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort();
+  if (__sync_sub_and_fetch(AL+18, 22) != 20) abort();
+  if (__sync_sub_and_fetch(AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort();
+
+  if (__sync_and_and_fetch(AL+20, 0x300000007ll) != 0x300000007ll) abort();
+  if (__sync_or_and_fetch(AL+21, 0x500000009ll) != 0x500000009ll) abort();
+  if (__sync_xor_and_fetch(AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort();
+  if (__sync_nand_and_fetch(AL+23, 0xa00000007ll) != ~0xa00000007ll) abort();
+}
+
+int main()
+{
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
+
+/* We should be using ldrexd and strexd, with no helper functions or
+   narrower ldrex/strex forms.  */
+/* { dg-final { scan-assembler-times "\tldrexd" 46 } } */
+/* { dg-final { scan-assembler-times "\tstrexd" 46 } } */
+/* { dg-final { scan-assembler-not "__sync_" } } */
+/* { dg-final { scan-assembler-not "ldrex\t" } } */
+/* { dg-final { scan-assembler-not "strex\t" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index cf44f1e..02f3121 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2062,6 +2062,47 @@ proc check_effective_target_arm_fp16_ok { } {
 		check_effective_target_arm_fp16_ok_nocache]
 }
 
+# Creates a series of routines that return 1 if the given architecture
+# can be selected, and a routine that gives the flags to select that
+# architecture.
+# Note: extra flags may be added to disable options from newer compilers
+# (Thumb in particular, but others may be added in the future).
+# Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
+#        /* { dg-add-options arm_arch_v5 } */
+foreach { armfunc armflag armdef } { v5 "-march=armv5 -marm" __ARM_ARCH_5__
+				     v6 "-march=armv6" __ARM_ARCH_6__
+				     v6k "-march=armv6k" __ARM_ARCH_6K__
+				     v7a "-march=armv7-a" __ARM_ARCH_7A__ } {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_arm_arch_FUNC_ok { } {
+	    if { [ string match "*-marm*" "FLAG" ] && 
+		![check_effective_target_arm_arm_ok] } {
+		return 0
+	    }
+	    return [check_no_compiler_messages arm_arch_FUNC_ok assembly {
+		#if !defined(DEF)
+		#error FOO
+		#endif
+	    } "FLAG" ]
+	}
+
+	proc add_options_for_arm_arch_FUNC { flags } {
+	    return "$flags FLAG"
+	}
+    }]
+}
+
+# Return 1 if this is an ARM target where -marm causes ARM to be
+# used (not Thumb).
+
+proc check_effective_target_arm_arm_ok { } {
+    return [check_no_compiler_messages arm_arm_ok assembly {
+	#if !defined(__arm__) || defined(__thumb__) || defined(__thumb2__)
+	#error FOO
+	#endif
+    } "-marm"]
+}
+
+
 # Return 1 if this is an ARM target where -mthumb causes Thumb-1 to be
 # used.
 
@@ -3404,6 +3445,31 @@ proc check_effective_target_sync_int_long { } {
     return $et_sync_int_long_saved
 }
 
+# Return 1 if the target supports atomic operations on "long long" and
+# can actually execute them.
+# So far only ARM has checks; other targets may want to add their own.
+proc check_effective_target_sync_longlong { } {
+    return [check_runtime sync_longlong_runtime {
+      #include <stdlib.h>
+      int main()
+      {
+	long long l1;
+
+	if (sizeof(long long)!=8)
+	  exit(1);
+
+      #ifdef __arm__
+	/* Just check for native; checking for kernel fallback is tricky.  */
+	asm volatile ("ldrexd r0,r1, [%0]" : : "r" (&l1) : "r0", "r1");
+      #else
+      # error "Add other suitable archs here"
+      #endif
+
+	exit(0);
+      }
+    } "" ]
+}
+
 # Return 1 if the target supports atomic operations on "char" and "short".
 
 proc check_effective_target_sync_char_short { } {
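
For illustration, a minimal runtime test consuming the new sync_longlong
predicate might look like the following (a sketch; the file name and exact
values are illustrative, not part of the patch):

/* { dg-do run } */
/* { dg-require-effective-target sync_longlong } */

extern void abort (void);

static long long x = 0x100000002ll;

int
main (void)
{
  /* This addition carries across the 32-bit boundary, so both words of
     the 64-bit atomic result are exercised.  */
  if (__sync_add_and_fetch (&x, 0xfffffffell) != 0x200000000ll)
    abort ();
  return 0;
}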

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:16     ` [Patch 1/4] " Dr. David Alan Gilbert
@ 2011-07-26  9:18       ` Dr. David Alan Gilbert
  2011-07-26  9:23         ` [Patch 3/4] " Dr. David Alan Gilbert
  2011-09-30 18:04         ` [Patch 2/4] " Ramana Radhakrishnan
  2011-09-30 14:25       ` [Patch 1/4] " Ramana Radhakrishnan
  1 sibling, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-26  9:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, rth, joseph, patches


    Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
    versions, but with a check for a sufficiently new kernel version.
    
        gcc/
    	* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c).
    	* config/arm/linux-atomic.c: Change comment to point to 64bit version.
    	  (SYNC_LOCK_RELEASE): Instantiate 64bit version.
    	* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c.

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 0000000..8b65de8
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,162 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilbert@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those, the compiler inlines the operation instead.)
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write(int fd, const void *buf, unsigned int count);
+extern void abort(void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+					const long long* newval,
+					long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c:
+	 write () and abort ().  */
+      __write (2 /* stderr */, err, sizeof(err));
+      abort ();
+    }
+}
+
+static void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((used, section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp,tmp2;						\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp;							\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement __sync_<op>_and_fetch for 64-bit quantities (the
+   __sync_fetch_and_<op> variants are defined above).  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp,tmp2;						\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }
+
+OP_AND_FETCH_WORD64 (add,   , +)
+OP_AND_FETCH_WORD64 (sub,   , -)
+OP_AND_FETCH_WORD64 (or,    , |)
+OP_AND_FETCH_WORD64 (and,   , &)
+OP_AND_FETCH_WORD64 (xor,   , ^)
+OP_AND_FETCH_WORD64 (nand, ~, &)
+
+long long HIDDEN
+__sync_val_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+  int failure;
+  long long actual_oldval;
+
+  while (1)
+    {
+      actual_oldval = *ptr;
+
+      if (__builtin_expect (oldval != actual_oldval, 0))
+	return actual_oldval;
+
+      failure = __kernel_cmpxchg64 (&actual_oldval, &newval, ptr);
+
+      if (__builtin_expect (!failure, 1))
+	return oldval;
+    }
+}
+
+typedef unsigned char bool;
+
+bool HIDDEN
+__sync_bool_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+  int failure = __kernel_cmpxchg64 (&oldval, &newval, ptr);
+  return (failure == 0);
+}
+
+long long HIDDEN
+__sync_lock_test_and_set_8 (long long *ptr, long long val)
+{
+  int failure;
+  long long oldval;
+
+  do {
+    oldval = *ptr;
+    failure = __kernel_cmpxchg64 (&oldval, &val, ptr);
+  } while (failure != 0);
+
+  return oldval;
+}
diff --git a/gcc/config/arm/linux-atomic.c b/gcc/config/arm/linux-atomic.c
index 57065a6..80f161d 100644
--- a/gcc/config/arm/linux-atomic.c
+++ b/gcc/config/arm/linux-atomic.c
@@ -32,8 +32,8 @@ typedef void (__kernel_dmb_t) (void);
 #define __kernel_dmb (*(__kernel_dmb_t *) 0xffff0fa0)
 
 /* Note: we implement byte, short and int versions of atomic operations using
-   the above kernel helpers, but there is no support for "long long" (64-bit)
-   operations as yet.  */
+   the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit)
+   operations.  */
 
 #define HIDDEN __attribute__ ((visibility ("hidden")))
 
@@ -273,6 +273,7 @@ SUBWORD_TEST_AND_SET (unsigned char,  1)
     *ptr = 0;								\
   }
 
+SYNC_LOCK_RELEASE (long long,   8)
 SYNC_LOCK_RELEASE (int,   4)
 SYNC_LOCK_RELEASE (short, 2)
 SYNC_LOCK_RELEASE (char,  1)
diff --git a/gcc/config/arm/t-linux-eabi b/gcc/config/arm/t-linux-eabi
index a328a00..3814cc0 100644
--- a/gcc/config/arm/t-linux-eabi
+++ b/gcc/config/arm/t-linux-eabi
@@ -36,3 +36,4 @@ LIB1ASMFUNCS := $(filter-out _dvmd_tls,$(LIB1ASMFUNCS)) _dvmd_lnx _clear_cache
 EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
 
 LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic.c
+LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic-64bit.c
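
To make the call path concrete: a minimal sketch of user code that would
pull these helpers out of static libgcc when built for a pre-ARMv6k CPU
(the function name is illustrative, not from the patch):

/* Built with e.g. -march=armv5te, GCC cannot inline ldrexd/strexd, so
   this compiles to a call to __sync_fetch_and_add_8 above, which in
   turn loops on the kernel cmpxchg64 helper at 0xffff0f60.  */
static long long counter;

long long
bump_counter (long long delta)
{
  return __sync_fetch_and_add (&counter, delta);
}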

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 3/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:18       ` [Patch 2/4] " Dr. David Alan Gilbert
@ 2011-07-26  9:23         ` Dr. David Alan Gilbert
  2011-07-26  9:24           ` [Patch 4/4] " Dr. David Alan Gilbert
  2011-08-17 13:07           ` [Patch 3/4] " Ramana Radhakrishnan
  2011-09-30 18:04         ` [Patch 2/4] " Ramana Radhakrishnan
  1 sibling, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-26  9:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, rth, joseph, patches

      Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
      relative to the branch target of the compare, since the load could float
      up beyond the ldrex.
    
        gcc/
    	* config/arm/arm.c (arm_output_sync_loop): Move label before barier,
    						   fixes PR/48126

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 28be078..cee3471 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23780,8 +23780,11 @@ arm_output_sync_loop (emit_f emit,
 	}
     }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in the cmp failure case we
+     still get a barrier to stop subsequent loads floating upwards past
+     the ldrex; PR 48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 4/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:23         ` [Patch 3/4] " Dr. David Alan Gilbert
@ 2011-07-26  9:24           ` Dr. David Alan Gilbert
  2011-08-01 15:43             ` Ramana Radhakrishnan
  2011-08-17 13:07           ` [Patch 3/4] " Ramana Radhakrishnan
  1 sibling, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-07-26  9:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, rth, joseph, patches


        gcc/
    	* config/arm/arm.c (TARGET_HAVE_DMB_MCR) MCR Not available in Thumb1
    	  but is available on armv6

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 0d419d5..146b9ad 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -285,7 +285,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
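
For context, the two barrier forms these macros select between can be
sketched as follows (a minimal illustration; the CP15 write is the ARMv6
barrier encoding, and it is exactly the instruction Thumb1 cannot emit):

static inline void
memory_barrier (void)
{
#if defined (__ARM_ARCH_7A__)
  /* ARMv7: dedicated barrier instruction (TARGET_HAVE_DMB).  */
  __asm__ __volatile__ ("dmb sy" : : : "memory");
#else
  /* ARMv6: barrier via a CP15 write (TARGET_HAVE_DMB_MCR).  MCR is not
     encodable in Thumb1, hence the new ! TARGET_THUMB1 condition.  */
  __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5"
			: : "r" (0) : "memory");
#endif
}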

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 4/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:24           ` [Patch 4/4] " Dr. David Alan Gilbert
@ 2011-08-01 15:43             ` Ramana Radhakrishnan
  0 siblings, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-01 15:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches

On 26 July 2011 10:02, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>
>        gcc/
>        * config/arm/arm.c (TARGET_HAVE_DMB_MCR) MCR Not available in Thumb1
>          but is available on armv6

`:' after (TARGET_HAVE_DMB_MCR) and something like

`Disable for Thumb1.'

instead of what's on there.

>
>  /* Nonzero if this chip implements a memory barrier via CP15.  */
> -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
> +#define TARGET_HAVE_DMB_MCR    (arm_arch6 && ! TARGET_HAVE_DMB \
> +                                && ! TARGET_THUMB1)

Otherwise OK for trunk and afflicted release branches since this
really is a bug fix.

cheers
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 3/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:23         ` [Patch 3/4] " Dr. David Alan Gilbert
  2011-07-26  9:24           ` [Patch 4/4] " Dr. David Alan Gilbert
@ 2011-08-17 13:07           ` Ramana Radhakrishnan
  1 sibling, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-17 13:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches

On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>      Micahel K. Edwards points out in PR/48126 that the sync is in the wrong place
>      relative to the branch target of the compare, since the load could float
>      up beyond the ldrex.
>
>        gcc/
>        * config/arm/arm.c (arm_output_sync_loop): Move label before barier,
>                                                   fixes PR/48126

s/barier/barrier.

This is OK for trunk and appropriate release branches after due testing.

Next time note that the ChangeLog entry should read as below - the PR
number really shouldn't be a part of the actual log.

PR target/48126

* config/arm/arm.c (arm_output_sync_loop): Move label before barrier.



with appropriate formatting.


Thanks,
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:16     ` [Patch 1/4] " Dr. David Alan Gilbert
  2011-07-26  9:18       ` [Patch 2/4] " Dr. David Alan Gilbert
@ 2011-09-30 14:25       ` Ramana Radhakrishnan
  2011-10-03 12:53         ` David Gilbert
  1 sibling, 1 reply; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-09-30 14:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches

Hi Dave,


The nit-picky bit - There are still a number of formatting issues with
your patch . Could you run your patch through
contrib/check_GNU_style.sh and correct these. These are typically
around problems with the number of spaces between a full stop and the
end of comment, lines with trailing whitespaces and a few lines with
number of characters > 80.  Thanks.

>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>
>+      else
>+	{
>+	  /* Silence false potentially unused warning */
>+	  required_value_lo = NULL;
>+	  required_value_hi = NULL;
>+	}
>

s/NULL/NULL_RTX in a number of places in arm.c

>@@ -23516,14 +23530,41 @@ arm_output_strex (emit_f emit,
> 		  rtx value,
> 		  rtx memory)
> {
>-  const char *suffix = arm_ldrex_suffix (mode);
>-  rtx operands[3];
>+  rtx operands[4];
>
>   operands[0] = result;
>   operands[1] = value;
>-  operands[2] = memory;
>-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
>-		       cc);
>+  if (mode != DImode)
>+    {
>+      const char *suffix = arm_ldrex_suffix (mode);
>+      operands[2] = memory;
>+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
>+			  suffix, cc);
>+    }
>+  else
>+    {
>+      /* The restrictions on target registers in ARM mode are that the two
>+	 registers are consecutive and the first one is even; Thumb is
>+	 actually more flexible, but DI should give us this anyway.
>+	 Note that the 1st register always gets the lowest word in memory.  */
>+      gcc_assert ((REGNO (value) & 1) == 0);
>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>+      operands[3] = memory;
>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>+			   cc);
>+    }
>

The restriction is actually mandatory for ARM state only and thus I'm fine
with this assertion being true only in ARM state.


I don't like duplicating the tests from gcc.dg into gcc.target/arm.
If you wanted to check for assembler output specific to a target you could
add your own conditions to the test in gcc.dg and conditionalize that on
target arm_eabi

Something like :

{ dg-final { scan-assembler "ldrexd\t" { target arm_eabi } } }.

I would like a testsuite maintainer to comment on the testsuite infrastructure
bits as well but I have a few comments below .



>> +# Return 1 if the target supports atomic operations on "long long" and can actually
>+# execute them
>+# So far only put checks in for ARM, others may want to add their own
>+proc check_effective_target_sync_longlong { } {
>+    return [check_runtime sync_longlong_runtime {
>+      #include <stdlib.h>
>+      int main()
>+      {
>+	long long l1;
>+
>+	if (sizeof(long long)!=8)

Space between ')' and ! as well as '=' and 8

>+	  exit(1);
>+
>+      #ifdef __arm__

Why is this checking only for ARM state ? We could have ldrexd in T2 as
well ?

Otherwise the functionality looks good to me. Can you confirm that
this has survived a testrun for v7-a thumb2 and v7-a arm state ?

cheers
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-07-26  9:18       ` [Patch 2/4] " Dr. David Alan Gilbert
  2011-07-26  9:23         ` [Patch 3/4] " Dr. David Alan Gilbert
@ 2011-09-30 18:04         ` Ramana Radhakrishnan
  2011-09-30 18:29           ` H.J. Lu
  2011-09-30 20:51           ` Joseph S. Myers
  1 sibling, 2 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-09-30 18:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches

On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>
> +
> +extern unsigned int __write(int fd, const void *buf, unsigned int count);

Why are we using __write instead of write?

A comment elaborating that this file should only be in the static
libgcc and never in the dynamic libgcc would be useful, given that the
constructor is only pulled in only if a 64 bit sync primitive is
referred to.

cheers
Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-09-30 18:04         ` [Patch 2/4] " Ramana Radhakrishnan
@ 2011-09-30 18:29           ` H.J. Lu
  2011-10-03 13:22             ` David Gilbert
  2011-09-30 20:51           ` Joseph S. Myers
  1 sibling, 1 reply; 43+ messages in thread
From: H.J. Lu @ 2011-09-30 18:29 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Dr. David Alan Gilbert, gcc-patches, rth, joseph, patches

On Fri, Sep 30, 2011 at 9:45 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>>
>> +
>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>
> Why are we using __write instead of write?
>
> A comment elaborating that this file should only be in the static
> libgcc and never in the dynamic libgcc would be useful, given that the
> constructor is pulled in only if a 64 bit sync primitive is
> referred to.
>

You may want to look a look at:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583

ARM may have the same problem.

-- 
H.J.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-09-30 18:04         ` [Patch 2/4] " Ramana Radhakrishnan
  2011-09-30 18:29           ` H.J. Lu
@ 2011-09-30 20:51           ` Joseph S. Myers
  2011-10-03  8:35             ` Andrew Haley
  1 sibling, 1 reply; 43+ messages in thread
From: Joseph S. Myers @ 2011-09-30 20:51 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Dr. David Alan Gilbert, gcc-patches, rth, patches

On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:

> On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
> >
> > +
> > +extern unsigned int __write(int fd, const void *buf, unsigned int count);
> 
> Why are we using __write instead of write?

Because plain write is in the user's namespace in ISO C.  See what I said 
in <http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00084.html> - the 
alternative is hardcoding the syscall number and using the syscall 
directly.
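
Concretely (an editorial illustration of the namespace point, not code from
the thread): 'write' is not part of the ISO C library, so a conforming
program may define its own, and libgcc calling the plain name would then
reach the user's function rather than the system call:

int
write (int fd, const void *buf, unsigned int count)
{
  /* A user-supplied 'write' - nothing like the POSIX syscall.  */
  (void) fd; (void) buf; (void) count;
  return -1;
}

int
main (void)
{
  return 0;
}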

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-09-30 20:51           ` Joseph S. Myers
@ 2011-10-03  8:35             ` Andrew Haley
  2011-10-03 14:04               ` David Gilbert
  2011-10-03 15:56               ` Joseph S. Myers
  0 siblings, 2 replies; 43+ messages in thread
From: Andrew Haley @ 2011-10-03  8:35 UTC (permalink / raw)
  To: gcc-patches

On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
> On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
> 
>> On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>>>
>>> +
>>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>>
>> Why are we using __write instead of write?
> 
> Because plain write is in the user's namespace in ISO C.  See what I said 
> in <http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00084.html> - the 
> alternative is hardcoding the syscall number and using the syscall 
> directly.

That would be better, no?  Unless __write is part of the glibc API,
which AFAIK it isn't.

Andrew.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]
  2011-09-30 14:25       ` [Patch 1/4] " Ramana Radhakrishnan
@ 2011-10-03 12:53         ` David Gilbert
  2011-10-03 15:18           ` David Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: David Gilbert @ 2011-10-03 12:53 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, rth, joseph, patches

On 30 September 2011 14:21, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Hi Dave,
>
>
> The nit-picky bit - There are still a number of formatting issues with
> your patch . Could you run your patch through
> contrib/check_GNU_style.sh and correct these. These are typically
> around problems with the number of spaces between a full stop and the
> end of comment, lines with trailing whitespaces and a few lines with
> number of characters > 80.  Thanks.

Oops - sorry about those; I'll run it through the check script and nail them.

>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>
>>+      else
>>+      {
>>+        /* Silence false potentially unused warning */
>>+        required_value_lo = NULL;
>>+        required_value_hi = NULL;
>>+      }
>>
>
> s/NULL/NULL_RTX in a number of places in arm.c

OK.

>>+      /* The restrictions on target registers in ARM mode are that the two
>>+       registers are consecutive and the first one is even; Thumb is
>>+       actually more flexible, but DI should give us this anyway.
>>+       Note that the 1st register always gets the lowest word in memory.  */
>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>+      operands[3] = memory;
>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>+                         cc);
>>+    }
>>
>
> The restriction is actually mandatory for ARM state only and thus I'm fine
> with this assertion being true only in ARM state.

OK, I can make the assert apply only in ARM mode; but I thought the simpler
logic was better and should hold true anyway because of DI mode allocation.

> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
> If you wanted to check for assembler output specific to a target you could
> add your own conditions to the test in gcc.dg and conditionalize that on
> target arm_eabi
>
> Something like :
>
> { dg-final { scan-assembler "ldrexd\t" { target arm_eabi } } }.
>
> I would like a testsuite maintainer to comment on the testsuite infrastructure
> bits as well but I have a few comments below .

As discussed, I don't like the dupes either - the problem is that we
have 3 tests with identical code but different dg annotation:

   1) Build & run and check that the sync behaves correctly - using whatever
       compile flags you happen to have. (gcc.dg version)
   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags

Because (2) and (3) include different dg-add-options lines I don't see
how I can combine them.

The suggestion that I'm OK with is to #include the gcc.dg one in the
gcc.arm one.
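
That arrangement might look like this (hypothetical file and directive
names, sketching the idea rather than a final patch):

/* gcc.target/arm/di-longlong64-sync-withldrexd.c (hypothetical):
   force armv6k ARM-state flags, reuse the generic body, then scan.  */
/* { dg-do compile } */
/* { dg-require-effective-target arm_arch_v6k_ok } */
/* { dg-add-options arm_arch_v6k } */

#include "../../gcc.dg/di-longlong64-sync-1.c"

/* { dg-final { scan-assembler-times "\tldrexd" 46 } } */
/* { dg-final { scan-assembler-not "__sync_" } } */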

>>> +# Return 1 if the target supports atomic operations on "long long" and can actually
>>+# execute them
>>+# So far only put checks in for ARM, others may want to add their own
>>+proc check_effective_target_sync_longlong { } {
>>+    return [check_runtime sync_longlong_runtime {
>>+      #include <stdlib.h>
>>+      int main()
>>+      {
>>+      long long l1;
>>+
>>+      if (sizeof(long long)!=8)
>
> Space between ')' and ! as well as '=' and 8
>
>>+        exit(1);
>>+
>>+      #ifdef __arm__
>
> Why is this checking only for ARM state ? We could have ldrexd in T2 as
> well ?

Because __arm__ gets defined for either thumb or arm mode; in thumb mode
we just get __thumb__  (and __thumb2__) defined as well.

> Otherwise the functionality looks good to me. Can you confirm that
> this has survived a testrun for v7-a thumb2 and v7-a arm state ?

Yes it did.  I'll give it another whirl later today after I go and fix
the formatting niggles and move the test.

Thanks for the review.

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-09-30 18:29           ` H.J. Lu
@ 2011-10-03 13:22             ` David Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: David Gilbert @ 2011-10-03 13:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Ramana Radhakrishnan, gcc-patches, rth, joseph, patches

On 30 September 2011 18:01, H.J. Lu <hjl.tools@gmail.com> wrote:
> You may want to look a look at:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583
>
> ARM may have the same problem.

OK - although to be honest this patch only stretches the same
structures to 64bit - any major changes in semantics are a separate issue - but
thanks for pointing it out.

Hmm - I think what's produced is correct; however the manual
description is inconsistent:

     These builtins perform the operation suggested by the name, and
     returns the value that had previously been in memory.  That is,

          { tmp = *ptr; *ptr OP= value; return tmp; }

The ARM code (see below) does a single load inside a loop with a guarded
store.  This guarantees that the value returned is the value that had
'previously been in memory' directly prior to the atomic operation -
however, that means it doesn't do the pair of accesses implied by
'tmp = *ptr; *ptr OP= value'.

On ARM the operation for fetch_and_add we get:
(This is pre-my-patch and 32bit, my patch doesn't change the structure
except for the position of that last label):

	mov	r3, r0
	dmb	sy
	.LSYT6:
	ldrex	r0, [r3]
	add	r2, r0, r1
	strex	r0, r2, [r3]
	teq	r0, #0
	bne	.LSYT6
	sub	r0, r2, r1
	dmb	sy

That seems like the correct semantics to me - if not, what am I missing?
Was the intention of the example really to cause two loads - and if so,
why?

for sync_and_fetch we get:


	dmb	sy
	.LSYT6:
	ldrex	r0, [r3]
	add	r0, r0, r1
	strex	r2, r0, [r3]
	teq	r2, #0
	bne	.LSYT6
	dmb	sy

i.e. the value returned is always the value that goes into the guarded
store - and is hence
always the value that's stored.

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-10-03  8:35             ` Andrew Haley
@ 2011-10-03 14:04               ` David Gilbert
  2011-10-03 15:56               ` Joseph S. Myers
  1 sibling, 0 replies; 43+ messages in thread
From: David Gilbert @ 2011-10-03 14:04 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc-patches

On 3 October 2011 09:35, Andrew Haley <aph@redhat.com> wrote:
> On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
>> On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
>>
>>> On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
>>>>
>>>> +
>>>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>>>
>>> Why are we using __write instead of write?
>>
>> Because plain write is in the user's namespace in ISO C.  See what I said
>> in <http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00084.html> - the
>> alternative is hardcoding the syscall number and using the syscall
>> directly.
>
> That would be better, no?  Unless __write is part of the glibc API,
> which AFAIK it isn't.

I could change it to calling the syscall directly - although it gets
a little messy having to deal with both ARM and Thumb syscalls;
I was trying to avoid further complicating an already complicated corner
case.

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]
  2011-10-03 12:53         ` David Gilbert
@ 2011-10-03 15:18           ` David Gilbert
  2011-10-06 17:52             ` [Patch 0/5] ARM 64 bit sync atomic operations [V3] Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: David Gilbert @ 2011-10-03 15:18 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

(Sorry, repost - I'd meant to cc Mike and Rainer into the
conversation, but forgot to
add them).

On 3 October 2011 13:53, David Gilbert <david.gilbert@linaro.org> wrote:
> On 30 September 2011 14:21, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> Hi Dave,
>>
>>
>> The nit-picky bit - There are still a number of formatting issues with
>> your patch . Could you run your patch through
>> contrib/check_GNU_style.sh and correct these. These are typically
>> around problems with the number of spaces between a full stop and the
>> end of comment, lines with trailing whitespaces and a few lines with
>> number of characters > 80.  Thanks.
>
> Oops - sorry about those; I'll run it through the check script and nail them.
>
>>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>>
>>>+      else
>>>+      {
>>>+        /* Silence false potentially unused warning */
>>>+        required_value_lo = NULL;
>>>+        required_value_hi = NULL;
>>>+      }
>>>
>>
>> s/NULL/NULL_RTX in a number of places in arm.c
>
> OK.
>
>>>+      /* The restrictions on target registers in ARM mode are that the two
>>>+       registers are consecutive and the first one is even; Thumb is
>>>+       actually more flexible, but DI should give us this anyway.
>>>+       Note that the 1st register always gets the lowest word in memory.  */
>>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>>+      operands[3] = memory;
>>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>>+                         cc);
>>>+    }
>>>
>>
>> The restriction is actually mandatory for ARM state only and thus I'm fine
>> with this assertion being true only in ARM state.
>
> OK, I can make the assert apply only in ARM mode; but I thought the simpler
> logic was better and should hold true anyway because of DI mode allocation.
>
>> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
>> If you wanted to check for assembler output specific to a target you could
>> add your own conditions to the test in gcc.dg and conditionalize that on
>> target arm_eabi
>>
>> Something like :
>>
>> { dg-final { scan-assembler "ldrexd\t" { target arm_eabi } } }.
>>
>> I would like a testsuite maintainer to comment on the testsuite infrastructure
>> bits as well but I have a few comments below .
>
> As discussed, I don't like the dupes either - the problem is that we
> have 3 tests with identical code but different dg annotation:
>
>   1) Build & run and check that the sync behaves correctly - using whatever
>       compile flags you happen to have. (gcc.dg version)
>   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
>   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags
>
> Because (2) and (3) include different dg-add-options lines I don't see
> how I can combine them.
>
> The suggestion that I'm OK with is to #include the gcc.dg one in the
> gcc.arm one.
>
>>>> +# Return 1 if the target supports atomic operations on "long long" and can actually
>>>+# execute them
>>>+# So far only put checks in for ARM, others may want to add their own
>>>+proc check_effective_target_sync_longlong { } {
>>>+    return [check_runtime sync_longlong_runtime {
>>>+      #include <stdlib.h>
>>>+      int main()
>>>+      {
>>>+      long long l1;
>>>+
>>>+      if (sizeof(long long)!=8)
>>
>> Space between ')' and ! as well as '=' and 8
>>
>>>+        exit(1);
>>>+
>>>+      #ifdef __arm__
>>
>> Why is this checking only for ARM state ? We could have ldrexd in T2 as
>> well ?
>
> Because __arm__ gets defined for either thumb or arm mode; in thumb mode
> we just get __thumb__  (and __thumb2__) defined as well.
>
>> Otherwise the functionality looks good to me. Can you confirm that
>> this has survived a testrun for v7-a thumb2 and v7-a arm state ?
>
> Yes it did.  I'll give it another whirl later today after I go and fix
> the formatting niggles and move the test.
>
> Thanks for the review.
>
> Dave
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-10-03  8:35             ` Andrew Haley
  2011-10-03 14:04               ` David Gilbert
@ 2011-10-03 15:56               ` Joseph S. Myers
  2011-10-03 16:07                 ` Jakub Jelinek
  1 sibling, 1 reply; 43+ messages in thread
From: Joseph S. Myers @ 2011-10-03 15:56 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc-patches

On Mon, 3 Oct 2011, Andrew Haley wrote:

> On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
> > On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
> > 
> >> On 26 July 2011 10:01, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
> >>>
> >>> +
> >>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
> >>
> >> Why are we using __write instead of write?
> > 
> > Because plain write is in the user's namespace in ISO C.  See what I said 
> > in <http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00084.html> - the 
> > alternative is hardcoding the syscall number and using the syscall 
> > directly.
> 
> That would be better, no?  Unless __write is part of the glibc API,
> which AFAIK it isn't.

It's exported at version GLIBC_2.0 (not GLIBC_PRIVATE) under the comment 
"functions used in inline functions or macros", although I don't actually 
see any such functions or macros in current glibc headers.  I think being 
under a public version means you can rely on it staying there.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]
  2011-10-03 15:56               ` Joseph S. Myers
@ 2011-10-03 16:07                 ` Jakub Jelinek
  0 siblings, 0 replies; 43+ messages in thread
From: Jakub Jelinek @ 2011-10-03 16:07 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Andrew Haley, gcc-patches

On Mon, Oct 03, 2011 at 03:55:58PM +0000, Joseph S. Myers wrote:
> > That would be better, no?  Unless __write is part of the glibc API,
> > which AFAIK it isn't.
> 
> It's exported at version GLIBC_2.0 (not GLIBC_PRIVATE) under the comment 
> "functions used in inline functions or macros", although I don't actually 
> see any such functions or macros in current glibc headers.  I think being 
> under a public version means you can rely on it staying there.

On the architectures where it has been exported, yes.  New architectures
might prune those.  But we are talking here about a particular architecture
which had these exported, therefore they will continue to be exported
in the future as well, unless glibc SONAME changes (very unlikely).

	Jakub

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 0/5] ARM 64 bit sync atomic operations [V3]
  2011-10-03 15:18           ` David Gilbert
@ 2011-10-06 17:52             ` Dr. David Alan Gilbert
  2011-10-06 17:53               ` [Patch 1/5] " Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 17:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro


Hi,
  This is V3 of a series of 5 patches relating to ARM atomic operations;
they incorporate most of the feedback from V2.  Note the patch numbering/
ordering is different from v2; the two simple patches are now first.

  1) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
     produce the mcr instruction in Thumb1 (and enable it on ARMv6, not
     just 6k, as per the docs).
  2) Fix pr48126 which is a misplaced barrier in the atomic generation
  3) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k 
     and above.
  4) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
     assist is called (as per 32bit and smaller ops).
  5) Add test cases and support for those test cases, for the operations
     added in (3) and (4).

This code has been tested built on x86-64 cross to ARM run in ARM and Thumb
(C, C++, Fortran).

It is against git rev 68a79dfc.

Relative to v2:
  Test cases split out
  Code sharing between the test cases
  More coding style cleanup
  A handful of NULL->NULL_RTX changes

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific
    to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

  Note that the kernel interface has remained the same for the helper, and as
such I've not changed the way the helper calls in patch 2 are structured.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 1/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:52             ` [Patch 0/5] ARM 64 bit sync atomic operations [V3] Dr. David Alan Gilbert
@ 2011-10-06 17:53               ` Dr. David Alan Gilbert
  2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
  2011-10-12  0:51                 ` [Patch 1/5] " Ramana Radhakrishnan
  0 siblings, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 17:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro

       gcc/
       * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 993e3a0..f6f1da7 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 2/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:53               ` [Patch 1/5] " Dr. David Alan Gilbert
@ 2011-10-06 17:54                 ` Dr. David Alan Gilbert
  2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
                                     ` (2 more replies)
  2011-10-12  0:51                 ` [Patch 1/5] " Ramana Radhakrishnan
  1 sibling, 3 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 17:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro

        Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
        relative to the branch target of the compare, since the load could float
        up beyond the ldrex.
  
        PR target/48126
    
          * config/arm/arm.c (arm_output_sync_loop): Move label before barrier.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
 	}
     }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in the cmp failure case we
+     still get a barrier to stop subsequent loads floating upwards past
+     the ldrex; PR 48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 3/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
@ 2011-10-06 17:55                   ` Dr. David Alan Gilbert
  2011-10-06 18:03                     ` [Patch 4/5] " Dr. David Alan Gilbert
  2011-10-12  2:35                     ` [Patch 3/5] " Ramana Radhakrishnan
  2011-10-12  0:52                   ` [Patch 2/5] " Ramana Radhakrishnan
  2011-10-12  1:45                   ` Ramana Radhakrishnan
  2 siblings, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 17:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro

    Add support for ARM 64bit sync intrinsics.

        gcc/
        * arm.c (arm_output_ldrex): Support ldrexd.
          (arm_output_strex): Support strexd.
          (arm_output_it): New helper to output it in Thumb2 mode only.
          (arm_output_sync_loop): Support DI mode.
                                 Update comment to remove const_int.
          (arm_expand_sync): Support DI mode.

        * arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.

        * iterators.md (NARROW): move from sync.md.
          (QHSD): New iterator for all current ARM integer modes.
          (SIDI): New iterator for SI and DI modes only.

        * sync.md  (sync_predtab): New mode_attr
          (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>
          (sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>
          (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>
          (sync_nandsi): Fold into sync_nand<mode>
          (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>
          (sync_new_nandsi): Fold into sync_new_nand<mode>
          (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>
          (sync_old_nandsi): Fold into sync_old_nand<mode>
          (sync_compare_and_swap<mode>): Support SI & DI
          (sync_lock_test_and_set<mode>): Likewise
          (sync_<sync_optab><mode>): Likewise
          (sync_nand<mode>): Likewise
          (sync_new_<sync_optab><mode>): Likewise
          (sync_new_nand<mode>): Likewise
          (sync_old_<sync_optab><mode>): Likewise
          (sync_old_nand<mode>): Likewise
          (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI
          (arm_sync_lock_test_and_setsi): Likewise
          (arm_sync_new_<sync_optab>si): Likewise
          (arm_sync_new_nandsi): Likewise
          (arm_sync_old_<sync_optab>si): Likewise
          (arm_sync_old_nandsi): Likewise
          (arm_sync_compare_and_swap<mode> NARROW): use sync_predtab, fix indent
          (arm_sync_lock_test_and_set<mode> NARROW): Likewise
          (arm_sync_new_<sync_optab><mode> NARROW): Likewise
          (arm_sync_new_nand<mode> NARROW): Likewise
          (arm_sync_old_<sync_optab><mode> NARROW): Likewise
          (arm_sync_old_nand<mode> NARROW): Likewise

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e7105a..51c0f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24039,12 +24039,26 @@ arm_output_ldrex (emit_f emit,
 		  rtx target,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[1] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (target) & 1) == 0);
+      operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+    }
 }
 
 /* Emit a strex{b,h,d, } instruction appropriate for the specified
@@ -24057,14 +24071,41 @@ arm_output_strex (emit_f emit,
 		  rtx value,
 		  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-		       cc);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+			  suffix, cc);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+      operands[3] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+			   cc);
+    }
+}
+
+/* Helper to emit an it instruction in Thumb2 mode only; although the
+   assembler will ignore it in ARM mode, emitting it there would mess up
+   the instruction counts we sometimes keep.  'flags' gives the extra t's
+   and e's needed when more than one instruction is conditional.  */
+static void
+arm_output_it (emit_f emit, const char *flags, const char *cond)
+{
+  rtx operands[1]; /* Don't actually use the operand.  */
+  if (TARGET_THUMB2)
+    arm_output_asm_insn (emit, 0, operands, "it%s\t%s", flags, cond);
 }
 
 /* Helper to emit a two operand instruction.  */
@@ -24106,7 +24147,7 @@ arm_output_op3 (emit_f emit, const char *mnemonic, rtx d, rtx a, rtx b)
 
    required_value:
 
-   RTX register or const_int representing the required old_value for
+   RTX register representing the required old_value for
    the modify to continue, if NULL no comparison is performed.  */
 static void
 arm_output_sync_loop (emit_f emit,
@@ -24120,7 +24161,13 @@ arm_output_sync_loop (emit_f emit,
 		      enum attr_sync_op sync_op,
 		      int early_barrier_required)
 {
-  rtx operands[1];
+  rtx operands[2];
+  /* We'll use the lo versions for the normal rtx in the non-DI case
+     as well as for the least-significant word in the DI case.  */
+  rtx old_value_lo, required_value_lo, new_value_lo, t1_lo;
+  rtx old_value_hi, required_value_hi, new_value_hi, t1_hi;
+
+  bool is_di = mode == DImode;
 
   gcc_assert (t1 != t2);
 
@@ -24131,82 +24178,142 @@ arm_output_sync_loop (emit_f emit,
 
   arm_output_ldrex (emit, mode, old_value, memory);
 
+  if (is_di)
+    {
+      old_value_lo = gen_lowpart (SImode, old_value);
+      old_value_hi = gen_highpart (SImode, old_value);
+      if (required_value)
+	{
+	  required_value_lo = gen_lowpart (SImode, required_value);
+	  required_value_hi = gen_highpart (SImode, required_value);
+	}
+      else
+	{
+	  /* Silence false potentially unused warning.  */
+	  required_value_lo = NULL_RTX;
+	  required_value_hi = NULL_RTX;
+	}
+      new_value_lo = gen_lowpart (SImode, new_value);
+      new_value_hi = gen_highpart (SImode, new_value);
+      t1_lo = gen_lowpart (SImode, t1);
+      t1_hi = gen_highpart (SImode, t1);
+    }
+  else
+    {
+      old_value_lo = old_value;
+      new_value_lo = new_value;
+      required_value_lo = required_value;
+      t1_lo = t1;
+
+      /* Silence false potentially unused warning.  */
+      t1_hi = NULL_RTX;
+      new_value_hi = NULL_RTX;
+      required_value_hi = NULL_RTX;
+      old_value_hi = NULL_RTX;
+    }
+
   if (required_value)
     {
-      rtx operands[2];
+      operands[0] = old_value_lo;
+      operands[1] = required_value_lo;
 
-      operands[0] = old_value;
-      operands[1] = required_value;
       arm_output_asm_insn (emit, 0, operands, "cmp\t%%0, %%1");
+      if (is_di)
+        {
+          arm_output_it (emit, "", "eq");
+          arm_output_op2 (emit, "cmpeq", old_value_hi, required_value_hi);
+        }
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYB%%=", LOCAL_LABEL_PREFIX);
     }
 
   switch (sync_op)
     {
     case SYNC_OP_ADD:
-      arm_output_op3 (emit, "add", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "adds" : "add",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "adc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_SUB:
-      arm_output_op3 (emit, "sub", t1, old_value, new_value);
+      arm_output_op3 (emit, is_di ? "subs" : "sub",
+		      t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "sbc", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_IOR:
-      arm_output_op3 (emit, "orr", t1, old_value, new_value);
+      arm_output_op3 (emit, "orr", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "orr", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_XOR:
-      arm_output_op3 (emit, "eor", t1, old_value, new_value);
+      arm_output_op3 (emit, "eor", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "eor", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_AND:
-      arm_output_op3 (emit,"and", t1, old_value, new_value);
+      arm_output_op3 (emit,"and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
       break;
 
     case SYNC_OP_NAND:
-      arm_output_op3 (emit, "and", t1, old_value, new_value);
-      arm_output_op2 (emit, "mvn", t1, t1);
+      arm_output_op3 (emit, "and", t1_lo, old_value_lo, new_value_lo);
+      if (is_di)
+	arm_output_op3 (emit, "and", t1_hi, old_value_hi, new_value_hi);
+      arm_output_op2 (emit, "mvn", t1_lo, t1_lo);
+      if (is_di)
+	arm_output_op2 (emit, "mvn", t1_hi, t1_hi);
       break;
 
     case SYNC_OP_NONE:
       t1 = new_value;
+      t1_lo = new_value_lo;
+      if (is_di)
+	t1_hi = new_value_hi;
       break;
     }
 
+  /* Note that the result of strex is a 0/1 flag that always occupies a
+     single register, even for DImode data.  */
   if (t2)
     {
-       arm_output_strex (emit, mode, "", t2, t1, memory);
-       operands[0] = t2;
-       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
-       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
-			    LOCAL_LABEL_PREFIX);
+      arm_output_strex (emit, mode, "", t2, t1, memory);
+      operands[0] = t2;
+      arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
+      arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
+			   LOCAL_LABEL_PREFIX);
     }
   else
     {
       /* Use old_value for the return value because for some operations
 	 the old_value can easily be restored.  This saves one register.  */
-      arm_output_strex (emit, mode, "", old_value, t1, memory);
-      operands[0] = old_value;
+      arm_output_strex (emit, mode, "", old_value_lo, t1, memory);
+      operands[0] = old_value_lo;
       arm_output_asm_insn (emit, 0, operands, "teq\t%%0, #0");
       arm_output_asm_insn (emit, 0, operands, "bne\t%sLSYT%%=",
 			   LOCAL_LABEL_PREFIX);
 
+      /* Note that we only used the _lo half of old_value as a temporary,
+	 so in DI we don't have to restore the _hi part.  */
       switch (sync_op)
 	{
 	case SYNC_OP_ADD:
-	  arm_output_op3 (emit, "sub", old_value, t1, new_value);
+	  arm_output_op3 (emit, "sub", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_SUB:
-	  arm_output_op3 (emit, "add", old_value, t1, new_value);
+	  arm_output_op3 (emit, "add", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_XOR:
-	  arm_output_op3 (emit, "eor", old_value, t1, new_value);
+	  arm_output_op3 (emit, "eor", old_value_lo, t1_lo, new_value_lo);
 	  break;
 
 	case SYNC_OP_NONE:
-	  arm_output_op2 (emit, "mov", old_value, required_value);
+	  arm_output_op2 (emit, "mov", old_value_lo, required_value_lo);
 	  break;
 
 	default:
@@ -24312,7 +24419,7 @@ arm_expand_sync (enum machine_mode mode,
     target = gen_reg_rtx (mode);
 
   memory = arm_legitimize_sync_memory (memory);
-  if (mode != SImode)
+  if (mode != SImode && mode != DImode)
     {
       rtx load_temp = gen_reg_rtx (SImode);
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f6f1da7..995eae0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -297,8 +297,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip supports ldrex and strex */
 #define TARGET_HAVE_LDREX	((arm_arch6 && TARGET_ARM) || arm_arch7)
 
-/* Nonzero if this chip supports ldrex{bhd} and strex{bhd}.  */
-#define TARGET_HAVE_LDREXBHD	((arm_arch6k && TARGET_ARM) || arm_arch7)
+/* Nonzero if this chip supports ldrex{bh} and strex{bh}.  */
+#define TARGET_HAVE_LDREXBH	((arm_arch6k && TARGET_ARM) || arm_arch7)
+
+/* Nonzero if this chip supports ldrexd and strexd.  */
+#define TARGET_HAVE_LDREXD	(((arm_arch6k && TARGET_ARM) || arm_arch7) \
+				 && arm_arch_notm)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index da1f7af..85dd641 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -33,6 +33,15 @@
 ;; A list of integer modes that are up to one word long
 (define_mode_iterator QHSI [QI HI SI])
 
+;; A list of integer modes that are less than a word
+(define_mode_iterator NARROW [QI HI])
+
+;; A list of all the integer modes up to 64bit
+(define_mode_iterator QHSD [QI HI SI DI])
+
+;; A list of the 32bit and 64bit integer modes
+(define_mode_iterator SIDI [SI DI])
+
 ;; Integer element sizes implemented by IWMMXT.
 (define_mode_iterator VMMX [V2SI V4HI V8QI])
 
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 689a235..40ee93c 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -1,6 +1,7 @@
 ;; Machine description for ARM processor synchronization primitives.
 ;; Copyright (C) 2010 Free Software Foundation, Inc.
 ;; Written by Marcus Shawcroft (marcus.shawcroft@arm.com)
+;; 64bit Atomics by Dave Gilbert (david.gilbert@linaro.org)
 ;;
 ;; This file is part of GCC.
 ;;
@@ -33,31 +34,24 @@
   MEM_VOLATILE_P (operands[0]) = 1;
 })
 
-(define_expand "sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand")
-        (unspec_volatile:SI [(match_operand:SI 1 "memory_operand")
-			     (match_operand:SI 2 "s_register_operand")
-			     (match_operand:SI 3 "s_register_operand")]
-			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omrn;
-    generator.u.omrn = gen_arm_sync_compare_and_swapsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], operands[2],
-                     operands[3]);
-    DONE;
-  })
 
-(define_mode_iterator NARROW [QI HI])
+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX &&
+					TARGET_HAVE_MEMORY_BARRIER")
+				(QI "TARGET_HAVE_LDREXBH &&
+					TARGET_HAVE_MEMORY_BARRIER")
+				(HI "TARGET_HAVE_LDREXBH &&
+					TARGET_HAVE_MEMORY_BARRIER")
+				(DI "TARGET_HAVE_LDREXD &&
+					ARM_DOUBLEWORD_ALIGN &&
+					TARGET_HAVE_MEMORY_BARRIER")])
 
 (define_expand "sync_compare_and_swap<mode>"
-  [(set (match_operand:NARROW 0 "s_register_operand")
-        (unspec_volatile:NARROW [(match_operand:NARROW 1 "memory_operand")
-			     (match_operand:NARROW 2 "s_register_operand")
-			     (match_operand:NARROW 3 "s_register_operand")]
+  [(set (match_operand:QHSD 0 "s_register_operand")
+        (unspec_volatile:QHSD [(match_operand:QHSD 1 "memory_operand")
+			     (match_operand:QHSD 2 "s_register_operand")
+			     (match_operand:QHSD 3 "s_register_operand")]
 			     VUNSPEC_SYNC_COMPARE_AND_SWAP))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omrn;
@@ -67,25 +61,11 @@
     DONE;
   })
 
-(define_expand "sync_lock_test_and_setsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_lock_test_and_setsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_lock_test_and_set<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -115,51 +95,25 @@
 				(plus "*")
 				(minus "*")])
 
-(define_expand "sync_<sync_optab>si"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (syncop:SI (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
-(define_expand "sync_nandsi"
-  [(match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "s_register_operand")
-   (not:SI (and:SI (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, NULL, operands[0], NULL, operands[1]);
-    DONE;
-  })
-
 (define_expand "sync_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (syncop:NARROW (match_dup 0) (match_dup 1))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (syncop:QHSD (match_dup 0) (match_dup 1))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, NULL, operands[0], NULL,
-    		     operands[1]);
+		     operands[1]);
     DONE;
   })
 
 (define_expand "sync_nand<mode>"
-  [(match_operand:NARROW 0 "memory_operand")
-   (match_operand:NARROW 1 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 0) (match_dup 1)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "memory_operand")
+   (match_operand:QHSD 1 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 0) (match_dup 1)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -169,57 +123,27 @@
     DONE;
   })
 
-(define_expand "sync_new_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_new_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_new_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-    		     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_new_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_new_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_new_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -229,57 +153,27 @@
     DONE;
   });
 
-(define_expand "sync_old_<sync_optab>si"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (syncop:SI (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_<sync_optab>si;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
-(define_expand "sync_old_nandsi"
-  [(match_operand:SI 0 "s_register_operand")
-   (match_operand:SI 1 "memory_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (not:SI (and:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
-  {
-    struct arm_sync_generator generator;
-    generator.op = arm_sync_generator_omn;
-    generator.u.omn = gen_arm_sync_old_nandsi;
-    arm_expand_sync (SImode, &generator, operands[0], operands[1], NULL,
-                     operands[2]);
-    DONE;
-  })
-
 (define_expand "sync_old_<sync_optab><mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (syncop:NARROW (match_dup 1) (match_dup 2))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (syncop:QHSD (match_dup 1) (match_dup 2))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
     generator.u.omn = gen_arm_sync_old_<sync_optab><mode>;
     arm_expand_sync (<MODE>mode, &generator, operands[0], operands[1],
-    		     NULL, operands[2]);
+		     NULL, operands[2]);
     DONE;
   })
 
 (define_expand "sync_old_nand<mode>"
-  [(match_operand:NARROW 0 "s_register_operand")
-   (match_operand:NARROW 1 "memory_operand")
-   (match_operand:NARROW 2 "s_register_operand")
-   (not:NARROW (and:NARROW (match_dup 1) (match_dup 2)))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  [(match_operand:QHSD 0 "s_register_operand")
+   (match_operand:QHSD 1 "memory_operand")
+   (match_operand:QHSD 2 "s_register_operand")
+   (not:QHSD (and:QHSD (match_dup 1) (match_dup 2)))]
+  "<sync_predtab>"
   {
     struct arm_sync_generator generator;
     generator.op = arm_sync_generator_omn;
@@ -289,22 +183,22 @@
     DONE;
   })
 
-(define_insn "arm_sync_compare_and_swapsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI
-	  [(match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-   	   (match_operand:SI 2 "s_register_operand" "r")
-	   (match_operand:SI 3 "s_register_operand" "r")]
-	  VUNSPEC_SYNC_COMPARE_AND_SWAP))
-   (set (match_dup 1) (unspec_volatile:SI [(match_dup 2)]
+(define_insn "arm_sync_compare_and_swap<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI
+	 [(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+	  (match_operand:SIDI 2 "s_register_operand" "r")
+	  (match_operand:SIDI 3 "s_register_operand" "r")]
+	 VUNSPEC_SYNC_COMPARE_AND_SWAP))
+   (set (match_dup 1) (unspec_volatile:SIDI [(match_dup 2)]
                                           VUNSPEC_SYNC_COMPARE_AND_SWAP))
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -318,7 +212,7 @@
         (zero_extend:SI
 	  (unspec_volatile:NARROW
 	    [(match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")
-   	     (match_operand:SI 2 "s_register_operand" "r")
+	     (match_operand:SI 2 "s_register_operand" "r")
 	     (match_operand:SI 3 "s_register_operand" "r")]
 	    VUNSPEC_SYNC_COMPARE_AND_SWAP)))
    (set (match_dup 1) (unspec_volatile:NARROW [(match_dup 2)]
@@ -326,10 +220,10 @@
    (set (reg:CC CC_REGNUM) (unspec_volatile:CC [(match_dup 1)]
                                                 VUNSPEC_SYNC_COMPARE_AND_SWAP))
    ]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_required_value"  "2")
@@ -338,18 +232,18 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_lock_test_and_setsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (match_operand:SI 1 "arm_sync_memory_operand" "+Q"))
+(define_insn "arm_sync_lock_test_and_set<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(match_operand:SIDI 1 "arm_sync_memory_operand" "+Q"))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_operand:SI 2 "s_register_operand" "r")]
-	                    VUNSPEC_SYNC_LOCK))
+	(unspec_volatile:SIDI [(match_operand:SIDI 2 "s_register_operand" "r")]
+	VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_release_barrier" "no")
    (set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
@@ -364,10 +258,10 @@
         (zero_extend:SI (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q")))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_operand:SI 2 "s_register_operand" "r")]
-	                        VUNSPEC_SYNC_LOCK))
+				VUNSPEC_SYNC_LOCK))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -380,22 +274,22 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -405,54 +299,54 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_nandsi"
+(define_insn "arm_sync_new_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_new_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_NEW_OP))
+(define_insn "arm_sync_new_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+        (unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+	(unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "0")
    (set_attr "sync_t2"              "3")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
@@ -461,19 +355,19 @@
         (unspec_volatile:SI
 	  [(not:SI
 	     (and:SI
-               (zero_extend:SI	  
-	         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-               (match_operand:SI 2 "s_register_operand" "r")))
+	       (zero_extend:SI
+		 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+	       (match_operand:SI 2 "s_register_operand" "r")))
 	  ] VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
         (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                        VUNSPEC_SYNC_NEW_OP))
+				VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
-  } 
+  }
   [(set_attr "sync_result"          "0")
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
@@ -483,20 +377,20 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab>si"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_<sync_optab><mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(syncop:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
+			      VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
+   (clobber (match_scratch:SIDI 3 "=&r"))
    (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -509,20 +403,21 @@
    (set_attr "conds" "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_nandsi"
+(define_insn "arm_sync_old_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+        (unspec_volatile:SI [(syncop:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r"))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -530,26 +425,25 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "4")
-   (set_attr "sync_op"              "nand")
+   (set_attr "sync_t2"              "<sync_t2_reqd>")
+   (set_attr "sync_op"              "<sync_optab>")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
-(define_insn "arm_sync_old_<sync_optab><mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(syncop:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r"))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+(define_insn "arm_sync_old_nand<mode>"
+  [(set (match_operand:SIDI 0 "s_register_operand" "=&r")
+	(unspec_volatile:SIDI [(not:SIDI (and:SIDI
+			       (match_operand:SIDI 1 "arm_sync_memory_operand" "+Q")
+			       (match_operand:SIDI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+        (unspec_volatile:SIDI [(match_dup 1) (match_dup 2)]
 	                    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
-   (clobber (match_scratch:SI 3 "=&r"))
-   (clobber (match_scratch:SI 4 "<sync_clobber>"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+   (clobber (match_scratch:SIDI 3 "=&r"))
+   (clobber (match_scratch:SI 4 "=&r"))]
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 
@@ -557,26 +451,26 @@
    (set_attr "sync_memory"          "1")
    (set_attr "sync_new_value"       "2")
    (set_attr "sync_t1"              "3")
-   (set_attr "sync_t2"              "<sync_t2_reqd>")
-   (set_attr "sync_op"              "<sync_optab>")
+   (set_attr "sync_t2"              "4")
+   (set_attr "sync_op"              "nand")
    (set_attr "conds" 		    "clob")
    (set_attr "predicable" "no")])
 
 (define_insn "arm_sync_old_nand<mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (zero_extend:SI
-			         (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-                               (match_operand:SI 2 "s_register_operand" "r")))
-	                    ]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:SI [(not:SI (and:SI
+			       (zero_extend:SI
+				 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+			       (match_operand:SI 2 "s_register_operand" "r")))
+			    ]
+			    VUNSPEC_SYNC_OLD_OP))
    (set (match_dup 1)
-        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
-	                    VUNSPEC_SYNC_OLD_OP))
+	(unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+			    VUNSPEC_SYNC_OLD_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=&r"))
    (clobber (match_scratch:SI 4 "=&r"))]
-  "TARGET_HAVE_LDREXBHD && TARGET_HAVE_MEMORY_BARRIER"
+  "<sync_predtab>"
   {
     return arm_output_sync_insn (insn, operands);
   } 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 4/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
@ 2011-10-06 18:03                     ` Dr. David Alan Gilbert
  2011-10-06 18:03                       ` [Patch 5/5] " Dr. David Alan Gilbert
  2011-10-12  0:53                       ` [Patch 4/5] " Ramana Radhakrishnan
  2011-10-12  2:35                     ` [Patch 3/5] " Ramana Radhakrishnan
  1 sibling, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 18:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro

   Add ARM 64bit sync helpers for use on older ARMs.  Based on the 32bit
   versions, but with a check for a sufficiently new kernel version.

        gcc/
        * config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
        * config/arm/linux-atomic.c: Change comment to point to 64bit version
          (SYNC_LOCK_RELEASE): Instantiate 64bit version.
        * config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c
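
To show where these land (a sketch - 'bump' and 'counter' are made-up
names, not from the patch): compiled for a core without ldrexd/strexd,
the builtin below turns into a call to the __sync_add_and_fetch_8
helper that the new file provides.

  /* Hypothetical caller; on e.g. -march=armv5 GCC emits a call to the
     out-of-line __sync_add_and_fetch_8 instead of inlining ldrexd/strexd.  */
  long long counter;

  long long
  bump (long long delta)
  {
    return __sync_add_and_fetch (&counter, delta);
  }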


diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 0000000..6966e66
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,166 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilbert@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those then the compiler inlines the operation).
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+					const long long* newval,
+					long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. "
+				"(__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c:
+	 write () and abort ().  */
+      __write (2 /* stderr.  */, err, sizeof (err));
+      abort ();
+    }
+}
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+		__attribute__ ((used, section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp,tmp2;						\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp;							\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement __sync_<op>_and_fetch for 64-bit (doubleword) quantities;
+   the matching __sync_fetch_and_<op> forms are defined above.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp,tmp2;						\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }
+
+OP_AND_FETCH_WORD64 (add,   , +)
+OP_AND_FETCH_WORD64 (sub,   , -)
+OP_AND_FETCH_WORD64 (or,    , |)
+OP_AND_FETCH_WORD64 (and,   , &)
+OP_AND_FETCH_WORD64 (xor,   , ^)
+OP_AND_FETCH_WORD64 (nand, ~, &)
+
+long long HIDDEN
+__sync_val_compare_and_swap_8 (long long *ptr, long long oldval,
+				long long newval)
+{
+  int failure;
+  long long actual_oldval;
+
+  while (1)
+    {
+      actual_oldval = *ptr;
+
+      if (__builtin_expect (oldval != actual_oldval, 0))
+	return actual_oldval;
+
+      failure = __kernel_cmpxchg64 (&actual_oldval, &newval, ptr);
+
+      if (__builtin_expect (!failure, 1))
+	return oldval;
+    }
+}
+
+typedef unsigned char bool;
+
+bool HIDDEN
+__sync_bool_compare_and_swap_8 (long long *ptr, long long oldval,
+				 long long newval)
+{
+  int failure = __kernel_cmpxchg64 (&oldval, &newval, ptr);
+  return (failure == 0);
+}
+
+long long HIDDEN
+__sync_lock_test_and_set_8 (long long *ptr, long long val)
+{
+  int failure;
+  long long oldval;
+
+  do {
+    oldval = *ptr;
+    failure = __kernel_cmpxchg64 (&oldval, &val, ptr);
+  } while (failure != 0);
+
+  return oldval;
+}
diff --git a/gcc/config/arm/linux-atomic.c b/gcc/config/arm/linux-atomic.c
index 57065a6..80f161d 100644
--- a/gcc/config/arm/linux-atomic.c
+++ b/gcc/config/arm/linux-atomic.c
@@ -32,8 +32,8 @@ typedef void (__kernel_dmb_t) (void);
 #define __kernel_dmb (*(__kernel_dmb_t *) 0xffff0fa0)
 
 /* Note: we implement byte, short and int versions of atomic operations using
-   the above kernel helpers, but there is no support for "long long" (64-bit)
-   operations as yet.  */
+   the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit)
+   operations.  */
 
 #define HIDDEN __attribute__ ((visibility ("hidden")))
 
@@ -273,6 +273,7 @@ SUBWORD_TEST_AND_SET (unsigned char,  1)
     *ptr = 0;								\
   }
 
+SYNC_LOCK_RELEASE (long long,   8)
 SYNC_LOCK_RELEASE (int,   4)
 SYNC_LOCK_RELEASE (short, 2)
 SYNC_LOCK_RELEASE (char,  1)
diff --git a/gcc/config/arm/t-linux-eabi b/gcc/config/arm/t-linux-eabi
index a328a00..3814cc0 100644
--- a/gcc/config/arm/t-linux-eabi
+++ b/gcc/config/arm/t-linux-eabi
@@ -36,3 +36,4 @@ LIB1ASMFUNCS := $(filter-out _dvmd_tls,$(LIB1ASMFUNCS)) _dvmd_lnx _clear_cache
 EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
 
 LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic.c
+LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic-64bit.c

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Patch 5/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 18:03                     ` [Patch 4/5] " Dr. David Alan Gilbert
@ 2011-10-06 18:03                       ` Dr. David Alan Gilbert
  2011-10-12  4:51                         ` Ramana Radhakrishnan
  2011-10-12  5:06                         ` Mike Stump
  2011-10-12  0:53                       ` [Patch 4/5] " Ramana Radhakrishnan
  1 sibling, 2 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2011-10-06 18:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, rth, joseph, patches, mikestump, ro

   Test support for ARM 64bit sync intrinsics.

      gcc/testsuite/
        * gcc.dg/di-longlong64-sync-1.c: New test.
        * gcc.dg/di-sync-multithread.c: New test.
        * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
        * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
        * lib/target-supports.exp (arm_arch_*_ok): Series of effective-target
          tests for v5, v6, v6k, and v7-a, and add-options helpers.
          (check_effective_target_arm_arm_ok): New helper.
          (check_effective_target_sync_longlong): New helper.
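
The constants in these tests are picked so that the low 32-bit halves
overflow on their own; a worked example of the arithmetic (values taken
from the init_di/test_di tables in the first test):

  /* Why the test values force a carry between the 32bit halves:  */
  long long a = 0x1000e0de0000ll;  /* high 0x1000, low 0xe0de0000.  */
  long long b = 0xb000e0000000ll;  /* high 0xb000, low 0xe0000000.  */
  /* low:  0xe0de0000 + 0xe0000000 = 0x1c0de0000, so a carry is produced;
     high: 0x1000 + 0xb000 + 1 = 0xc001;
     so:   a + b == 0xc001c0de0000ll - an implementation doing the halves
     as two independent 32bit adds (rather than adds/adc) gets this wrong.  */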

diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
new file mode 100644
index 0000000..82a4ea2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
@@ -0,0 +1,164 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so use long long
+   (an explicit 64bit type may be a better bet) and 2) use values that cross
+   the 32bit boundary and cause carries, since the actual maths are done as
+   pairs of 32 bit instructions.  */
+
+/* Note: This file is #included by some of the ARM tests.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done.  */
+static long long AL[24];
+/* Values copied into AL before we start.  */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll,
+
+				 0, 0x1000e0de0000ll,
+				 42 , 0xc001c0de0000ll,
+
+				 -1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end.  */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+				 0x100000002ll, 0x100000002ll,
+				 0x100000002ll, 0x100000002ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll,
+
+				 1, 0xc001c0de0000ll,
+				 20, 0x1000e0de0000ll,
+
+				 0x300000007ll , 0x500000009ll,
+				 0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory.  */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap (AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set (AL+2, 1);
+  __sync_lock_release (AL+3);
+
+  /* The following tests should not change the value since the
+     original does NOT match.  */
+  __sync_val_compare_and_swap (AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap (AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add (AL+8, 1);
+  __sync_fetch_and_add (AL+9, 0xb000e0000000ll); /* + to both halves & carry.  */
+  __sync_fetch_and_sub (AL+10, 22);
+  __sync_fetch_and_sub (AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and (AL+12, 0x300000007ll);
+  __sync_fetch_and_or (AL+13, 0x500000009ll);
+  __sync_fetch_and_xor (AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand (AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value.  */
+  __sync_add_and_fetch (AL+16, 1);
+  /* add to both halves & carry.  */
+  __sync_add_and_fetch (AL+17, 0xb000e0000000ll);
+  __sync_sub_and_fetch (AL+18, 22);
+  __sync_sub_and_fetch (AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch (AL+20, 0x300000007ll);
+  __sync_or_and_fetch (AL+21, 0x500000009ll);
+  __sync_xor_and_fetch (AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch (AL+23, 0xa00000007ll);
+}
+
+/* Now check return values.  */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap (AL+0, 0x100000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort ();
+  if (__sync_bool_compare_and_swap (AL+1, 0x200000003ll, 0x1234567890ll) !=
+	1) abort ();
+  if (__sync_lock_test_and_set (AL+2, 1) != 0) abort ();
+  __sync_lock_release (AL+3); /* no return value, but keep to match results.  */
+
+  /* The following tests should not change the value since the
+     original does NOT match.  */
+  if (__sync_val_compare_and_swap (AL+4, 0x000000002ll, 0x1234567890ll) !=
+	0x100000002ll) abort ();
+  if (__sync_val_compare_and_swap (AL+5, 0x100000000ll, 0x1234567890ll) !=
+	0x100000002ll) abort ();
+  if (__sync_bool_compare_and_swap (AL+6, 0x000000002ll, 0x1234567890ll) !=
+	0) abort ();
+  if (__sync_bool_compare_and_swap (AL+7, 0x100000000ll, 0x1234567890ll) !=
+	0) abort ();
+
+  if (__sync_fetch_and_add (AL+8, 1) != 0) abort ();
+  if (__sync_fetch_and_add (AL+9, 0xb000e0000000ll) != 0x1000e0de0000ll) abort ();
+  if (__sync_fetch_and_sub (AL+10, 22) != 42) abort ();
+  if (__sync_fetch_and_sub (AL+11, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort ();
+
+  if (__sync_fetch_and_and (AL+12, 0x300000007ll) != -1ll) abort ();
+  if (__sync_fetch_and_or (AL+13, 0x500000009ll) != 0) abort ();
+  if (__sync_fetch_and_xor (AL+14, 0xe00000001ll) != 0xff00ff0000ll) abort ();
+  if (__sync_fetch_and_nand (AL+15, 0xa00000007ll) != -1ll) abort ();
+
+  /* These should be the same as the fetch_and_* cases except for
+     return value.  */
+  if (__sync_add_and_fetch (AL+16, 1) != 1) abort ();
+  if (__sync_add_and_fetch (AL+17, 0xb000e0000000ll) != 0xc001c0de0000ll)
+	abort ();
+  if (__sync_sub_and_fetch (AL+18, 22) != 20) abort ();
+  if (__sync_sub_and_fetch (AL+19, 0xb000e0000000ll) != 0x1000e0de0000ll)
+	abort ();
+
+  if (__sync_and_and_fetch (AL+20, 0x300000007ll) != 0x300000007ll) abort ();
+  if (__sync_or_and_fetch (AL+21, 0x500000009ll) != 0x500000009ll) abort ();
+  if (__sync_xor_and_fetch (AL+22, 0xe00000001ll) != 0xf100ff0001ll) abort ();
+  if (__sync_nand_and_fetch (AL+23, 0xa00000007ll) != ~0xa00000007ll) abort ();
+}
+
+int main ()
+{
+  memcpy (AL, init_di, sizeof (init_di));
+
+  do_noret_di ();
+
+  if (memcmp (AL, test_di, sizeof (test_di)))
+    abort ();
+
+  memcpy (AL, init_di, sizeof (init_di));
+
+  do_ret_di ();
+
+  if (memcmp (AL, test_di, sizeof (test_di)))
+    abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/di-sync-multithread.c b/gcc/testsuite/gcc.dg/di-sync-multithread.c
new file mode 100644
index 0000000..c5b3d89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-sync-multithread.c
@@ -0,0 +1,205 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-require-effective-target pthread_h } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options "-pthread -std=gnu99" } */
+
+/* Test of long long atomic ops performed in parallel in 3 pthreads.
+   david.gilbert@linaro.org  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+/*#define DEBUGIT 1 */
+
+#ifdef DEBUGIT
+#include <stdio.h>
+
+#define DOABORT(x,...) {\
+	 fprintf (stderr, x, __VA_ARGS__); fflush (stderr); abort ();\
+	 }
+
+#else
+
+#define DOABORT(x,...) abort ();
+
+#endif
+
+/* Passed to each thread to describe which bits it is going to work on.  */
+struct threadwork {
+  unsigned long long count; /* incremented each time the worker loops.  */
+  unsigned int thread;    /* ID */
+  unsigned int addlsb;    /* 8 bit */
+  unsigned int logic1lsb; /* 5 bit */
+  unsigned int logic2lsb; /* 8 bit */
+};
+
+/* The shared word where all the atomic work is done.  */
+static volatile long long workspace;
+
+/* A shared word to tell the workers to quit when non-0.  */
+static long long doquit;
+
+extern void abort (void);
+
+/* Note this test doesn't test the return values much.  */
+void*
+worker (void* data)
+{
+  struct threadwork *tw = (struct threadwork*)data;
+  long long add1bit = 1ll << tw->addlsb;
+  long long logic1bit = 1ll << tw->logic1lsb;
+  long long logic2bit = 1ll << tw->logic2lsb;
+
+  /* Clear the bits we use.  */
+  __sync_and_and_fetch (&workspace, ~(0xffll * add1bit));
+  __sync_fetch_and_and (&workspace, ~(0x1fll * logic1bit));
+  __sync_fetch_and_and (&workspace, ~(0xffll * logic2bit));
+
+  do
+    {
+      long long tmp1, tmp2, tmp3;
+      /* OK, let's try and do some stuff to the workspace - by the end
+         of the main loop our area should be the same as it is now - i.e. 0.  */
+
+      /* Push the arithmetic section up to 128 - one of the threads will
+         cause this to carry across the 32bit boundary.  */
+      for (tmp2 = 0; tmp2 < 64; tmp2++)
+	{
+	  /* Add 2 using the two different adds.  */
+	  tmp1 = __sync_add_and_fetch (&workspace, add1bit);
+	  tmp3 = __sync_fetch_and_add (&workspace, add1bit);
+
+	  /* The value should be the intermediate add value in both cases.  */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT ("Mismatch of add intermediates on thread %d "
+			"workspace=0x%llx tmp1=0x%llx "
+			"tmp2=0x%llx tmp3=0x%llx\n",
+			 tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+      /* Set the logic bits.  */
+      tmp2=__sync_or_and_fetch (&workspace,
+			  0x1fll * logic1bit | 0xffll * logic2bit);
+
+      /* Check the logic bits are set and the arithmetic value is correct.  */
+      if ((tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit
+			| 0xffll * add1bit))
+	  != (0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit))
+	DOABORT ("Midloop check failed on thread %d "
+			"workspace=0x%llx tmp2=0x%llx "
+			"masktmp2=0x%llx expected=0x%llx\n",
+		tw->thread, workspace, tmp2,
+		tmp2 & (0x1fll * logic1bit | 0xffll * logic2bit |
+			 0xffll * add1bit),
+		(0x1fll * logic1bit | 0xffll * logic2bit | 0x80ll * add1bit));
+
+      /* Pull the arithmetic set back down to 0 - again this should cause a
+	 carry across the 32bit boundary in one thread.  */
+
+      for (tmp2 = 0; tmp2 < 64; tmp2++)
+	{
+	  /* Subtract 2 using the two different subs.  */
+	  tmp1=__sync_sub_and_fetch (&workspace, add1bit);
+	  tmp3=__sync_fetch_and_sub (&workspace, add1bit);
+
+	  /* The value should be the intermediate sub value in both cases.  */
+	  if ((tmp1 & (add1bit * 0xff)) != (tmp3 & (add1bit * 0xff)))
+	    DOABORT ("Mismatch of sub intermediates on thread %d "
+			"workspace=0x%llx tmp1=0x%llx "
+			"tmp2=0x%llx tmp3=0x%llx\n",
+			tw->thread, workspace, tmp1, tmp2, tmp3);
+	}
+
+
+      /* Clear the logic bits.  */
+      __sync_fetch_and_xor (&workspace, 0x1fll * logic1bit);
+      tmp3=__sync_and_and_fetch (&workspace, ~(0xffll * logic2bit));
+
+      /* The logic bits and the arithmetic bits should be zero again.  */
+      if (tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit))
+	DOABORT ("End of worker loop; bits not 0 on thread %d "
+			"workspace=0x%llx tmp3=0x%llx "
+			"mask=0x%llx maskedtmp3=0x%llx\n",
+		tw->thread, workspace, tmp3, (0x1fll * logic1bit |
+			0xffll * logic2bit | 0xffll * add1bit),
+		tmp3 & (0x1fll * logic1bit | 0xffll * logic2bit | 0xffll * add1bit));
+
+      __sync_add_and_fetch (&tw->count, 1);
+    }
+  while (!__sync_bool_compare_and_swap (&doquit, 1, 1));
+
+  pthread_exit (0);
+}
+
+int
+main ()
+{
+  /* We have 3 threads doing three sets of operations: an 8 bit
+     arithmetic field, a 5 bit logic field and an 8 bit logic
+     field (just to pack them all in).
+
+  6      5       4       4       3       2       1
+  3      6       8       0       2       4       6       8       0
+  |...,...|...,...|...,...|...,...|...,...|...,...|...,...|...,...
+  - T0   --  T1  -- T2   --T2 --  T0  -*- T2-- T1-- T1   -***- T0-
+   logic2  logic2  arith   log2  arith  log1 log1  arith     log1
+
+  */
+  unsigned int t;
+  long long tmp;
+  int err;
+
+  struct threadwork tw[3]={
+    { 0ll, 0, 27, 0, 56 },
+    { 0ll, 1,  8,16, 48 },
+    { 0ll, 2, 40,21, 35 }
+  };
+
+  pthread_t threads[3];
+
+  __sync_lock_release (&doquit);
+
+  /* Get the workspace into a known value - all 1's.  */
+  __sync_lock_release (&workspace); /* Now all 0.  */
+  tmp = __sync_val_compare_and_swap (&workspace, 0, -1ll);
+  if (tmp!=0)
+    DOABORT ("Initial __sync_val_compare_and_swap wasn't 0 workspace=0x%llx "
+		"tmp=0x%llx\n", workspace,tmp);
+
+  for (t = 0; t < 3; t++)
+  {
+    err=pthread_create (&threads[t], NULL , worker, &tw[t]);
+    if (err) DOABORT ("pthread_create failed on thread %d with error %d\n",
+	t, err);
+  };
+
+  sleep (5);
+
+  /* Stop please.  */
+  __sync_lock_test_and_set (&doquit, 1ll);
+
+  for (t = 0; t < 3; t++)
+    {
+      err=pthread_join (threads[t], NULL);
+      if (err)
+	DOABORT ("pthread_join failed on thread %d with error %d\n", t, err);
+    };
+
+  __sync_synchronize ();
+
+  /* OK, so all the workers have finished -
+     the workers should have zeroed their workspace; the unused areas
+     should still be 1.  */
+  if (!__sync_bool_compare_and_swap (&workspace, 0x040000e0ll, 0))
+    DOABORT ("End of run workspace mismatch, got %llx\n", workspace);
+
+  /* All the workers should have done some work.  */
+  for (t = 0; t < 3; t++)
+    {
+      if (tw[t].count == 0) DOABORT ("Worker %d gave 0 count\n", t);
+    };
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
new file mode 100644
index 0000000..19342bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-add-options arm_arch_v5 } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "file included" "In file included" { target *-*-* } 0 } */
+
+#include "../../gcc.dg/di-longlong64-sync-1.c"
+
+/* On an old ARM we have no ldrexd or strexd so we have to use helpers.  */
+/* { dg-final { scan-assembler-not "ldrexd" } } */
+/* { dg-final { scan-assembler-not "strexd" } } */
+/* { dg-final { scan-assembler "__sync_" } } */
diff --git a/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
new file mode 100644
index 0000000..dab6940
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/di-longlong64-sync-withldrexd.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-marm -std=gnu99" } */
+/* { dg-require-effective-target arm_arch_v6k_ok } */
+/* { dg-add-options arm_arch_v6k } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "file included" "In file included" { target *-*-* } 0 } */
+
+#include "../../gcc.dg/di-longlong64-sync-1.c"
+
+/* We should be using ldrexd, strexd and no helpers or shorter ldrex.  */
+/* { dg-final { scan-assembler-times "\tldrexd" 46 } } */
+/* { dg-final { scan-assembler-times "\tstrexd" 46 } } */
+/* { dg-final { scan-assembler-not "__sync_" } } */
+/* { dg-final { scan-assembler-not "ldrex\t" } } */
+/* { dg-final { scan-assembler-not "strex\t" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5d236f7..086fbc9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2067,6 +2067,47 @@ proc check_effective_target_arm_fp16_ok { } {
 		check_effective_target_arm_fp16_ok_nocache]
 }
 
+# Creates a series of routines that return 1 if the given architecture
+# can be selected, and a routine to give the flags to select that architecture.
+# Note: Extra flags may be added to disable options from newer compilers
+# (Thumb in particular - but others may be added in the future).
+# Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
+#        /* { dg-add-options arm_arch_v5 } */
+foreach { armfunc armflag armdef } { v5 "-march=armv5 -marm" __ARM_ARCH_5__
+				     v6 "-march=armv6" __ARM_ARCH_6__
+				     v6k "-march=armv6k" __ARM_ARCH_6K__
+				     v7a "-march=armv7-a" __ARM_ARCH_7A__ } {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_arm_arch_FUNC_ok { } {
+	    if { [ string match "*-marm*" "FLAG" ] &&
+		![check_effective_target_arm_arm_ok] } {
+		return 0
+	    }
+	    return [check_no_compiler_messages arm_arch_FUNC_ok assembly {
+		#if !defined (DEF)
+		#error FOO
+		#endif
+	    } "FLAG" ]
+	}
+
+	proc add_options_for_arm_arch_FUNC { flags } {
+	    return "$flags FLAG"
+	}
+    }]
+}
+
+# Return 1 if this is an ARM target where -marm causes ARM to be
+# used (not Thumb).
+
+proc check_effective_target_arm_arm_ok { } {
+    return [check_no_compiler_messages arm_arm_ok assembly {
+	#if !defined (__arm__) || defined (__thumb__) || defined (__thumb2__)
+	#error FOO
+	#endif
+    } "-marm"]
+}
+
+
 # Return 1 if this is an ARM target where -mthumb causes Thumb-1 to be
 # used.
 
@@ -3458,6 +3499,31 @@ proc check_effective_target_sync_int_long { } {
     return $et_sync_int_long_saved
 }
 
+# Return 1 if the target supports atomic operations on "long long" and can
+# execute them.
+# So far only ARM has checks in place; other targets may want to add their own.
+proc check_effective_target_sync_longlong { } {
+    return [check_runtime sync_longlong_runtime {
+      #include <stdlib.h>
+      int main ()
+      {
+	long long l1;
+
+	if (sizeof (long long) != 8)
+	  exit (1);
+
+      #ifdef __arm__
+	/* Just check for native; checking for kernel fallback is tricky.  */
+	asm volatile ("ldrexd r0,r1, [%0]" : : "r" (&l1) : "r0", "r1");
+      #else
+      # error "Add other suitable archs here"
+      #endif
+
+	exit (0);
+      }
+    } "" ]
+}
+
 # Return 1 if the target supports atomic operations on "char" and "short".
 
 proc check_effective_target_sync_char_short { } {

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 1/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:53               ` [Patch 1/5] " Dr. David Alan Gilbert
  2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
@ 2011-10-12  0:51                 ` Ramana Radhakrishnan
  1 sibling, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  0:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:51, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>       gcc/
>       * config/arm/arm.c (TARGET_HAVE_DMB_MCR): MCR Not available in Thumb1

Hasn't this been approved ? Please commit this.


cheers
Ramana

>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 993e3a0..f6f1da7 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
>  #define TARGET_HAVE_DMB                (arm_arch7)
>
>  /* Nonzero if this chip implements a memory barrier via CP15.  */
> -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
> +#define TARGET_HAVE_DMB_MCR    (arm_arch6 && ! TARGET_HAVE_DMB \
> +                                && ! TARGET_THUMB1)
>
>  /* Nonzero if this chip implements a memory barrier instruction.  */
>  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
>
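
For reference, a sketch of the CP15 barrier that TARGET_HAVE_DMB_MCR
gates (the wrapper function is hypothetical; the mcr encoding is the
ARMv6 data-memory-barrier operation issued via coprocessor 15):

	/* Hypothetical helper: issue a data memory barrier by writing
	   to CP15 c7, c10, 5, for cores without a dmb instruction.  */
	static inline void
	cp15_dmb (void)
	{
	  int zero = 0;
	  __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5"
				: : "r" (zero) : "memory");
	}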

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
  2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
@ 2011-10-12  0:52                   ` Ramana Radhakrishnan
  2011-10-12  1:45                   ` Ramana Radhakrishnan
  2 siblings, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  0:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:52, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>        Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
>        relative to the branch target of the compare, since the load could float
>        up beyond the ldrex.
>
>        PR target/48126
>
>          * config/arm/arm.c (arm_output_sync_loop): Move label before barrier.


>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 5161439..6e7105a 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
>        }
>     }
>
> -  arm_process_output_memory_barrier (emit, NULL);
> +  /* Note: label is before barrier so that in cmp failure case we still get
> +     a barrier to stop subsequent loads floating upwards past the ldrex
> +     pr/48126.  */
>   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
> +  arm_process_output_memory_barrier (emit, NULL);
>  }
>
>  static rtx
>

OK.

Ramana
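
To make the hazard concrete, a user-level sketch (all names
hypothetical): the barrier has to execute on the compare-failure path
too, otherwise a later dependent load can be satisfied early.

	extern long long flag;
	extern long long data;
	extern void use (long long);

	void
	reader (void)
	{
	  /* Spin until we win the flag; 'data' must not be loaded
	     before the ldrex inside the CAS loop completes.  Placing
	     the label before the barrier keeps the dmb on every exit
	     path, including failed compares.  */
	  while (!__sync_bool_compare_and_swap (&flag, 1LL, 0LL))
	    ;
	  use (data);
	}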

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 4/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 18:03                     ` [Patch 4/5] " Dr. David Alan Gilbert
  2011-10-06 18:03                       ` [Patch 5/5] " Dr. David Alan Gilbert
@ 2011-10-12  0:53                       ` Ramana Radhakrishnan
  1 sibling, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  0:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:54, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>    Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
>    versions but with check for sufficiently new kernel version.
>
>        gcc/
>        * config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
>        * config/arm/linux-atomic.c: Change comment to point to 64bit version
>          (SYNC_LOCK_RELEASE): Instantiate 64bit version.
>        * config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c

OK.

Ramana
>
>
> diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
> new file mode 100644
> index 0000000..6966e66
> --- /dev/null
> +++ b/gcc/config/arm/linux-atomic-64bit.c
> @@ -0,0 +1,166 @@
> +/* 64bit Linux-specific atomic operations for ARM EABI.
> +   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
> +   Based on linux-atomic.c
> +
> +   64 bit additions david.gilbert@linaro.org
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* 64bit helper functions for atomic operations; the compiler will
> +   call these when the code is compiled for a CPU without ldrexd/strexd.
> +   (If the CPU has those, the compiler inlines the operation.)
> +
> +   These helpers require a kernel helper that's only present on newer
> +   kernels; we check for that in an init section and bail out rather
> +   unceremoniously.  */
> +
> +extern unsigned int __write (int fd, const void *buf, unsigned int count);
> +extern void abort (void);
> +
> +/* Kernel helper for compare-and-exchange.  */
> +typedef int (__kernel_cmpxchg64_t) (const long long *oldval,
> +                                       const long long *newval,
> +                                       long long *ptr);
> +#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
> +
> +/* Kernel helper page version number.  */
> +#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
> +
> +/* Check that the kernel has a new enough version at load.  */
> +static void __check_for_sync8_kernelhelper (void)
> +{
> +  if (__kernel_helper_version < 5)
> +    {
> +      const char err[] = "A newer kernel is required to run this binary. "
> +                               "(__kernel_cmpxchg64 helper)\n";
> +      /* At this point we need a way to crash with some information
> +        for the user - we can't rely on much else being available
> +        here, so do the same as generic-morestack.c:
> +        write () and abort ().  */
> +      __write (2 /* stderr.  */, err, sizeof (err));
> +      abort ();
> +    }
> +}
> +
> +static void (*__sync8_kernelhelper_inithook[]) (void)
> +               __attribute__ ((used, section (".init_array"))) = {
> +  &__check_for_sync8_kernelhelper
> +};
> +
> +#define HIDDEN __attribute__ ((visibility ("hidden")))
> +
> +#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)                        \
> +  long long HIDDEN                                             \
> +  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)    \
> +  {                                                            \
> +    int failure;                                               \
> +    long long tmp, tmp2;                                               \
> +                                                               \
> +    do {                                                       \
> +      tmp = *ptr;                                              \
> +      tmp2 = PFX_OP (tmp INF_OP val);                          \
> +      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);         \
> +    } while (failure != 0);                                    \
> +                                                               \
> +    return tmp;                                                        \
> +  }
> +
> +FETCH_AND_OP_WORD64 (add,   , +)
> +FETCH_AND_OP_WORD64 (sub,   , -)
> +FETCH_AND_OP_WORD64 (or,    , |)
> +FETCH_AND_OP_WORD64 (and,   , &)
> +FETCH_AND_OP_WORD64 (xor,   , ^)
> +FETCH_AND_OP_WORD64 (nand, ~, &)
> +
> +#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
> +#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
> +
> +/* Implement __sync_<op>_and_fetch for 64-bit quantities; unlike the
> +   fetch-and-op variants above, these return the new value.  */
> +
> +#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)                        \
> +  long long HIDDEN                                             \
> +  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)    \
> +  {                                                            \
> +    int failure;                                               \
> +    long long tmp, tmp2;                                               \
> +                                                               \
> +    do {                                                       \
> +      tmp = *ptr;                                              \
> +      tmp2 = PFX_OP (tmp INF_OP val);                          \
> +      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);         \
> +    } while (failure != 0);                                    \
> +                                                               \
> +    return tmp2;                                               \
> +  }
> +
> +OP_AND_FETCH_WORD64 (add,   , +)
> +OP_AND_FETCH_WORD64 (sub,   , -)
> +OP_AND_FETCH_WORD64 (or,    , |)
> +OP_AND_FETCH_WORD64 (and,   , &)
> +OP_AND_FETCH_WORD64 (xor,   , ^)
> +OP_AND_FETCH_WORD64 (nand, ~, &)
> +
> +long long HIDDEN
> +__sync_val_compare_and_swap_8 (long long *ptr, long long oldval,
> +                               long long newval)
> +{
> +  int failure;
> +  long long actual_oldval;
> +
> +  while (1)
> +    {
> +      actual_oldval = *ptr;
> +
> +      if (__builtin_expect (oldval != actual_oldval, 0))
> +       return actual_oldval;
> +
> +      failure = __kernel_cmpxchg64 (&actual_oldval, &newval, ptr);
> +
> +      if (__builtin_expect (!failure, 1))
> +       return oldval;
> +    }
> +}
> +
> +typedef unsigned char bool;
> +
> +bool HIDDEN
> +__sync_bool_compare_and_swap_8 (long long *ptr, long long oldval,
> +                                long long newval)
> +{
> +  int failure = __kernel_cmpxchg64 (&oldval, &newval, ptr);
> +  return (failure == 0);
> +}
> +
> +long long HIDDEN
> +__sync_lock_test_and_set_8 (long long *ptr, long long val)
> +{
> +  int failure;
> +  long long oldval;
> +
> +  do {
> +    oldval = *ptr;
> +    failure = __kernel_cmpxchg64 (&oldval, &val, ptr);
> +  } while (failure != 0);
> +
> +  return oldval;
> +}
> diff --git a/gcc/config/arm/linux-atomic.c b/gcc/config/arm/linux-atomic.c
> index 57065a6..80f161d 100644
> --- a/gcc/config/arm/linux-atomic.c
> +++ b/gcc/config/arm/linux-atomic.c
> @@ -32,8 +32,8 @@ typedef void (__kernel_dmb_t) (void);
>  #define __kernel_dmb (*(__kernel_dmb_t *) 0xffff0fa0)
>
>  /* Note: we implement byte, short and int versions of atomic operations using
> -   the above kernel helpers, but there is no support for "long long" (64-bit)
> -   operations as yet.  */
> +   the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit)
> +   operations.  */
>
>  #define HIDDEN __attribute__ ((visibility ("hidden")))
>
> @@ -273,6 +273,7 @@ SUBWORD_TEST_AND_SET (unsigned char,  1)
>     *ptr = 0;                                                          \
>   }
>
> +SYNC_LOCK_RELEASE (long long,   8)
>  SYNC_LOCK_RELEASE (int,   4)
>  SYNC_LOCK_RELEASE (short, 2)
>  SYNC_LOCK_RELEASE (char,  1)
> diff --git a/gcc/config/arm/t-linux-eabi b/gcc/config/arm/t-linux-eabi
> index a328a00..3814cc0 100644
> --- a/gcc/config/arm/t-linux-eabi
> +++ b/gcc/config/arm/t-linux-eabi
> @@ -36,3 +36,4 @@ LIB1ASMFUNCS := $(filter-out _dvmd_tls,$(LIB1ASMFUNCS)) _dvmd_lnx _clear_cache
>  EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
>
>  LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic.c
> +LIB2FUNCS_STATIC_EXTRA += $(srcdir)/config/arm/linux-atomic-64bit.c
>
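
As a usage sketch (caller code hypothetical): nothing changes at the
source level.  Compiled for a pre-ldrexd architecture, the builtin
below lowers to a call to the hidden __sync_fetch_and_add_8 helper,
which loops on __kernel_cmpxchg64; the .init_array constructor aborts
early on kernels too old to provide that helper.

	long long counter;

	long long
	bump (void)
	{
	  /* Becomes a libcall into linux-atomic-64bit.c on, say,
	     -march=armv5te; inlined as ldrexd/strexd on v6k+.  */
	  return __sync_fetch_and_add (&counter, 1LL);
	}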

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
  2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
  2011-10-12  0:52                   ` [Patch 2/5] " Ramana Radhakrishnan
@ 2011-10-12  1:45                   ` Ramana Radhakrishnan
  2 siblings, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  1:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:52, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>        Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
>        relative to the branch target of the compare, since the load could float
>        up beyond the ldrex.
>
>        PR target/48126
>
>          * config/arm/arm.c (arm_output_sync_loop): Move label before barrier.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 5161439..6e7105a 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
>        }
>     }
>
> -  arm_process_output_memory_barrier (emit, NULL);
> +  /* Note: label is before barrier so that in cmp failure case we still get
> +     a barrier to stop subsequent loads floating upwards past the ldrex
> +     pr/48126.  */

Just one minor nit I just noticed. Please correct this to PR 48126 in
the comment rather than pr/48126.

Otherwise OK.

Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 3/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
  2011-10-06 18:03                     ` [Patch 4/5] " Dr. David Alan Gilbert
@ 2011-10-12  2:35                     ` Ramana Radhakrishnan
  1 sibling, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  2:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:53, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>    Add support for ARM 64bit sync intrinsics.
>
>        gcc/
>        * arm.c (arm_output_ldrex): Support ldrexd.
>          (arm_output_strex): Support strexd.
>          (arm_output_it): New helper to output an IT instruction, Thumb2 mode only.
>          (arm_output_sync_loop): Support DI mode.  Change comment to
>                                 note that const_int is not supported.
>          (arm_expand_sync): Support DI mode.
>
>        * arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.
>
>        * iterators.md (NARROW): Move from sync.md.
>          (QHSD): New iterator for all current ARM integer modes.
>          (SIDI): New iterator for SI and DI modes only.
>
>        * sync.md (sync_predtab): New mode_attr.
>          (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>
>          (sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>
>          (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>
>          (sync_nandsi): Fold into sync_nand<mode>
>          (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>
>          (sync_new_nandsi): Fold into sync_new_nand<mode>
>          (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>
>          (sync_old_nandsi): Fold into sync_old_nand<mode>
>          (sync_compare_and_swap<mode>): Support SI & DI
>          (sync_lock_test_and_set<mode>): Likewise
>          (sync_<sync_optab><mode>): Likewise
>          (sync_nand<mode>): Likewise
>          (sync_new_<sync_optab><mode>): Likewise
>          (sync_new_nand<mode>): Likewise
>          (sync_old_<sync_optab><mode>): Likewise
>          (sync_old_nand<mode>): Likewise
>          (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI
>          (arm_sync_lock_test_and_setsi): Likewise
>          (arm_sync_new_<sync_optab>si): Likewise
>          (arm_sync_new_nandsi): Likewise
>          (arm_sync_old_<sync_optab>si): Likewise
>          (arm_sync_old_nandsi): Likewise
>          (arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent
>          (arm_sync_lock_test_and_set<mode> NARROW): Likewise
>          (arm_sync_new_<sync_optab><mode> NARROW): Likewise
>          (arm_sync_new_nand<mode> NARROW): Likewise
>          (arm_sync_old_<sync_optab><mode> NARROW): Likewise
>          (arm_sync_old_nand<mode> NARROW): Likewise

OK.  Please commit this by Friday if no one else objects.

cheers
Ramana
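
A brief sketch of what the folded patterns buy (example code
hypothetical): with the SIDI iterator in place, the builtins that
previously expanded inline only for SImode now do so for DImode as
well, both mapping onto the single sync_compare_and_swap<mode> pattern.

	int       i32;
	long long i64;

	int
	cas_both (void)
	{
	  /* SImode and DImode cases of the same iterated pattern.  */
	  int a = __sync_bool_compare_and_swap (&i32, 0, 1);
	  int b = __sync_bool_compare_and_swap (&i64, 0LL, 1LL);
	  return a && b;
	}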

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 18:03                       ` [Patch 5/5] " Dr. David Alan Gilbert
@ 2011-10-12  4:51                         ` Ramana Radhakrishnan
  2011-10-12  5:06                         ` Mike Stump
  1 sibling, 0 replies; 43+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-12  4:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: gcc-patches, rth, joseph, patches, mikestump, ro

On 6 October 2011 18:54, Dr. David Alan Gilbert
<david.gilbert@linaro.org> wrote:
>   Test support for ARM 64bit sync intrinsics.
>
>      gcc/testsuite/
>        * gcc.dg/di-longlong64-sync-1.c: New test.
>        * gcc.dg/di-sync-multithread.c: New test.
>        * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
>        * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
>        * lib/target-supports.exp (arm_arch_*_ok): Series of effective-target
>                tests for v5, v6, v6k, and v7-a, and add-options helpers.
>          (check_effective_target_arm_arm_ok): New helper.
>          (check_effective_target_sync_longlong): New helper.

I would like one of the testsuite maintainers to have a second look at
this. I'm not confident about my dejagnu foo to fully review this.

Ramana

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]
  2011-10-06 18:03                       ` [Patch 5/5] " Dr. David Alan Gilbert
  2011-10-12  4:51                         ` Ramana Radhakrishnan
@ 2011-10-12  5:06                         ` Mike Stump
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Stump @ 2011-10-12  5:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: gcc-patches, Ramana Radhakrishnan, rth, joseph, patches, ro

On Oct 6, 2011, at 10:54 AM, Dr. David Alan Gilbert wrote:
>   Test support for ARM 64bit sync intrinsics.

Ok.  Watch for any fallout on non-arm systems.  I'd always invite
people who think they know the best way to test volatile to chime in.
There is the new infrastructure to test multi-core sync issues with
gdb trickery.  As you want more beef, you can consider it.

I'll note that I do sometimes wonder if this type of code isn't better handled in #if feature tests inside the testcases themselves.
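
A sketch of the in-testcase style being alluded to (entirely
hypothetical; the patch as posted keeps this logic in
target-supports.exp instead):

	/* Gate the interesting code on architecture macros inside the
	   test itself, degrading to a vacuous pass elsewhere.  */
	#if defined (__arm__) && defined (__ARM_ARCH_6K__)
	long long v;
	int main (void) { return __sync_fetch_and_add (&v, 1LL) != 0; }
	#else
	int main (void) { return 0; }
	#endif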

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2011-10-12  1:45 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-01 15:53 [Patch 0/3] ARM 64 bit atomic operations Dr. David Alan Gilbert
2011-07-01 15:55 ` [Patch 1/3] " Dr. David Alan Gilbert
2011-07-12 21:24   ` Ramana Radhakrishnan
2011-07-13  9:18     ` David Gilbert
2011-07-01 15:56 ` [Patch 2/3] " Dr. David Alan Gilbert
2011-07-01 16:03   ` Richard Henderson
2011-07-01 17:08     ` David Gilbert
2011-07-01 19:38   ` Joseph S. Myers
2011-07-04 22:27     ` David Gilbert
2011-07-01 15:57 ` [Patch 3/3] " Dr. David Alan Gilbert
2011-07-12 15:31   ` Ramana Radhakrishnan
2011-07-12 12:55 ` [Patch 0/3] " Ramana Radhakrishnan
2011-07-26  9:14   ` [Patch 0/4] ARM 64 bit sync atomic operations [V2] Dr. David Alan Gilbert
2011-07-26  9:16     ` [Patch 1/4] " Dr. David Alan Gilbert
2011-07-26  9:18       ` [Patch 2/4] " Dr. David Alan Gilbert
2011-07-26  9:23         ` [Patch 3/4] " Dr. David Alan Gilbert
2011-07-26  9:24           ` [Patch 4/4] " Dr. David Alan Gilbert
2011-08-01 15:43             ` Ramana Radhakrishnan
2011-08-17 13:07           ` [Patch 3/4] " Ramana Radhakrishnan
2011-09-30 18:04         ` [Patch 2/4] " Ramana Radhakrishnan
2011-09-30 18:29           ` H.J. Lu
2011-10-03 13:22             ` David Gilbert
2011-09-30 20:51           ` Joseph S. Myers
2011-10-03  8:35             ` Andrew Haley
2011-10-03 14:04               ` David Gilbert
2011-10-03 15:56               ` Joseph S. Myers
2011-10-03 16:07                 ` Jakub Jelinek
2011-09-30 14:25       ` [Patch 1/4] " Ramana Radhakrishnan
2011-10-03 12:53         ` David Gilbert
2011-10-03 15:18           ` David Gilbert
2011-10-06 17:52             ` [Patch 0/5] ARM 64 bit sync atomic operations [V3] Dr. David Alan Gilbert
2011-10-06 17:53               ` [Patch 1/5] " Dr. David Alan Gilbert
2011-10-06 17:54                 ` [Patch 2/5] " Dr. David Alan Gilbert
2011-10-06 17:55                   ` [Patch 3/5] " Dr. David Alan Gilbert
2011-10-06 18:03                     ` [Patch 4/5] " Dr. David Alan Gilbert
2011-10-06 18:03                       ` [Patch 5/5] " Dr. David Alan Gilbert
2011-10-12  4:51                         ` Ramana Radhakrishnan
2011-10-12  5:06                         ` Mike Stump
2011-10-12  0:53                       ` [Patch 4/5] " Ramana Radhakrishnan
2011-10-12  2:35                     ` [Patch 3/5] " Ramana Radhakrishnan
2011-10-12  0:52                   ` [Patch 2/5] " Ramana Radhakrishnan
2011-10-12  1:45                   ` Ramana Radhakrishnan
2011-10-12  0:51                 ` [Patch 1/5] " Ramana Radhakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).