public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs.
@ 2009-08-13 18:01 David Daney
  2009-08-15  7:28 ` Richard Sandiford
  0 siblings, 1 reply; 5+ messages in thread
From: David Daney @ 2009-08-13 18:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]

The Octeon CPU has a variant of the SYNC instruction called SYNCW that 
is much faster when only ordering of writes is needed.  The built-in 
atomic memory operations are one case where syncw may be used.

This patch lets GCC emit the optimized SYNCW when the target is Octeon.

It turns out that we have to emit a pair of SYNCWs (which is still much 
faster than SYNC), so the lengths of all the atomic insns are not 
constants.  This necessitates changing all of the atomic insns to get 
their length attribute from the new function mips_sync_loop_length().


Bootstrapped and tested on mips64-unknown-linux-gnu configured 
--with-float=soft --with-arch=octeon for all default languages.

OK to commit?

009-08-13  David Daney  <ddaney@caviumnetworks.com>

	* config/mips/sync.md (*memory_barrier): Emit instructions from
	mips_output_sync_insn().  Set length attribute from
	mips_sync_insn_length().
	(sync_compare_and_swap<mode>, compare_and_swap_12, sync_add<mode>,
	sync_<optab>_12, sync_old_<optab>_12, sync_new_<optab>_12,
	sync_nand_12, sync_old_nand_12, sync_new_nand_12, sync_sub<mode>,
	sync_old_add<mode>, sync_old_sub<mode>, sync_new_add<mode>,
	sync_new_sub<mode>, sync_<optab><mode>, sync_old_<optab><mode>,
	sync_new_<optab><mode>, sync_nand<mode>, sync_old_nand<mode>,
	sync_new_nand<mode>, sync_lock_test_and_set<mode>,
	test_and_set_12) Set length attribute from mips_sync_loop_length().
	* gcc/config/mips/mips-protos.h (mips_sync_loop_length,
	mips_output_sync_insn, mips_sync_insn_length) Declare new functions.
	* gcc/config/mips/mips.c (mips_output_sync_loop) Call
	mips_output_sync_insn to emit sync instructions.
	(mips_sync_loop_length, mips_output_sync_insn,
	mips_sync_insn_length) New functions.

[-- Attachment #2: syncw.patch --]
[-- Type: text/plain, Size: 11262 bytes --]

Index: gcc/config/mips/sync.md
===================================================================
--- gcc/config/mips/sync.md	(revision 150651)
+++ gcc/config/mips/sync.md	(working copy)
@@ -40,7 +40,10 @@ (define_insn "*memory_barrier"
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
   "GENERATE_SYNC"
-  "%|sync%-")
+{
+  return mips_output_sync_insn ();
+}
+  [(set (attr "length") (symbol_ref "mips_sync_insn_length ()"))])
 
 (define_insn "sync_compare_and_swap<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -58,7 +61,7 @@ (define_insn "sync_compare_and_swap<mode
     loop = MIPS_COMPARE_AND_SWAP ("<d>", "move");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "32")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 6)"))])
 
 (define_expand "sync_compare_and_swap<mode>"
   [(match_operand:SHORT 0 "register_operand")
@@ -93,7 +96,9 @@ (define_insn "compare_and_swap_12"
     loop = MIPS_COMPARE_AND_SWAP_12 (MIPS_COMPARE_AND_SWAP_12_ZERO_OP);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40,36")])
+  [(set (attr "length")
+	(symbol_ref "mips_sync_loop_length (true,
+					    which_alternative ? 7 : 8)"))])
 
 (define_insn "sync_add<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+R,R")
@@ -110,7 +115,7 @@ (define_insn "sync_add<mode>"
     loop = MIPS_SYNC_OP ("<d>", "<d>addu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_expand "sync_<optab><mode>"
   [(set (match_operand:SHORT 0 "memory_operand")
@@ -142,7 +147,7 @@ (define_insn "sync_<optab>_12"
   const char *loop = MIPS_SYNC_OP_12 ("<insn>", MIPS_SYNC_OP_12_AND);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_expand "sync_old_<optab><mode>"
   [(parallel [
@@ -179,7 +184,7 @@ (define_insn "sync_old_<optab>_12"
   const char *loop = MIPS_SYNC_OLD_OP_12 ("<insn>", MIPS_SYNC_OLD_OP_12_AND);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_expand "sync_new_<optab><mode>"
   [(parallel [
@@ -221,7 +226,7 @@ (define_insn "sync_new_<optab>_12"
   const char *loop = MIPS_SYNC_NEW_OP_12 ("<insn>", MIPS_SYNC_NEW_OP_12_AND);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_expand "sync_nand<mode>"
   [(set (match_operand:SHORT 0 "memory_operand")
@@ -253,7 +258,7 @@ (define_insn "sync_nand_12"
   const char *loop = MIPS_SYNC_OP_12 ("and", MIPS_SYNC_OP_12_XOR);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_expand "sync_old_nand<mode>"
   [(parallel [
@@ -288,7 +293,7 @@ (define_insn "sync_old_nand_12"
   const char *loop = MIPS_SYNC_OLD_OP_12 ("and", MIPS_SYNC_OLD_OP_12_XOR);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_expand "sync_new_nand<mode>"
   [(parallel [
@@ -328,7 +333,7 @@ (define_insn "sync_new_nand_12"
   const char *loop = MIPS_SYNC_NEW_OP_12 ("and", MIPS_SYNC_NEW_OP_12_XOR);
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "40")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 8)"))])
 
 (define_insn "sync_sub<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+R")
@@ -341,7 +346,7 @@ (define_insn "sync_sub<mode>"
   const char *loop = MIPS_SYNC_OP ("<d>", "<d>subu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_old_add<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -360,7 +365,7 @@ (define_insn "sync_old_add<mode>"
     loop = MIPS_SYNC_OLD_OP ("<d>", "<d>addu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_old_sub<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d")
@@ -375,7 +380,7 @@ (define_insn "sync_old_sub<mode>"
   const char *loop = MIPS_SYNC_OLD_OP ("<d>", "<d>subu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_new_add<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -394,7 +399,7 @@ (define_insn "sync_new_add<mode>"
     loop = MIPS_SYNC_NEW_OP ("<d>", "<d>addu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_new_sub<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d")
@@ -409,7 +414,7 @@ (define_insn "sync_new_sub<mode>"
   const char *loop = MIPS_SYNC_NEW_OP ("<d>", "<d>subu");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_<optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+R,R")
@@ -426,7 +431,7 @@ (define_insn "sync_<optab><mode>"
     loop = MIPS_SYNC_OP ("<d>", "<insn>");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_old_<optab><mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -445,7 +450,7 @@ (define_insn "sync_old_<optab><mode>"
     loop = MIPS_SYNC_OLD_OP ("<d>", "<insn>");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_new_<optab><mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -464,7 +469,7 @@ (define_insn "sync_new_<optab><mode>"
     loop = MIPS_SYNC_NEW_OP ("<d>", "<insn>");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "28")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 5)"))])
 
 (define_insn "sync_nand<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+R,R")
@@ -479,7 +484,7 @@ (define_insn "sync_nand<mode>"
     loop = MIPS_SYNC_NAND ("<d>", "and");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "32")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 6)"))])
 
 (define_insn "sync_old_nand<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -496,7 +501,7 @@ (define_insn "sync_old_nand<mode>"
     loop = MIPS_SYNC_OLD_NAND ("<d>", "and");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "32")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 6)"))])
 
 (define_insn "sync_new_nand<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -513,7 +518,7 @@ (define_insn "sync_new_nand<mode>"
     loop = MIPS_SYNC_NEW_NAND ("<d>", "and");
   return mips_output_sync_loop (true, loop, operands);
 }
-  [(set_attr "length" "32")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (true, 6)"))])
 
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&d,&d")
@@ -530,7 +535,7 @@ (define_insn "sync_lock_test_and_set<mod
     loop = MIPS_SYNC_EXCHANGE ("<d>", "move");
   return mips_output_sync_loop (false, loop, operands);
 }
-  [(set_attr "length" "24")])
+  [(set (attr "length") (symbol_ref "mips_sync_loop_length (false, 5)"))])
 
 (define_expand "sync_lock_test_and_set<mode>"
   [(match_operand:SHORT 0 "register_operand")
@@ -562,4 +567,6 @@ (define_insn "test_and_set_12"
     loop = MIPS_SYNC_EXCHANGE_12 (MIPS_SYNC_EXCHANGE_12_ZERO_OP);
   return mips_output_sync_loop (false, loop, operands);
 }
-  [(set_attr "length" "28,24")])
+  [(set (attr "length")
+	(symbol_ref "mips_sync_loop_length (false,
+					    which_alternative ? 5 : 6)"))])
Index: gcc/config/mips/mips-protos.h
===================================================================
--- gcc/config/mips/mips-protos.h	(revision 150651)
+++ gcc/config/mips/mips-protos.h	(working copy)
@@ -299,6 +299,9 @@ extern const char *mips_output_condition
 						   const char *);
 extern const char *mips_output_order_conditional_branch (rtx, rtx *, bool);
 extern const char *mips_output_sync_loop (bool, const char *, rtx *);
+extern int mips_sync_loop_length (bool, int);
+extern const char *mips_output_sync_insn (void);
+extern int mips_sync_insn_length (void);
 extern const char *mips_output_division (const char *, rtx *);
 extern unsigned int mips_hard_regno_nregs (int, enum machine_mode);
 extern bool mips_linked_madd_p (rtx, rtx);
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 150651)
+++ gcc/config/mips/mips.c	(working copy)
@@ -10761,7 +10761,7 @@ mips_output_sync_loop (bool barrier_befo
 		       const char *loop, rtx *operands)
 {
   if (barrier_before)
-    output_asm_insn ("sync", NULL);
+    output_asm_insn (mips_output_sync_insn (), NULL);
   /* Use branch-likely instructions to work around the LL/SC R10000 errata.  */
   mips_branch_likely = TARGET_FIX_R10000;
 
@@ -10771,11 +10771,54 @@ mips_output_sync_loop (bool barrier_befo
   if (TARGET_SYNC_AFTER_SC)
     {
       output_asm_insn (loop, operands);
-      loop = "sync";
+      loop = mips_output_sync_insn ();
     }
  
   return loop;
 }
+
+/* Return the length of code that would be emitted by
+   mips_output_sync_loop(), given that the loop consists of LOOP_INSNS
+   instructions and BARRIER_BEFORE has the same value as it would for
+   mips_output_sync_loop.  */
+
+int
+mips_sync_loop_length (bool barrier_before, int loop_insns)
+{
+  int sync_length = 0;
+
+  if (barrier_before)
+    sync_length += mips_sync_insn_length ();
+  if (TARGET_SYNC_AFTER_SC)
+    sync_length += mips_sync_insn_length ();
+  return (4 * loop_insns) + sync_length;
+}
+
+/* Return or emit the assembly code for a SYNC instruction.  */
+
+const char *
+mips_output_sync_insn (void)
+{
+  if (TARGET_OCTEON)
+    {
+      /*  For OCTEON, it is faster to issue two syncw instead of a
+	  sync.*/
+      return "syncw\n\tsyncw";
+    }
+  return "%|sync%-";
+}
+
+/* Return the length of code that would be emitted by
+   mips_output_sync_insn().  */
+
+int
+mips_sync_insn_length (void)
+{
+  if (TARGET_OCTEON)
+    return 8;
+  return 4;
+}
+
 \f
 /* Return the assembly code for DIV or DDIV instruction DIVISION, which has
    the operands given by OPERANDS.  Add in a divide-by-zero check if needed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs.
  2009-08-13 18:01 [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs David Daney
@ 2009-08-15  7:28 ` Richard Sandiford
  2009-08-17 17:14   ` David Daney
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Sandiford @ 2009-08-15  7:28 UTC (permalink / raw)
  To: David Daney; +Cc: GCC Patches

David Daney <ddaney@caviumnetworks.com> writes:
> The Octeon CPU has a variant of the SYNC instruction called SYNCW that 
> is much faster when only ordering of writes is needed.  The built-in 
> atomic memory operations are one case where syncw may be used.

Sorry to have to ask, but why is that true for __sync_synchronize?
It's documented as being a full memory barrier.

A couple of minor formatting tweaks:

David Daney <ddaney@caviumnetworks.com> writes:
> Index: gcc/config/mips/sync.md
> ===================================================================
> --- gcc/config/mips/sync.md	(revision 150651)
> +++ gcc/config/mips/sync.md	(working copy)
> @@ -40,7 +40,10 @@ (define_insn "*memory_barrier"
>    [(set (match_operand:BLK 0 "" "")
>  	(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
>    "GENERATE_SYNC"
> -  "%|sync%-")
> +{
> +  return mips_output_sync_insn ();
> +}
> +  [(set (attr "length") (symbol_ref "mips_sync_insn_length ()"))])

{ return mips_output_sync_insn (); }

> +/* Return the length of code that would be emitted by
> +   mips_output_sync_loop(), given that the loop consists of LOOP_INSNS
> +   instructions and BARRIER_BEFORE has the same value as it would for
> +   mips_output_sync_loop.  */
> +
> +int
> +mips_sync_loop_length (bool barrier_before, int loop_insns)
> +{
> +  int sync_length = 0;
> +
> +  if (barrier_before)
> +    sync_length += mips_sync_insn_length ();
> +  if (TARGET_SYNC_AFTER_SC)
> +    sync_length += mips_sync_insn_length ();
> +  return (4 * loop_insns) + sync_length;
> +}
> +
> +/* Return or emit the assembly code for a SYNC instruction.  */
> +
> +const char *
> +mips_output_sync_insn (void)
> +{
> +  if (TARGET_OCTEON)
> +    {
> +      /*  For OCTEON, it is faster to issue two syncw instead of a
> +	  sync.*/
> +      return "syncw\n\tsyncw";
> +    }

  if (TARGET_OCTEON)
    /* For OCTEON, it is faster to issue two SYNCWs instead of a SYNC.  */
    return "syncw\n\tsyncw";

Richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs.
  2009-08-15  7:28 ` Richard Sandiford
@ 2009-08-17 17:14   ` David Daney
  2009-08-17 18:57     ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: David Daney @ 2009-08-17 17:14 UTC (permalink / raw)
  To: GCC Patches, rdsandiford

Richard Sandiford wrote:
> David Daney <ddaney@caviumnetworks.com> writes:
>> The Octeon CPU has a variant of the SYNC instruction called SYNCW that 
>> is much faster when only ordering of writes is needed.  The built-in 
>> atomic memory operations are one case where syncw may be used.
> 
> Sorry to have to ask, but why is that true for __sync_synchronize?

As you observe, it is not.  Given this problem and your impending 
complete rewrite of __sync*, the patch is withdrawn.

> It's documented as being a full memory barrier.
> 

Which is unfortunate, for many (perhaps most) use cases, all you really 
need are write barriers.

What do you think about adding an additional optional parameter to
__sync_synchronize that would weaken it to just a write barrier?

The write barrier version could be used in places like libgomp, libgcj, 
and the java front-end where we don't need full memory barrier semantics.

David Daney



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs.
  2009-08-17 17:14   ` David Daney
@ 2009-08-17 18:57     ` Paolo Bonzini
  2009-08-17 19:34       ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2009-08-17 18:57 UTC (permalink / raw)
  To: David Daney; +Cc: GCC Patches, rdsandiford

On 08/17/2009 06:51 PM, David Daney wrote:
>
> What do you think about adding an additional optional parameter to
> __sync_synchronize that would weaken it to just a write barrier?

I'd rather have __sync_rmb and __sync_wmb (or maybe 
__sync_acquire_synchronize and __sync_release_synchronize?).

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs.
  2009-08-17 18:57     ` Paolo Bonzini
@ 2009-08-17 19:34       ` Richard Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2009-08-17 19:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: David Daney, GCC Patches, rdsandiford

On 08/17/2009 11:22 AM, Paolo Bonzini wrote:
> On 08/17/2009 06:51 PM, David Daney wrote:
>>
>> What do you think about adding an additional optional parameter to
>> __sync_synchronize that would weaken it to just a write barrier?
>
> I'd rather have __sync_rmb and __sync_wmb (or maybe
> __sync_acquire_synchronize and __sync_release_synchronize?).

I'd like to have RMB and WMB, since these get used (via asm) in
libgomp and such.  However, instead of extending the __sync functions,
perhaps we should work more toward the c++0x atomics thing that
was discussed here recently.


r~

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-08-17 18:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-13 18:01 [Patch] MIPS: Emit optimized sync instructions for Octeon CPUs David Daney
2009-08-15  7:28 ` Richard Sandiford
2009-08-17 17:14   ` David Daney
2009-08-17 18:57     ` Paolo Bonzini
2009-08-17 19:34       ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).