public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
@ 2015-09-03 22:12 Peter Bergner
  2015-09-08 14:54 ` David Edelsohn
  2015-10-09 14:41 ` Torvald Riegel
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Bergner @ 2015-09-03 22:12 UTC (permalink / raw)
  To: GCC Patches; +Cc: David Edelsohn, Torvald Riegel, Andreas Krebbel, Andi Kleen

While debugging a transaction lock elision issue, we noticed that the
compiler was moving some loads/stores outside of the transaction body,
because the HTM instructions were not marked as memory barriers, which
is bad.  Looking deeper, I also noticed that neither Intel and S390
have their HTM instructions marked as memory barriers either, although
Andi did submit a patch last year:

    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02999.html

Richi and r~ both said the memory barrier should be part of the patterns:

    https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02235.html

The following patch uses that suggestion by adding memory barriers to
all of our HTM instructions, which fixes the issue.  I have also added
a __TM_FENCE__ macro users can test to see whether the compiler treats
the HTM instructions as memory barriers or not, in case they want to
explicitly add memory barriers to their code when using older compilers.

On a glibc thread discussing this issue, Torvald also asked that I add
documention describing the memory consistency semantics the HTM instructions
should have, so I added a blurb about that.  Torvald, is the text below
what you were looking for?

This has passed bootstrap/regtesting on powerpc64le-linux.  Is this ok
for mainline?

Since this is a correctness issue, I'd like to eventually backport this to
the release branches.  Is that ok once I've verified bootstrap/regtesting
on them?

Once this is committed, I can take a stab at fixing Intel and S390 similarly,
unless someone beats me to it (hint hint :).  I'd need help testing it though,
since I don't have access to Intel or S390 hardware that support HTM.

Peter

	* config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
	(*tabort, *tabort<wd>c, *tabort<wd>ci, *tbegin, *tcheck, *tend,
	*trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): New define_expands.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__TM_FENCE__ for htm.
	* doc/extend.texi: Update documentation for htm builtins.

Index: gcc/config/rs6000/htm.md
===================================================================
--- gcc/config/rs6000/htm.md	(revision 227308)
+++ gcc/config/rs6000/htm.md	(working copy)
@@ -45,96 +45,231 @@
    UNSPECV_HTM_MTSPR
   ])
 
+;;
+;; UNSPEC usage
+;;
 
-(define_insn "tabort"
+(define_c_enum "unspec"
+  [UNSPEC_HTM_FENCE
+  ])
+
+(define_expand "tabort"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
+			       UNSPECV_HTM_TABORT))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tabort"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
-			    UNSPECV_HTM_TABORT))]
+			    UNSPECV_HTM_TABORT))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tabort<wd>c"
+(define_expand "tabort<wd>c"
+  [(parallel
+     [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+				(match_operand:GPR 1 "gpc_reg_operand" "r")
+				(match_operand:GPR 2 "gpc_reg_operand" "r")]
+			       UNSPECV_HTM_TABORTXC))
+      (set (match_dup 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[4] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[4]) = 1;
+})
+
+(define_insn "*tabort<wd>c"
   [(set (match_operand:CC 3 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
 			     (match_operand:GPR 1 "gpc_reg_operand" "r")
 			     (match_operand:GPR 2 "gpc_reg_operand" "r")]
-			    UNSPECV_HTM_TABORTXC))]
+			    UNSPECV_HTM_TABORTXC))
+   (set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort<wd>c. %0,%1,%2"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tabort<wd>ci"
+(define_expand "tabort<wd>ci"
+  [(parallel
+     [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+				(match_operand:GPR 1 "gpc_reg_operand" "r")
+				(match_operand 2 "s5bit_cint_operand" "n")]
+			       UNSPECV_HTM_TABORTXCI))
+      (set (match_dup 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[4] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[4]) = 1;
+})
+
+(define_insn "*tabort<wd>ci"
   [(set (match_operand:CC 3 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
 			     (match_operand:GPR 1 "gpc_reg_operand" "r")
 			     (match_operand 2 "s5bit_cint_operand" "n")]
-			    UNSPECV_HTM_TABORTXCI))]
+			    UNSPECV_HTM_TABORTXCI))
+   (set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort<wd>ci. %0,%1,%2"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tbegin"
+(define_expand "tbegin"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+			       UNSPECV_HTM_TBEGIN))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tbegin"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-			    UNSPECV_HTM_TBEGIN))]
+			    UNSPECV_HTM_TBEGIN))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tbegin. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tcheck"
+(define_expand "tcheck"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=y")
+	   (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TCHECK))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*tcheck"
   [(set (match_operand:CC 0 "cc_reg_operand" "=y")
-	(unspec_volatile:CC [(const_int 0)]
-			    UNSPECV_HTM_TCHECK))]
+	(unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TCHECK))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tcheck %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tend"
+(define_expand "tend"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+			       UNSPECV_HTM_TEND))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tend"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-			    UNSPECV_HTM_TEND))]
+			    UNSPECV_HTM_TEND))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tend. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "trechkpt"
+(define_expand "trechkpt"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TRECHKPT))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*trechkpt"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
-	(unspec_volatile:CC [(const_int 0)]
-			    UNSPECV_HTM_TRECHKPT))]
+	(unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TRECHKPT))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "trechkpt."
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "treclaim"
+(define_expand "treclaim"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand:SI 0 "gpc_reg_operand" "r")]
+			       UNSPECV_HTM_TRECLAIM))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*treclaim"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand:SI 0 "gpc_reg_operand" "r")]
-			    UNSPECV_HTM_TRECLAIM))]
+			    UNSPECV_HTM_TRECLAIM))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "treclaim. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tsr"
+(define_expand "tsr"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+			       UNSPECV_HTM_TSR))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tsr"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-			    UNSPECV_HTM_TSR))]
+			    UNSPECV_HTM_TSR))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tsr. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "ttest"
+(define_expand "ttest"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+	   (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TTEST))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*ttest"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
-	(unspec_volatile:CC [(const_int 0)]
-			    UNSPECV_HTM_TTEST))]
+	(unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TTEST))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabortwci. 0,1,0"
   [(set_attr "type" "htm")
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 227308)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -372,7 +372,11 @@
   if ((flags & OPTION_MASK_VSX) != 0)
     rs6000_define_or_undefine_macro (define_p, "__VSX__");
   if ((flags & OPTION_MASK_HTM) != 0)
-    rs6000_define_or_undefine_macro (define_p, "__HTM__");
+    {
+      rs6000_define_or_undefine_macro (define_p, "__HTM__");
+      /* Tell the user that our HTM insn patterns act as memory barriers.  */
+      rs6000_define_or_undefine_macro (define_p, "__TM_FENCE__");
+    }
   if ((flags & OPTION_MASK_P8_VECTOR) != 0)
     rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
   if ((flags & OPTION_MASK_QUAD_MEMORY) != 0)
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 227308)
+++ gcc/doc/extend.texi	(working copy)
@@ -16030,6 +16030,23 @@
 unsigned int __builtin_tsuspend (void)
 @end smallexample
 
+Note that the semantics of the above HTM builtins are required to mimic the
+locking semantics used for critical sections.  Builtins that are used to
+create a new transaction or restart a suspended transaction (specifically any
+builtin that changes the state from non-transactional to transactional) must
+have lock acquisition like semantics while those builtins that end or suspend
+a transaction (ie, changes the state from transactional to non-transactional)
+must have lock release like semantics.  The HTM instructions associated with
+with the builtins inherently provide the correct acquisition and release
+semantics required.  However, the compiler must be prohibited from moving
+loads and stores across the HTM instructions.  This has been accomplished
+by adding memory barriers to the associated HTM instructions.  Earlier versions
+of the compiler did not treat the HTM instructions as memory barriers.
+A @code{__TM_FENCE__} macro has been added, which can be used to determine
+whether the current compiler treats HTM instructions as memory barriers or not.
+This allows the user to explicitly add memory barriers to their code when
+using an older version of the compiler.
+
 The following set of built-in functions are available to gain access
 to the HTM specific special purpose registers.
 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
  2015-09-03 22:12 [PATCH, rs6000] Add memory barriers to tbegin, tend, etc Peter Bergner
@ 2015-09-08 14:54 ` David Edelsohn
  2015-10-09 14:41 ` Torvald Riegel
  1 sibling, 0 replies; 6+ messages in thread
From: David Edelsohn @ 2015-09-08 14:54 UTC (permalink / raw)
  To: Peter Bergner; +Cc: GCC Patches, Torvald Riegel, Andreas Krebbel, Andi Kleen

On Thu, Sep 3, 2015 at 5:58 PM, Peter Bergner <bergner@vnet.ibm.com> wrote:
> While debugging a transaction lock elision issue, we noticed that the
> compiler was moving some loads/stores outside of the transaction body,
> because the HTM instructions were not marked as memory barriers, which
> is bad.  Looking deeper, I also noticed that neither Intel and S390
> have their HTM instructions marked as memory barriers either, although
> Andi did submit a patch last year:
>
>     https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02999.html
>
> Richi and r~ both said the memory barrier should be part of the patterns:
>
>     https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02235.html
>
> The following patch uses that suggestion by adding memory barriers to
> all of our HTM instructions, which fixes the issue.  I have also added
> a __TM_FENCE__ macro users can test to see whether the compiler treats
> the HTM instructions as memory barriers or not, in case they want to
> explicitly add memory barriers to their code when using older compilers.
>
> On a glibc thread discussing this issue, Torvald also asked that I add
> documention describing the memory consistency semantics the HTM instructions
> should have, so I added a blurb about that.  Torvald, is the text below
> what you were looking for?
>
> This has passed bootstrap/regtesting on powerpc64le-linux.  Is this ok
> for mainline?
>
> Since this is a correctness issue, I'd like to eventually backport this to
> the release branches.  Is that ok once I've verified bootstrap/regtesting
> on them?
>
> Once this is committed, I can take a stab at fixing Intel and S390 similarly,
> unless someone beats me to it (hint hint :).  I'd need help testing it though,
> since I don't have access to Intel or S390 hardware that support HTM.
>
> Peter
>
>         * config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
>         (tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
>         trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
>         (*tabort, *tabort<wd>c, *tabort<wd>ci, *tbegin, *tcheck, *tend,
>         *trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
>         (tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
>         trechkpt, treclaim, tsr, ttest): New define_expands.
>         * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>         __TM_FENCE__ for htm.
>         * doc/extend.texi: Update documentation for htm builtins.

This is okay.

Torvald should comment on the descriptive text.

Thanks, David

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
  2015-09-03 22:12 [PATCH, rs6000] Add memory barriers to tbegin, tend, etc Peter Bergner
  2015-09-08 14:54 ` David Edelsohn
@ 2015-10-09 14:41 ` Torvald Riegel
  2015-10-09 16:52   ` Peter Bergner
  1 sibling, 1 reply; 6+ messages in thread
From: Torvald Riegel @ 2015-10-09 14:41 UTC (permalink / raw)
  To: Peter Bergner; +Cc: GCC Patches, David Edelsohn, Andreas Krebbel, Andi Kleen

Sorry for the much delayed response.  I've been sick and am slowly
catching up.

On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote:
> On a glibc thread discussing this issue, Torvald also asked that I add
> documention describing the memory consistency semantics the HTM instructions
> should have, so I added a blurb about that.  Torvald, is the text below
> what you were looking for?

> Index: gcc/doc/extend.texi
> ===================================================================
> --- gcc/doc/extend.texi	(revision 227308)
> +++ gcc/doc/extend.texi	(working copy)
> @@ -16030,6 +16030,23 @@
>  unsigned int __builtin_tsuspend (void)
>  @end smallexample
>  
> +Note that the semantics of the above HTM builtins are required to mimic the
> +locking semantics used for critical sections.  Builtins that are used to
> +create a new transaction or restart a suspended transaction (specifically any
> +builtin that changes the state from non-transactional to transactional)

Those builtins could be nested in transactions yet would need to provide
the same semantics at least as far as the compiler is concerned, so
maybe change "specifically" to "for example"?

> must
> +have lock acquisition like semantics while those builtins that end or suspend
> +a transaction (ie, changes the state from transactional to non-transactional)
> +must have lock release like semantics.

I would be a bit more specific just to avoid confusion about how the
locks are implemented or what specific semantics they have.  What about
adding this or something similar:

"Specifically, this must mimic lock semantics as specified by C++11, for
example: Lock acquisition is as-if an execution of
__atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE) that returns 0, and
lock release is as-if an execution of
__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
implicit implementation-defined lock used for all transactions."

> The HTM instructions associated with
> +with the builtins inherently provide the correct acquisition and release
> +semantics required.

I would say "hardware barriers" or "hardware fences" instead of
"semantics" to clarify that this is just one part.

> However, the compiler must be prohibited from moving
> +loads and stores across the HTM instructions.  This has been accomplished
> +by adding memory barriers to the associated HTM instructions.

I think it would be good to make the requirement tighter, so that we do
not promise too much.  What about changing this to:

"However, the compiler must also be prohibited from moving loads and
stores across the builtins in a way that would violate their semantics.
This has been accomplished by adding memory barriers to the associated
HTM instructions (which is a conservative approach to provide acquire
and release semantics)."

I haven't thought enough about txn suspend/resume/abort to make a
proposal how their required semantics would be specified best.  Maybe we
should call this out explicitly, or maybe the sentences above are
sufficient because they refer to HTM instructions generally. 

Maybe the second "HTM instructions" should be "builtins" too, I don't
know.

> Earlier versions
> +of the compiler did not treat the HTM instructions as memory barriers.
> +A @code{__TM_FENCE__} macro has been added, which can be used to determine
> +whether the current compiler treats HTM instructions as memory barriers or not.

"builtins" instead of "HTM instructions"?

> +This allows the user to explicitly add memory barriers to their code when
> +using an older version of the compiler.
> +
>  The following set of built-in functions are available to gain access
>  to the HTM specific special purpose registers.

Thanks for documenting the semantics!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
  2015-10-09 14:41 ` Torvald Riegel
@ 2015-10-09 16:52   ` Peter Bergner
  2015-10-09 20:19     ` Torvald Riegel
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Bergner @ 2015-10-09 16:52 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GCC Patches, David Edelsohn, Andreas Krebbel, Andi Kleen

On Fri, 2015-10-09 at 16:41 +0200, Torvald Riegel wrote:
> On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote:
>> +Note that the semantics of the above HTM builtins are required to mimic the
>> +locking semantics used for critical sections.  Builtins that are used to
>> +create a new transaction or restart a suspended transaction (specifically any
>> +builtin that changes the state from non-transactional to transactional)
> 
> Those builtins could be nested in transactions yet would need to provide
> the same semantics at least as far as the compiler is concerned, so
> maybe change "specifically" to "for example"?

Ah, good point.  I forgot about the nested example.  In that case, how about
we just drop the text within the ()'s instead?  I'm not sure we need it
given your suggested addition below.



> I would be a bit more specific just to avoid confusion about how the
> locks are implemented or what specific semantics they have.  What about
> adding this or something similar:
> 
> "Specifically, this must mimic lock semantics as specified by C++11, for
> example: Lock acquisition is as-if an execution of
> __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE) that returns 0, and
> lock release is as-if an execution of
> __atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
> implicit implementation-defined lock used for all transactions."

Looks good to me.


>> The HTM instructions associated with
>> +with the builtins inherently provide the correct acquisition and release
>> +semantics required.
> 
> I would say "hardware barriers" or "hardware fences" instead of
> "semantics" to clarify that this is just one part.

Agreed, I'll change it to use "hardware barriers".



>> However, the compiler must be prohibited from moving
>> +loads and stores across the HTM instructions.  This has been accomplished
>> +by adding memory barriers to the associated HTM instructions.
> 
> I think it would be good to make the requirement tighter, so that we do
> not promise too much.  What about changing this to:
> 
> "However, the compiler must also be prohibited from moving loads and
> stores across the builtins in a way that would violate their semantics.
> This has been accomplished by adding memory barriers to the associated
> HTM instructions (which is a conservative approach to provide acquire
> and release semantics)."

Done.


> I haven't thought enough about txn suspend/resume/abort to make a
> proposal how their required semantics would be specified best.  Maybe we
> should call this out explicitly, or maybe the sentences above are
> sufficient because they refer to HTM instructions generally. 

Well we do mention that we need acquire semantics not just for instructions
that start transactions, but also for instructions that resume transactions.
Similar for release and end/suspend, so I guess I'd lean towards what we have
is sufficient.


> Maybe the second "HTM instructions" should be "builtins" too, I don't
> know.

Well the builtins expand into multiple hardware instructions and the
memory barriers are only attached to the HTM instructions and not the
other instructions that are generated by the builtins, so I think
HTM instructions is probably more correct here.  But if you think
builtins is better, I can change it.



>> Earlier versions
>> +of the compiler did not treat the HTM instructions as memory barriers.
>> +A @code{__TM_FENCE__} macro has been added, which can be used to determine
>> +whether the current compiler treats HTM instructions as memory barriers or not.
> 
> "builtins" instead of "HTM instructions"?

For the same reason as above, I think HTM instructions is probably more
correct here too...but can be convinced otherwise. :-)


Thanks for the review!  I've attached the changes to the documentation below.
Is this better?


Peter


Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 227308)
+++ gcc/doc/extend.texi	(working copy)
@@ -16030,6 +16030,28 @@
 unsigned int __builtin_tsuspend (void)
 @end smallexample
 
+Note that the semantics of the above HTM builtins are required to mimic
+the locking semantics used for critical sections.  Builtins that are used
+to create a new transaction or restart a suspended transaction must have
+lock acquisition like semantics while those builtins that end or suspend a
+transaction must have lock release like semantics.  Specifically, this must
+mimic lock semantics as specified by C++11, for example: Lock acquisition is
+as-if an execution of __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE)
+that returns 0, and lock release is as-if an execution of
+__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
+implicit implementation-defined lock used for all transactions.  The HTM
+instructions associated with with the builtins inherently provide the
+correct acquisition and release hardware barriers required.  However,
+the compiler must also be prohibited from moving loads and stores across
+the builtins in a way that would violate their semantics.  This has been
+accomplished by adding memory barriers to the associated HTM instructions
+(which is a conservative approach to provide acquire and release semantics).
+Earlier versions of the compiler did not treat the HTM instructions as
+memory barriers.  A @code{__TM_FENCE__} macro has been added, which can
+be used to determine whether the current compiler treats HTM instructions
+as memory barriers or not.  This allows the user to explicitly add memory
+barriers to their code when using an older version of the compiler.
+
 The following set of built-in functions are available to gain access
 to the HTM specific special purpose registers.
 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
  2015-10-09 16:52   ` Peter Bergner
@ 2015-10-09 20:19     ` Torvald Riegel
  2015-10-14 21:52       ` Peter Bergner
  0 siblings, 1 reply; 6+ messages in thread
From: Torvald Riegel @ 2015-10-09 20:19 UTC (permalink / raw)
  To: Peter Bergner; +Cc: GCC Patches, David Edelsohn, Andreas Krebbel, Andi Kleen

On Fri, 2015-10-09 at 11:52 -0500, Peter Bergner wrote:
> On Fri, 2015-10-09 at 16:41 +0200, Torvald Riegel wrote:
> > On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote:
> >> +Note that the semantics of the above HTM builtins are required to mimic the
> >> +locking semantics used for critical sections.  Builtins that are used to
> >> +create a new transaction or restart a suspended transaction (specifically any
> >> +builtin that changes the state from non-transactional to transactional)
> > 
> > Those builtins could be nested in transactions yet would need to provide
> > the same semantics at least as far as the compiler is concerned, so
> > maybe change "specifically" to "for example"?
> 
> Ah, good point.  I forgot about the nested example.  In that case, how about
> we just drop the text within the ()'s instead?  I'm not sure we need it
> given your suggested addition below.

Sounds good to me.

> > I haven't thought enough about txn suspend/resume/abort to make a
> > proposal how their required semantics would be specified best.  Maybe we
> > should call this out explicitly, or maybe the sentences above are
> > sufficient because they refer to HTM instructions generally. 
> 
> Well we do mention that we need acquire semantics not just for instructions
> that start transactions, but also for instructions that resume transactions.
> Similar for release and end/suspend, so I guess I'd lean towards what we have
> is sufficient.

OK.

> 
> > Maybe the second "HTM instructions" should be "builtins" too, I don't
> > know.
> 
> Well the builtins expand into multiple hardware instructions and the
> memory barriers are only attached to the HTM instructions and not the
> other instructions that are generated by the builtins, so I think
> HTM instructions is probably more correct here.  But if you think
> builtins is better, I can change it.

No, it's exactly this aspect of the implementation that I didn't know
about, and thus wasn't sure.  Sounds all good.


> Thanks for the review!  I've attached the changes to the documentation below.
> Is this better?

Yes, thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
  2015-10-09 20:19     ` Torvald Riegel
@ 2015-10-14 21:52       ` Peter Bergner
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Bergner @ 2015-10-14 21:52 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: GCC Patches, David Edelsohn, Andreas Krebbel, Andi Kleen

On Fri, 2015-10-09 at 22:19 +0200, Torvald Riegel wrote:
> On Fri, 2015-10-09 at 11:52 -0500, Peter Bergner wrote:
>> Thanks for the review!  I've attached the changes to the documentation below.
>> Is this better?
> 
> Yes, thanks!

Great, thanks!  This is committed a revision 228827.  I'm just starting
the testing of the backports to the release branches and will commit it
there too, assuming everything is clean.

Peter


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-10-14 21:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03 22:12 [PATCH, rs6000] Add memory barriers to tbegin, tend, etc Peter Bergner
2015-09-08 14:54 ` David Edelsohn
2015-10-09 14:41 ` Torvald Riegel
2015-10-09 16:52   ` Peter Bergner
2015-10-09 20:19     ` Torvald Riegel
2015-10-14 21:52       ` Peter Bergner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).