public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add faster HTM fastpath for libitm TSX v2
@ 2013-01-25  3:31 Andi Kleen
  2013-01-29 13:17 ` Torvald Riegel
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2013-01-25  3:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: triegel, rth, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The libitm TSX hardware transaction fast path currently does quite a bit of
unnecessary work (saving registers etc.) before even trying to start
a hardware transaction. This patch moves the initial attempt at a
transaction early into the assembler stub. Complicated work like retries
is still done in C. So this is essentially a fast path for the fast
path.

The assembler code doesn't understand the layout of "serial_lock", but
it needs to check that serial_lock is free. We export just the lock
variable as a separate pointer for this.

The assembler code controls the HTM fast path with a new pr_tryHTM flag.

I had to reorganize the retry loop slightly so that the waiting for the
lock to be free again is done first (because the first transaction
is already done elsewhere). This makes the patch look larger than
it really is. This just moves one basic block around.

Passes bootstrap/testsuite on x86_64

v2: Use asm() to supply assembler name of namespaced variables
    Make comments more verbose.

libitm/:
2013-01-24  Andi Kleen  <ak@linux.intel.com>

	* beginend.cc (GTM::htm_fastpath): Add asm alias to
	__gtm_htm_fastpath
	(GTM::global_lock): Add.
	(begin_transaction): Check pr_tryHTM. Remove one iteration
	from HTM retry loop. Move lock free check to the beginning of
	loop body.
	* config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
	* config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
	* config/x86/sjlj.S: Include config.h.
	(pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
	 a_runInstrumentedCode, a_runUninstrumentedCode,
	 _XABORT_EXPLICIT, _XABORT_RETRY): Add defines.
	(_ITM_beginTransaction): Add HTM fast path.
	* libitm.h (pr_tryHTM): Add.
	* libitm_i.h (GTM::htm_fastpath): Add asm alias.
	(GTM::gtm_global_lock): Add.
	* method-serial.cc (method_group::init, fini): Initialize
	GTM::global_lock and GTM::htm_fastpath.
---
 libitm/beginend.cc           |   46 +++++++++++++++++++++++------------------
 libitm/config/linux/rwlock.h |    5 +++++
 libitm/config/posix/rwlock.h |    5 +++++
 libitm/config/x86/sjlj.S     |   47 ++++++++++++++++++++++++++++++++++++++++++
 libitm/libitm.h              |    7 +++++--
 libitm/libitm_i.h            |    3 ++-
 libitm/method-serial.cc      |    2 ++
 7 files changed, 92 insertions(+), 23 deletions(-)

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 4369946..118920b 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -55,7 +55,9 @@ static pthread_key_t thr_release_key;
 static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
 
 // See gtm_thread::begin_transaction.
-uint32_t GTM::htm_fastpath = 0;
+uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
+
+uint32_t *GTM::global_lock asm("__gtm_global_lock");
 
 /* Allocate a transaction structure.  */
 void *
@@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // indeed in serial mode, and HW transactions should never need serial mode
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
-  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
+  // pr_tryHTM can be set by an assembler fast path when it already tried
+  // a hardware transaction once. In this case we do one retry less.
+  if (likely(prop & pr_tryHTM))
     {
-      for (uint32_t t = htm_fastpath; t; t--)
+      // One transaction has been already tried by the assembler fast path.
+      for (uint32_t t = htm_fastpath - 1; t; t--)
 	{
-	  uint32_t ret = htm_begin();
-	  if (htm_begin_success(ret))
-	    {
-	      // We are executing a transaction now.
-	      // Monitor the writer flag in the serial-mode lock, and abort
-	      // if there is an active or waiting serial-mode transaction.
-	      if (unlikely(serial_lock.is_write_locked()))
-		htm_abort();
-	      else
-		// We do not need to set a_saveLiveVariables because of HTM.
-		return (prop & pr_uninstrumentedCode) ?
-		    a_runUninstrumentedCode : a_runInstrumentedCode;
-	    }
-	  // The transaction has aborted.  Don't retry if it's unlikely that
-	  // retrying the transaction will be successful.
-	  if (!htm_abort_should_retry(ret))
-	    break;
 	  // Wait until any concurrent serial-mode transactions have finished.
 	  // This is an empty critical section, but won't be elided.
 	  if (serial_lock.is_write_locked())
@@ -225,6 +213,24 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	      // we have retried so often that we should go serial to avoid
 	      // starvation.
 	    }
+
+	  uint32_t ret = htm_begin();
+	  if (htm_begin_success(ret))
+	    {
+	      // We are executing a transaction now.
+	      // Monitor the writer flag in the serial-mode lock, and abort
+	      // if there is an active or waiting serial-mode transaction.
+	      if (unlikely(serial_lock.is_write_locked()))
+		htm_abort();
+	      else
+		// We do not need to set a_saveLiveVariables because of HTM.
+		return (prop & pr_uninstrumentedCode) ?
+		    a_runUninstrumentedCode : a_runInstrumentedCode;
+	    }
+	  // The transaction has aborted.  Don't retry if it's unlikely that
+	  // retrying the transaction will be successful.
+	  if (!htm_abort_should_retry(ret))
+	    break;
 	}
     }
 #endif
diff --git a/libitm/config/linux/rwlock.h b/libitm/config/linux/rwlock.h
index f13d287..b27b785 100644
--- a/libitm/config/linux/rwlock.h
+++ b/libitm/config/linux/rwlock.h
@@ -67,6 +67,11 @@ class gtm_rwlock
     return writers.load (memory_order_relaxed) != 0;
   }
 
+  uint32_t *get_lock_var()
+  {
+    return (uint32_t *)&writers;
+  }
+
  protected:
   bool write_lock_generic (gtm_thread *tx);
 };
diff --git a/libitm/config/posix/rwlock.h b/libitm/config/posix/rwlock.h
index 79f1429..4481378 100644
--- a/libitm/config/posix/rwlock.h
+++ b/libitm/config/posix/rwlock.h
@@ -82,6 +82,11 @@ class gtm_rwlock
     return summary.load (memory_order_relaxed) & (a_writer | w_writer);
   }
 
+  uint32_t *get_lock_var()
+  {
+    return (uint32_t *)&summmary;
+  }
+
  protected:
   bool write_lock_generic (gtm_thread *tx);
 };
diff --git a/libitm/config/x86/sjlj.S b/libitm/config/x86/sjlj.S
index 4d733f4..4403f89 100644
--- a/libitm/config/x86/sjlj.S
+++ b/libitm/config/x86/sjlj.S
@@ -24,6 +24,7 @@
 
 
 #include "asmcfi.h"
+#include "config.h"
 
 #define CONCAT1(a, b) CONCAT2(a, b)
 #define CONCAT2(a, b) a ## b
@@ -52,6 +53,19 @@
 #  endif
 #endif
 
+/* Should share those somewhere, but they are an ABI anyways, so shouldn't
+   change. */
+	
+#define   pr_tryHTM			0x800000
+#define   pr_uninstrumentedCode         0x0002
+#define   pr_hasNoAbort			0x0008
+
+#define   a_runInstrumentedCode         0x01
+#define   a_runUninstrumentedCode       0x02
+
+#define _XABORT_EXPLICIT    (1 << 0)
+#define _XABORT_RETRY       (1 << 1)
+	
 	.text
 
 	.align 4
@@ -60,6 +74,39 @@
 SYM(_ITM_beginTransaction):
 	cfi_startproc
 #ifdef __x86_64__
+#ifdef HAVE_AS_RTM
+	/* HTM fast path.
+	   Try to run the transaction once here. If it doesn't work
+	   call the C code which retries a few more times and does other
+	   things. This is merely a fast path to optimize small transactions,
+	   anything complicated is done in C++. */
+	cmpl  $0,__gtm_htm_fastpath(%rip)
+	jz    no_txn
+	testl $pr_hasNoAbort,%edi
+	jnz   no_txn
+	xbegin txn_abort
+	/* Would be better to have a tool to generate offsetof(). */
+	movq  __gtm_global_lock(%rip),%rax
+	/* global lock is free? */
+	cmpl  $0,(%rax)
+	jnz   1f
+	/* Everything good. start running transaction */
+	testl $pr_uninstrumentedCode,%edi	
+	mov   $a_runInstrumentedCode,%eax
+	mov   $a_runUninstrumentedCode,%ecx
+	cmovnz %ecx,%eax
+	ret
+	/* Lock is busy: abort */
+1:	xabort $0xff
+txn_abort:
+	/* if an internal abort, but not the xabort above,
+	   tell the C code to not retry */
+	testl $(_XABORT_RETRY|_XABORT_EXPLICIT),%eax
+	jz  no_txn
+	orl $pr_tryHTM,%edi
+	/* go on to the C code now for further retries */
+no_txn:
+#endif
 	leaq	8(%rsp), %rax
 	subq	$56, %rsp
 	cfi_def_cfa_offset(64)
diff --git a/libitm/libitm.h b/libitm/libitm.h
index 436eb94..e654e42 100644
--- a/libitm/libitm.h
+++ b/libitm/libitm.h
@@ -72,7 +72,8 @@ typedef enum
     inIrrevocableTransaction
 } _ITM_howExecuting;
 
-/* Values to describe properties of code, passed in to beginTransaction */
+/* Values to describe properties of code, passed in to beginTransaction.
+   Some values are duplicated in assembler code.  */
 typedef enum
 {
    pr_instrumentedCode		= 0x0001,
@@ -95,7 +96,9 @@ typedef enum
    pr_exceptionBlock		= 0x1000,
    pr_hasElse			= 0x2000,
    pr_readOnly			= 0x4000,
-   pr_hasNoSimpleReads		= 0x400000
+   pr_hasNoSimpleReads		= 0x400000,
+   /* Not part of he ABI: */
+   pr_tryHTM			= 0x800000
 } _ITM_codeProperties;
 
 /* Result from startTransaction that describes what actions to take.  */
diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h
index 4dfcda9..7b2b73a 100644
--- a/libitm/libitm_i.h
+++ b/libitm/libitm_i.h
@@ -338,7 +338,8 @@ extern gtm_cacheline_mask gtm_mask_stack(gtm_cacheline *, gtm_cacheline_mask);
 
 // Control variable for the HTM fastpath that uses serial mode as fallback.
 // Non-zero if the HTM fastpath is enabled. See gtm_thread::begin_transaction.
-extern uint32_t htm_fastpath;
+extern uint32_t htm_fastpath asm("__gtm_htm_fastpath");
+extern uint32_t *global_lock asm("__gtm_global_lock");
 
 } // namespace GTM
 
diff --git a/libitm/method-serial.cc b/libitm/method-serial.cc
index 38857dc..2425690 100644
--- a/libitm/method-serial.cc
+++ b/libitm/method-serial.cc
@@ -222,6 +222,7 @@ struct htm_mg : public method_group
     // Enable the HTM fastpath if the HW is available.  The fastpath is
     // initially disabled.
 #ifdef USE_HTM_FASTPATH
+    global_lock = gtm_thread::serial_lock.get_lock_var();
     htm_fastpath = htm_init();
 #endif
   }
@@ -229,6 +230,7 @@ struct htm_mg : public method_group
   {
     // Disable the HTM fastpath.
     htm_fastpath = 0;
+    global_lock = NULL;
   }
 };
 
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
  2013-01-25  3:31 [PATCH] Add faster HTM fastpath for libitm TSX v2 Andi Kleen
@ 2013-01-29 13:17 ` Torvald Riegel
  2013-01-29 19:19   ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Torvald Riegel @ 2013-01-29 13:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches, rth, Andi Kleen

On Thu, 2013-01-24 at 19:30 -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The libitm TSX hardware transaction fast path currently does quite a bit of
> unnecessary work (saving registers etc.) before even trying to start
> a hardware transaction. This patch moves the initial attempt at a
> transaction early into the assembler stub. Complicated work like retries
> is still done in C. So this is essentially a fast path for the fast
> path.
> 
> The assembler code doesn't understand the layout of "serial_lock", but
> it needs to check that serial_lock is free. We export just the lock
> variable as a separate pointer for this.
> 
> The assembler code controls the HTM fast path with a new pr_tryHTM flag.
> 
> I had to reorganize the retry loop slightly so that the waiting for the
> lock to be free again is done first (because the first transaction
> is already done elsewhere). This makes the patch look larger than
> it really is. This just moves one basic block around.
> 
> Passes bootstrap/testsuite on x86_64
> 
> v2: Use asm() to supply assembler name of namespaced variables
>     Make comments more verbose.

Andi,

next time please reply to the points raised in a review if you disagree
with them, and don't just ignore them.  That speeds up the review.

> libitm/:
> 2013-01-24  Andi Kleen  <ak@linux.intel.com>
> 
> 	* beginend.cc (GTM::htm_fastpath): Add asm alias to
> 	__gtm_htm_fastpath
> 	(GTM::global_lock): Add.
> 	(begin_transaction): Check pr_tryHTM. Remove one iteration
> 	from HTM retry loop. Move lock free check to the beginning of
> 	loop body.
> 	* config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
> 	* config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
> 	* config/x86/sjlj.S: Include config.h.
> 	(pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
> 	 a_runInstrumentedCode, a_runUninstrumentedCode,
> 	 _XABORT_EXPLICIT, _XABORT_RETRY): Add defines.
> 	(_ITM_beginTransaction): Add HTM fast path.
> 	* libitm.h (pr_tryHTM): Add.
> 	* libitm_i.h (GTM::htm_fastpath): Add asm alias.
> 	(GTM::gtm_global_lock): Add.
> 	* method-serial.cc (method_group::init, fini): Initialize
> 	GTM::global_lock and GTM::htm_fastpath.
> ---
>  libitm/beginend.cc           |   46 +++++++++++++++++++++++------------------
>  libitm/config/linux/rwlock.h |    5 +++++
>  libitm/config/posix/rwlock.h |    5 +++++
>  libitm/config/x86/sjlj.S     |   47 ++++++++++++++++++++++++++++++++++++++++++
>  libitm/libitm.h              |    7 +++++--
>  libitm/libitm_i.h            |    3 ++-
>  libitm/method-serial.cc      |    2 ++
>  7 files changed, 92 insertions(+), 23 deletions(-)
> 
> diff --git a/libitm/beginend.cc b/libitm/beginend.cc
> index 4369946..118920b 100644
> --- a/libitm/beginend.cc
> +++ b/libitm/beginend.cc
> @@ -55,7 +55,9 @@ static pthread_key_t thr_release_key;
>  static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
>  
>  // See gtm_thread::begin_transaction.
> -uint32_t GTM::htm_fastpath = 0;
> +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
> +
> +uint32_t *GTM::global_lock asm("__gtm_global_lock");

Rearrange the serial lock's fields (see below).

>  /* Allocate a transaction structure.  */
>  void *
> @@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
>    // indeed in serial mode, and HW transactions should never need serial mode
>    // for any internal changes (e.g., they never abort visibly to the STM code
>    // and thus do not trigger the standard retry handling).

Add a comment as second paragraph in the previous block of comments,
briefly explaining when pr_tryHTM is set (ie, under which condition),
and that this happens in the assembler implementation of
_ITM_beginTransaction.

> -  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
> +  // pr_tryHTM can be set by an assembler fast path when it already tried
> +  // a hardware transaction once. In this case we do one retry less.
> +  if (likely(prop & pr_tryHTM))
>      {
> -      for (uint32_t t = htm_fastpath; t; t--)
> +      // One transaction has been already tried by the assembler fast path.
> +      for (uint32_t t = htm_fastpath - 1; t; t--)
>  	{
> -	  uint32_t ret = htm_begin();
> -	  if (htm_begin_success(ret))
> -	    {
> -	      // We are executing a transaction now.
> -	      // Monitor the writer flag in the serial-mode lock, and abort
> -	      // if there is an active or waiting serial-mode transaction.
> -	      if (unlikely(serial_lock.is_write_locked()))
> -		htm_abort();
> -	      else
> -		// We do not need to set a_saveLiveVariables because of HTM.
> -		return (prop & pr_uninstrumentedCode) ?
> -		    a_runUninstrumentedCode : a_runInstrumentedCode;
> -	    }
> -	  // The transaction has aborted.  Don't retry if it's unlikely that
> -	  // retrying the transaction will be successful.
> -	  if (!htm_abort_should_retry(ret))
> -	    break;
>  	  // Wait until any concurrent serial-mode transactions have finished.
>  	  // This is an empty critical section, but won't be elided.
>  	  if (serial_lock.is_write_locked())
> @@ -225,6 +213,24 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
>  	      // we have retried so often that we should go serial to avoid
>  	      // starvation.
>  	    }
> +
> +	  uint32_t ret = htm_begin();
> +	  if (htm_begin_success(ret))
> +	    {
> +	      // We are executing a transaction now.
> +	      // Monitor the writer flag in the serial-mode lock, and abort
> +	      // if there is an active or waiting serial-mode transaction.
> +	      if (unlikely(serial_lock.is_write_locked()))
> +		htm_abort();
> +	      else
> +		// We do not need to set a_saveLiveVariables because of HTM.
> +		return (prop & pr_uninstrumentedCode) ?
> +		    a_runUninstrumentedCode : a_runInstrumentedCode;
> +	    }
> +	  // The transaction has aborted.  Don't retry if it's unlikely that
> +	  // retrying the transaction will be successful.
> +	  if (!htm_abort_should_retry(ret))
> +	    break;
>  	}
>      }
>  #endif
> diff --git a/libitm/config/linux/rwlock.h b/libitm/config/linux/rwlock.h
> index f13d287..b27b785 100644
> --- a/libitm/config/linux/rwlock.h
> +++ b/libitm/config/linux/rwlock.h
> @@ -67,6 +67,11 @@ class gtm_rwlock
>      return writers.load (memory_order_relaxed) != 0;
>    }
>  
> +  uint32_t *get_lock_var()
> +  {
> +    return (uint32_t *)&writers;
> +  }
> +

Instead of this, please add a comment that the writers field must remain
at the start of the struct because the assembler HTM fast path will
assume that there is no writer if this field's value is zero.
If you want, also mention that this must remain a POD type (see my other
mail).

>   protected:
>    bool write_lock_generic (gtm_thread *tx);
>  };
> diff --git a/libitm/config/posix/rwlock.h b/libitm/config/posix/rwlock.h
> index 79f1429..4481378 100644
> --- a/libitm/config/posix/rwlock.h
> +++ b/libitm/config/posix/rwlock.h
> @@ -82,6 +82,11 @@ class gtm_rwlock
>      return summary.load (memory_order_relaxed) & (a_writer | w_writer);
>    }
>  
> +  uint32_t *get_lock_var()
> +  {
> +    return (uint32_t *)&summmary;
> +  }
> +

Remove this, move the summary field to the start of the struct, and add
a comment as above.

>   protected:
>    bool write_lock_generic (gtm_thread *tx);
>  };
> diff --git a/libitm/config/x86/sjlj.S b/libitm/config/x86/sjlj.S
> index 4d733f4..4403f89 100644
> --- a/libitm/config/x86/sjlj.S
> +++ b/libitm/config/x86/sjlj.S
> @@ -24,6 +24,7 @@
>  
> 
>  #include "asmcfi.h"
> +#include "config.h"
>  
>  #define CONCAT1(a, b) CONCAT2(a, b)
>  #define CONCAT2(a, b) a ## b
> @@ -52,6 +53,19 @@
>  #  endif
>  #endif
>  
> +/* Should share those somewhere, but they are an ABI anyways, so shouldn't
> +   change. */
> +	

Trailing space here?

> +#define   pr_tryHTM			0x800000
> +#define   pr_uninstrumentedCode         0x0002
> +#define   pr_hasNoAbort			0x0008
> +
> +#define   a_runInstrumentedCode         0x01
> +#define   a_runUninstrumentedCode       0x02
> +
> +#define _XABORT_EXPLICIT    (1 << 0)
> +#define _XABORT_RETRY       (1 << 1)
> +	
>  	.text
>  
>  	.align 4
> @@ -60,6 +74,39 @@
>  SYM(_ITM_beginTransaction):
>  	cfi_startproc
>  #ifdef __x86_64__
> +#ifdef HAVE_AS_RTM
> +	/* HTM fast path.
> +	   Try to run the transaction once here. If it doesn't work
> +	   call the C code which retries a few more times and does other
> +	   things. This is merely a fast path to optimize small transactions,
> +	   anything complicated is done in C++. */

Either use C or C++ everywhere...

> +	cmpl  $0,__gtm_htm_fastpath(%rip)
> +	jz    no_txn
> +	testl $pr_hasNoAbort,%edi
> +	jnz   no_txn
> +	xbegin txn_abort
> +	/* Would be better to have a tool to generate offsetof(). */
> +	movq  __gtm_global_lock(%rip),%rax

Use the serial lock directly.

> +	/* global lock is free? */
> +	cmpl  $0,(%rax)
> +	jnz   1f
> +	/* Everything good. start running transaction */
> +	testl $pr_uninstrumentedCode,%edi	
> +	mov   $a_runInstrumentedCode,%eax
> +	mov   $a_runUninstrumentedCode,%ecx
> +	cmovnz %ecx,%eax
> +	ret
> +	/* Lock is busy: abort */
> +1:	xabort $0xff
> +txn_abort:
> +	/* if an internal abort, but not the xabort above,
> +	   tell the C code to not retry */
> +	testl $(_XABORT_RETRY|_XABORT_EXPLICIT),%eax
> +	jz  no_txn
> +	orl $pr_tryHTM,%edi
> +	/* go on to the C code now for further retries */
> +no_txn:
> +#endif
>  	leaq	8(%rsp), %rax
>  	subq	$56, %rsp
>  	cfi_def_cfa_offset(64)
> diff --git a/libitm/libitm.h b/libitm/libitm.h
> index 436eb94..e654e42 100644
> --- a/libitm/libitm.h
> +++ b/libitm/libitm.h
> @@ -72,7 +72,8 @@ typedef enum
>      inIrrevocableTransaction
>  } _ITM_howExecuting;
>  
> -/* Values to describe properties of code, passed in to beginTransaction */
> +/* Values to describe properties of code, passed in to beginTransaction.
> +   Some values are duplicated in assembler code.  */
>  typedef enum
>  {
>     pr_instrumentedCode		= 0x0001,
> @@ -95,7 +96,9 @@ typedef enum
>     pr_exceptionBlock		= 0x1000,
>     pr_hasElse			= 0x2000,
>     pr_readOnly			= 0x4000,
> -   pr_hasNoSimpleReads		= 0x400000
> +   pr_hasNoSimpleReads		= 0x400000,
> +   /* Not part of he ABI: */

Mention in this comment that along "this bit is used for the HTM fast
path.  See gtm_thread::begin_transaction."

> +   pr_tryHTM			= 0x800000
>  } _ITM_codeProperties;
>  
>  /* Result from startTransaction that describes what actions to take.  */
> diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h
> index 4dfcda9..7b2b73a 100644
> --- a/libitm/libitm_i.h
> +++ b/libitm/libitm_i.h
> @@ -338,7 +338,8 @@ extern gtm_cacheline_mask gtm_mask_stack(gtm_cacheline *, gtm_cacheline_mask);
>  
>  // Control variable for the HTM fastpath that uses serial mode as fallback.
>  // Non-zero if the HTM fastpath is enabled. See gtm_thread::begin_transaction.
> -extern uint32_t htm_fastpath;
> +extern uint32_t htm_fastpath asm("__gtm_htm_fastpath");
> +extern uint32_t *global_lock asm("__gtm_global_lock");
>  
>  } // namespace GTM
>  
> diff --git a/libitm/method-serial.cc b/libitm/method-serial.cc
> index 38857dc..2425690 100644
> --- a/libitm/method-serial.cc
> +++ b/libitm/method-serial.cc
> @@ -222,6 +222,7 @@ struct htm_mg : public method_group
>      // Enable the HTM fastpath if the HW is available.  The fastpath is
>      // initially disabled.
>  #ifdef USE_HTM_FASTPATH
> +    global_lock = gtm_thread::serial_lock.get_lock_var();
>      htm_fastpath = htm_init();
>  #endif
>    }
> @@ -229,6 +230,7 @@ struct htm_mg : public method_group
>    {
>      // Disable the HTM fastpath.
>      htm_fastpath = 0;
> +    global_lock = NULL;
>    }
>  };
>  

Adapt these.

Thanks.


Torvald


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
  2013-01-29 13:17 ` Torvald Riegel
@ 2013-01-29 19:19   ` Andi Kleen
  2013-01-30 11:28     ` Torvald Riegel
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2013-01-29 19:19 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: Andi Kleen, gcc-patches, rth, Andi Kleen

> next time please reply to the points raised in a review if you disagree
> with them, and don't just ignore them.  That speeds up the review.

It was all in the previous email on the topic.

> >  // See gtm_thread::begin_transaction.
> > -uint32_t GTM::htm_fastpath = 0;
> > +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
> > +
> > +uint32_t *GTM::global_lock asm("__gtm_global_lock");
> 
> Rearrange the serial lock's fields (see below).

To my knowledge C++ classes have no guaranteed layout,
so that's not safe because there is no guarantee where
the vtable pointers are. it would be only with plain old structs.

+  // pr_tryHTM can be set by an assembler fast path when it already
tried
+  // a hardware transaction once. In this case we do one retry less.

pr_tryHTM is already documented.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
  2013-01-29 19:19   ` Andi Kleen
@ 2013-01-30 11:28     ` Torvald Riegel
  0 siblings, 0 replies; 6+ messages in thread
From: Torvald Riegel @ 2013-01-30 11:28 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches, rth, Andi Kleen

On Tue, 2013-01-29 at 20:19 +0100, Andi Kleen wrote:
> > next time please reply to the points raised in a review if you disagree
> > with them, and don't just ignore them.  That speeds up the review.
> 
> It was all in the previous email on the topic.

This v2 patch did not incorporate all the changes that I requested, nor
did you explicitly object to those you didn't incorporate.  Maybe you
don't care what's in libitm's comments, but I do.

> > >  // See gtm_thread::begin_transaction.
> > > -uint32_t GTM::htm_fastpath = 0;
> > > +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
> > > +
> > > +uint32_t *GTM::global_lock asm("__gtm_global_lock");
> > 
> > Rearrange the serial lock's fields (see below).
> 
> To my knowledge C++ classes have no guaranteed layout,
> so that's not safe because there is no guarantee where
> the vtable pointers are. it would be only with plain old structs.

And gtm_rwlock doesn't have any virtual members, right?  So it's like a
plain old struct (ie, it's a POD type).

> +  // pr_tryHTM can be set by an assembler fast path when it already
> tried
> +  // a hardware transaction once. In this case we do one retry less.
> 
> pr_tryHTM is already documented.

And I asked you to simply reference this in a comment.  What's the
problem with that?  I do not want people working on libitm to have to
grep through the code for uses of pr_tryHTM and look for comments that
might be related to it just because you don't want to put a simple
reference into the comment at the declaration of pr_tryHTM.  Can't you
see that adding the reference is just plain useful for everybody else?


Torvald

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
  2013-01-25 13:38 Uros Bizjak
@ 2013-01-25 15:54 ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2013-01-25 15:54 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Andi Kleen, Torvald Riegel, Richard Henderson

> Probably the attached (RFC) patch can be useful in this case. The
> patch allows to specify the label for xbegin, so it is possible to
> implement code like following (non-sensical) example:

It can be actually implemented using asm goto. I have some macros
for this.  And the tree optimizers should even support it.

#define XBEGIN(label)   \
     asm volatile goto(".byte 0xc7,0xf8 ; .long %l0-1f\n1:" ::: "eax" :
label)
#define XEND()    asm volatile(".byte 0x0f,0x01,0xd5")
#define XFAIL(label) label: asm volatile("" ::: "eax")
#define XFAIL_STATUS(label, status) label: asm volatile("" : "=a"
(status))

But the assembler code is still needed for this because the non TSX path needs
to save all registers (it's like a setjmp/longjmp), and that cannot 
be implemented in C.

The goal of the assembler code was not to use the label, but to move
the initial transaction before the setjmp code.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
@ 2013-01-25 13:38 Uros Bizjak
  2013-01-25 15:54 ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2013-01-25 13:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andi Kleen, Torvald Riegel, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1492 bytes --]

Hello!

> The libitm TSX hardware transaction fast path currently does quite a bit of
> unnecessary work (saving registers etc.) before even trying to start
> a hardware transaction. This patch moves the initial attempt at a
> transaction early into the assembler stub. Complicated work like retries
> is still done in C. So this is essentially a fast path for the fast
> path.
>
> The assembler code doesn't understand the layout of "serial_lock", but
> it needs to check that serial_lock is free. We export just the lock
> variable as a separate pointer for this.

Probably the attached (RFC) patch can be useful in this case. The
patch allows to specify the label for xbegin, so it is possible to
implement code like following (non-sensical) example:

--cut here--
extern int lock1;
extern int lock2;

int test ()
{
  register unsigned int res asm ("eax");

  __builtin_ia32_xbegin_label (&&__txn_abort);

  lock1 = 0;

 __txn_abort:
  if (res & 0x10)
    lock2 = 1;

  return 0;
}
--cut here--

gcc -O2 -mrtm:

test:
        xbegin  .L2
        movl    $0, lock1(%rip)
.L2:
        testb   $16, %al
        je      .L3
        movl    $1, lock2(%rip)
.L3:
        xorl    %eax, %eax
        ret

Please note that the edge from __builtin_ia32_xbegin_label is not
known to tree optimizers, so someone familiar with this part of the
compiler should enhance/fix the patch. With the (fixed) patch, you can
implement your assembly using plain C++, probably also using ifunc
relocations.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2010 bytes --]

Index: i386.md
===================================================================
--- i386.md	(revision 195460)
+++ i386.md	(working copy)
@@ -18093,6 +18093,17 @@
   DONE;
 })
 
+(define_expand "xbegin_label"
+  [(parallel
+    [(set (pc)
+	  (if_then_else (ne (unspec [(const_int 0)] UNSPEC_XBEGIN_ABORT)
+			    (const_int 0))
+			(match_operand 1)
+			(pc)))
+     (set (match_operand:SI 0 "register_operand")
+	  (unspec_volatile:SI [(match_dup 0)] UNSPECV_XBEGIN))])]
+  "TARGET_RTM")
+
 (define_insn "xbegin_1"
   [(set (pc)
 	(if_then_else (ne (unspec [(const_int 0)] UNSPEC_XBEGIN_ABORT)
Index: i386.c
===================================================================
--- i386.c	(revision 195460)
+++ i386.c	(working copy)
@@ -26570,6 +26570,7 @@ enum ix86_builtins
 
   /* RTM */
   IX86_BUILTIN_XBEGIN,
+  IX86_BUILTIN_XBEGIN_LABEL,
   IX86_BUILTIN_XEND,
   IX86_BUILTIN_XABORT,
   IX86_BUILTIN_XTEST,
@@ -26942,6 +26943,7 @@ static const struct builtin_description bdesc_spec
 
   /* RTM */
   { OPTION_MASK_ISA_RTM, CODE_FOR_xbegin, "__builtin_ia32_xbegin", IX86_BUILTIN_XBEGIN, UNKNOWN, (int) UNSIGNED_FTYPE_VOID },
+  { OPTION_MASK_ISA_RTM, CODE_FOR_xbegin_label, "__builtin_ia32_xbegin_label", IX86_BUILTIN_XBEGIN_LABEL, UNKNOWN, (int) VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_RTM, CODE_FOR_xend, "__builtin_ia32_xend", IX86_BUILTIN_XEND, UNKNOWN, (int) VOID_FTYPE_VOID },
   { OPTION_MASK_ISA_RTM, CODE_FOR_xtest, "__builtin_ia32_xtest", IX86_BUILTIN_XTEST, UNKNOWN, (int) INT_FTYPE_VOID },
 };
@@ -32239,6 +32241,17 @@ addcarryx:
 
       return target;
 
+    case IX86_BUILTIN_XBEGIN_LABEL:
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      op0 = expand_normal (arg0);
+      if (GET_CODE (op0) != LABEL_REF)
+	{
+	  error ("the xbegin's argument must be a label");
+	  return const0_rtx;
+	}
+      emit_jump_insn (gen_xbegin_label (gen_rtx_REG (SImode, AX_REG), op0));
+      return 0;
+
     case IX86_BUILTIN_XABORT:
       icode = CODE_FOR_xabort;
       arg0 = CALL_EXPR_ARG (exp, 0);

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-01-30 11:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-25  3:31 [PATCH] Add faster HTM fastpath for libitm TSX v2 Andi Kleen
2013-01-29 13:17 ` Torvald Riegel
2013-01-29 19:19   ` Andi Kleen
2013-01-30 11:28     ` Torvald Riegel
2013-01-25 13:38 Uros Bizjak
2013-01-25 15:54 ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).