public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [Patch, i386] Avoid LCP stalls (issue5975045)
@ 2012-04-05  7:41 Uros Bizjak
  2012-04-05 13:50 ` Teresa Johnson
  0 siblings, 1 reply; 14+ messages in thread
From: Uros Bizjak @ 2012-04-05  7:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Teresa Johnson, H.J. Lu, reply

Hello!

> New patch to avoid LCP stalls based on feedback from earlier patch. I modified
> H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
> memory. This is now enabled for Core2, Corei7 and Generic.

> 2012-04-04   Teresa Johnson  <tejohnson@google.com>
>
> 	* config/i386/i386.h (ix86_tune_indices): Add
> 	X86_TUNE_LCP_STALL.
>	* config/i386/i386.md (move immediate to memory peephole2):
> 	Add cases for HImode move when LCP stall avoidance is needed.
> 	* config/i386/i386.c (initial_ix86_tune_features): Initialize
> 	X86_TUNE_LCP_STALL entry.

   "optimize_insn_for_speed_p ()
-   && !TARGET_USE_MOV0
-   && TARGET_SPLIT_LONG_MOVES
-   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
+   && ((TARGET_LCP_STALL
+       && GET_MODE (operands[0]) == HImode)
+       || (!TARGET_USE_MOV0
+          && TARGET_SPLIT_LONG_MOVES
+          && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))

Please change added condition to:

&& ((<MODE>mode == HImode
        && TARGET_LCP_STALL)
       || (...))

Please also add H.J. as co-author of this change to ChangeLog.

OK with these changes.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Patch, i386] Avoid LCP stalls (issue5975045)
@ 2012-04-05  0:07 Teresa Johnson
  2012-04-05  0:39 ` H.J. Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Teresa Johnson @ 2012-04-05  0:07 UTC (permalink / raw)
  To: reply, gcc-patches

New patch to avoid LCP stalls based on feedback from earlier patch. I modified
H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
memory. This is now enabled for Core2, Corei7 and Generic.

I verified that this enables the splitting to occur in the case that originally
motivated the optimization. If we subsequently find situations where LCP stalls
are hurting performance but an extra register is required to perform the
splitting, then we can revisit whether this should be performed earlier.

I also measured SPEC 2000/2006 performance using Generic64 on an AMD Opteron
and the results were neutral.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk?

Thanks,
Teresa

2012-04-04   Teresa Johnson  <tejohnson@google.com>

	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_LCP_STALL.
	* config/i386/i386.md (move immediate to memory peephole2):
	Add cases for HImode move when LCP stall avoidance is needed.
	* config/i386/i386.c (initial_ix86_tune_features): Initialize
	X86_TUNE_LCP_STALL entry.

Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 185920)
+++ config/i386/i386.h	(working copy)
@@ -262,6 +262,7 @@ enum ix86_tune_indices {
   X86_TUNE_MOVX,
   X86_TUNE_PARTIAL_REG_STALL,
   X86_TUNE_PARTIAL_FLAG_REG_STALL,
+  X86_TUNE_LCP_STALL,
   X86_TUNE_USE_HIMODE_FIOP,
   X86_TUNE_USE_SIMODE_FIOP,
   X86_TUNE_USE_MOV0,
@@ -340,6 +341,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_L
 #define TARGET_PARTIAL_REG_STALL ix86_tune_features[X86_TUNE_PARTIAL_REG_STALL]
 #define TARGET_PARTIAL_FLAG_REG_STALL \
 	ix86_tune_features[X86_TUNE_PARTIAL_FLAG_REG_STALL]
+#define TARGET_LCP_STALL \
+	ix86_tune_features[X86_TUNE_LCP_STALL]
 #define TARGET_USE_HIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_HIMODE_FIOP]
 #define TARGET_USE_SIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_SIMODE_FIOP]
 #define TARGET_USE_MOV0		ix86_tune_features[X86_TUNE_USE_MOV0]
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 185920)
+++ config/i386/i386.md	(working copy)
@@ -16977,9 +16977,11 @@
    (set (match_operand:SWI124 0 "memory_operand")
         (const_int 0))]
   "optimize_insn_for_speed_p ()
-   && !TARGET_USE_MOV0
-   && TARGET_SPLIT_LONG_MOVES
-   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
+   && ((TARGET_LCP_STALL
+       && GET_MODE (operands[0]) == HImode)
+       || (!TARGET_USE_MOV0
+          && TARGET_SPLIT_LONG_MOVES
+          && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))
    && peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 2) (const_int 0))
 	      (clobber (reg:CC FLAGS_REG))])
@@ -16991,8 +16993,10 @@
    (set (match_operand:SWI124 0 "memory_operand")
         (match_operand:SWI124 1 "immediate_operand"))]
   "optimize_insn_for_speed_p ()
-   && TARGET_SPLIT_LONG_MOVES
-   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn"
+   && ((TARGET_LCP_STALL
+       && GET_MODE (operands[0]) == HImode)
+       || (TARGET_SPLIT_LONG_MOVES
+          && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))])
 
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 185920)
+++ config/i386/i386.c	(working copy)
@@ -1964,6 +1964,10 @@ static unsigned int initial_ix86_tune_features[X86
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
   m_CORE2I7 | m_GENERIC,
 
+  /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
+   * on 16-bit immediate moves into memory on Core2 and Corei7.  */
+  m_CORE2I7 | m_GENERIC,
+
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 

--
This patch is available for review at http://codereview.appspot.com/5975045

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Patch, i386] Avoid LCP stalls (issue5975045)
@ 2012-03-30 15:04 Teresa Johnson
  2012-03-30 15:12 ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Teresa Johnson @ 2012-03-30 15:04 UTC (permalink / raw)
  To: reply, gcc-patches

Minor update to patch to remove unnecessary check in new movhi_imm_internal
define_insn.

Retested successfully.

Teresa

2012-03-29   Teresa Johnson  <tejohnson@google.com>

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_LCP_STALL.
        * config/i386/i386.md (movhi_internal): Split to
        movhi_internal and movhi_imm_internal.
        * config/i386/i386.c (initial_ix86_tune_features): Initialize
        X86_TUNE_LCP_STALL entry.


Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 185920)
+++ config/i386/i386.h	(working copy)
@@ -262,6 +262,7 @@ enum ix86_tune_indices {
   X86_TUNE_MOVX,
   X86_TUNE_PARTIAL_REG_STALL,
   X86_TUNE_PARTIAL_FLAG_REG_STALL,
+  X86_TUNE_LCP_STALL,
   X86_TUNE_USE_HIMODE_FIOP,
   X86_TUNE_USE_SIMODE_FIOP,
   X86_TUNE_USE_MOV0,
@@ -340,6 +341,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_L
 #define TARGET_PARTIAL_REG_STALL ix86_tune_features[X86_TUNE_PARTIAL_REG_STALL]
 #define TARGET_PARTIAL_FLAG_REG_STALL \
 	ix86_tune_features[X86_TUNE_PARTIAL_FLAG_REG_STALL]
+#define TARGET_LCP_STALL \
+	ix86_tune_features[X86_TUNE_LCP_STALL]
 #define TARGET_USE_HIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_HIMODE_FIOP]
 #define TARGET_USE_SIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_SIMODE_FIOP]
 #define TARGET_USE_MOV0		ix86_tune_features[X86_TUNE_USE_MOV0]
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 185920)
+++ config/i386/i386.md	(working copy)
@@ -2262,9 +2262,19 @@
 	   ]
 	   (const_string "SI")))])
 
+(define_insn "*movhi_imm_internal"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+        (match_operand:HI 1 "immediate_operand" "n"))]
+  "!TARGET_LCP_STALL"
+{
+  return "mov{w}\t{%1, %0|%0, %1}";
+}
+  [(set (attr "type") (const_string "imov"))
+   (set (attr "mode") (const_string "HI"))])
+
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m")
-	(match_operand:HI 1 "general_operand" "r,rn,rm,rn"))]
+	(match_operand:HI 1 "general_operand" "r,rn,rm,r"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 185920)
+++ config/i386/i386.c	(working copy)
@@ -1964,6 +1964,11 @@ static unsigned int initial_ix86_tune_features[X86
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
   m_CORE2I7 | m_GENERIC,
 
+  /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
+   * on 16-bit immediate moves into memory on Core2 and Corei7,
+   * which may also affect AMD implementations.  */
+  m_CORE2I7 | m_GENERIC | m_AMD_MULTIPLE,
+
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 

--
This patch is available for review at http://codereview.appspot.com/5975045

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Patch, i386] Avoid LCP stalls (issue5975045)
@ 2012-03-30 14:19 Teresa Johnson
  2012-03-30 14:26 ` Teresa Johnson
  2012-03-30 15:18 ` Jan Hubicka
  0 siblings, 2 replies; 14+ messages in thread
From: Teresa Johnson @ 2012-03-30 14:19 UTC (permalink / raw)
  To: reply, gcc-patches

This patch addresses instructions that incur expensive length-changing prefix (LCP) stalls
on some x86-64 implementations, notably Core2 and Corei7. Specifically, a move of
a 16-bit constant into memory requires a length-changing prefix and can incur significant
penalties. The attached patch avoids this by forcing such instructions to be split into
two: a move of the corresponding 32-bit constant into a register, and a move of the
register's lower 16 bits into memory.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk?

Thanks,
Teresa

2012-03-29   Teresa Johnson  <tejohnson@google.com>

	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_LCP_STALL.
	* config/i386/i386.md (movhi_internal): Split to
	movhi_internal and movhi_imm_internal.
	* config/i386/i386.c (initial_ix86_tune_features): Initialize
	X86_TUNE_LCP_STALL entry.

Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 185920)
+++ config/i386/i386.h	(working copy)
@@ -262,6 +262,7 @@ enum ix86_tune_indices {
   X86_TUNE_MOVX,
   X86_TUNE_PARTIAL_REG_STALL,
   X86_TUNE_PARTIAL_FLAG_REG_STALL,
+  X86_TUNE_LCP_STALL,
   X86_TUNE_USE_HIMODE_FIOP,
   X86_TUNE_USE_SIMODE_FIOP,
   X86_TUNE_USE_MOV0,
@@ -340,6 +341,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_L
 #define TARGET_PARTIAL_REG_STALL ix86_tune_features[X86_TUNE_PARTIAL_REG_STALL]
 #define TARGET_PARTIAL_FLAG_REG_STALL \
 	ix86_tune_features[X86_TUNE_PARTIAL_FLAG_REG_STALL]
+#define TARGET_LCP_STALL \
+	ix86_tune_features[X86_TUNE_LCP_STALL]
 #define TARGET_USE_HIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_HIMODE_FIOP]
 #define TARGET_USE_SIMODE_FIOP	ix86_tune_features[X86_TUNE_USE_SIMODE_FIOP]
 #define TARGET_USE_MOV0		ix86_tune_features[X86_TUNE_USE_MOV0]
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 185920)
+++ config/i386/i386.md	(working copy)
@@ -2262,9 +2262,19 @@
 	   ]
 	   (const_string "SI")))])
 
+(define_insn "*movhi_imm_internal"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+        (match_operand:HI 1 "immediate_operand" "n"))]
+  "!TARGET_LCP_STALL && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+{
+  return "mov{w}\t{%1, %0|%0, %1}";
+}
+  [(set (attr "type") (const_string "imov"))
+   (set (attr "mode") (const_string "HI"))])
+
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m")
-	(match_operand:HI 1 "general_operand" "r,rn,rm,rn"))]
+	(match_operand:HI 1 "general_operand" "r,rn,rm,r"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 185920)
+++ config/i386/i386.c	(working copy)
@@ -1964,6 +1964,11 @@ static unsigned int initial_ix86_tune_features[X86
   /* X86_TUNE_PARTIAL_FLAG_REG_STALL */
   m_CORE2I7 | m_GENERIC,
 
+  /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
+   * on 16-bit immediate moves into memory on Core2 and Corei7,
+   * which may also affect AMD implementations.  */
+  m_CORE2I7 | m_GENERIC | m_AMD_MULTIPLE,
+
   /* X86_TUNE_USE_HIMODE_FIOP */
   m_386 | m_486 | m_K6_GEODE,
 

--
This patch is available for review at http://codereview.appspot.com/5975045

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-04-05 13:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-05  7:41 [Patch, i386] Avoid LCP stalls (issue5975045) Uros Bizjak
2012-04-05 13:50 ` Teresa Johnson
  -- strict thread matches above, loose matches on Subject: below --
2012-04-05  0:07 Teresa Johnson
2012-04-05  0:39 ` H.J. Lu
2012-04-05  4:57   ` Teresa Johnson
2012-03-30 15:04 Teresa Johnson
2012-03-30 15:12 ` Richard Henderson
2012-03-30 15:19   ` Richard Henderson
2012-03-30 16:23     ` H.J. Lu
2012-03-30 16:33       ` Teresa Johnson
2012-03-30 20:37         ` Jan Hubicka
2012-03-30 14:19 Teresa Johnson
2012-03-30 14:26 ` Teresa Johnson
2012-03-30 15:18 ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).