public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
@ 2020-02-28  8:56 Hongtao Liu
  2020-02-28  8:57 ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-02-28  8:56 UTC (permalink / raw)
  To: H. J. Lu; +Cc: binutils

Hi:
    This is subsequent patch for JCC Code Erratum.
    Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none

    According to intel SDM manual, not all compare flag-modifying
instructions are marcro-fusible with subsequent jcc instruction. For
those not, -mbranches-within-32B-boundaries need't align them as
FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
restrictions which separate macro-fusible instruction from not

    Restriction 1:
    If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:

       cmp m, imm
       add m, imm
       sub m, imm
       test m, imm
       and m, imm
       inc m
       dec m

    it is unfusible with any jcc instructions.

    Restriction 2:
    /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
    ---------------------------------------------------------------------
    |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
    | ------  | ----------- | ------- | -------- |
    |   Jo    |      N      |    N    |     Y    |
    |   Jno   |      N      |    N    |     Y    |
    |  Jc/Jb  |      Y      |    N    |     Y    |
    | Jae/Jnb |      Y      |    N    |     Y    |
    |  Je/Jz  |      Y      |    Y    |     Y    |
    | Jne/Jnz |      Y      |    Y    |     Y    |
    | Jna/Jbe |      Y      |    N    |     Y    |
    | Ja/Jnbe |      Y      |    N    |     Y    |
    |   Js    |      N      |    N    |     Y    |
    |   Jns   |      N      |    N    |     Y    |
    |  Jp/Jpe |      N      |    N    |     Y    |
    | Jnp/Jpo |      N      |    N    |     Y    |
    | Jl/Jnge |      Y      |    Y    |     Y    |
    | Jge/Jnl |      Y      |    Y    |     Y    |
    | Jle/Jng |      Y      |    Y    |     Y    |
    | Jg/Jnle |      Y      |    Y    |     Y    |


Changelog

            * gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
            (TC_FRAG_INIT): Init mf_type.
            * gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
            (enum mf_cmp_kind): Ditto.
            (maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
            mf_type of corresponding instructons, exclude unfusible
            instructions.
            (add_fused_jcc_padding_frag_p): Likewise.
            (add_branch_padding_frag_p): Likewise.
            (output_insn): Record mf_type for corresponding instructions.
            (i386_macro_fusible_p): New function.
            (i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
            add argument cmp_fragP to return next fusible jcc frag only.
            (i386_classify_machine_dependant_frag): Seperate macro-fusible
            instructions from condition jump.
            * gas/testsuite/gas/i386/align-branch-9.s: New file.
            * gas/testsuite/gas/i386/align-branch-9.d: Ditto.
            * gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
            * gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
            * gas/testsuite/gas/i386/i386.exp: Run new tests.

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-02-28  8:56 [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries Hongtao Liu
@ 2020-02-28  8:57 ` Hongtao Liu
  2020-02-28 11:35   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-02-28  8:57 UTC (permalink / raw)
  To: H. J. Lu; +Cc: binutils

[-- Attachment #1: Type: text/plain, Size: 3330 bytes --]

On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>     This is subsequent patch for JCC Code Erratum.
>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>
>     According to intel SDM manual, not all compare flag-modifying
> instructions are marcro-fusible with subsequent jcc instruction. For
> those not, -mbranches-within-32B-boundaries need't align them as
> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> restrictions which separate macro-fusible instruction from not
>
>     Restriction 1:
>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>
>        cmp m, imm
>        add m, imm
>        sub m, imm
>        test m, imm
>        and m, imm
>        inc m
>        dec m
>
>     it is unfusible with any jcc instructions.
>
>     Restriction 2:
>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
>     ---------------------------------------------------------------------
>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
>     | ------  | ----------- | ------- | -------- |
>     |   Jo    |      N      |    N    |     Y    |
>     |   Jno   |      N      |    N    |     Y    |
>     |  Jc/Jb  |      Y      |    N    |     Y    |
>     | Jae/Jnb |      Y      |    N    |     Y    |
>     |  Je/Jz  |      Y      |    Y    |     Y    |
>     | Jne/Jnz |      Y      |    Y    |     Y    |
>     | Jna/Jbe |      Y      |    N    |     Y    |
>     | Ja/Jnbe |      Y      |    N    |     Y    |
>     |   Js    |      N      |    N    |     Y    |
>     |   Jns   |      N      |    N    |     Y    |
>     |  Jp/Jpe |      N      |    N    |     Y    |
>     | Jnp/Jpo |      N      |    N    |     Y    |
>     | Jl/Jnge |      Y      |    Y    |     Y    |
>     | Jge/Jnl |      Y      |    Y    |     Y    |
>     | Jle/Jng |      Y      |    Y    |     Y    |
>     | Jg/Jnle |      Y      |    Y    |     Y    |
>
>
> Changelog
>
>             * gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
>             (TC_FRAG_INIT): Init mf_type.
>             * gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
>             (enum mf_cmp_kind): Ditto.
>             (maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
>             mf_type of corresponding instructons, exclude unfusible
>             instructions.
>             (add_fused_jcc_padding_frag_p): Likewise.
>             (add_branch_padding_frag_p): Likewise.
>             (output_insn): Record mf_type for corresponding instructions.
>             (i386_macro_fusible_p): New function.
>             (i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
>             add argument cmp_fragP to return next fusible jcc frag only.
>             (i386_classify_machine_dependant_frag): Seperate macro-fusible
>             instructions from condition jump.
>             * gas/testsuite/gas/i386/align-branch-9.s: New file.
>             * gas/testsuite/gas/i386/align-branch-9.d: Ditto.
>             * gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
>             * gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
>             * gas/testsuite/gas/i386/i386.exp: Run new tests.
>
> --
> BR,
> Hongtao

Add patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch --]
[-- Type: application/octet-stream, Size: 24193 bytes --]

From 656e46d54db13e3ab3bd71946aa85eee30fc9adc Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 26 Feb 2020 14:57:04 +0800
Subject: [PATCH] According to intel SDM manual, not all compare flag-modifying
 instructions are marcro-fusible with subsequent jcc instruction. For those
 not, -mbranches-within-32B-boundaries need't align them as FUSED_JCC_PADDING,
 only jcc itself need to be aligned.

Here are 2 restrictions which separate macro-fusible instruction from not

Restriction 1:
If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:

   cmp m, imm
   add m, imm
   sub m, imm
   test m, imm
   and m, imm
   inc m
   dec m

it is unfusible with any jcc instruction.

Restriction 2:
/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
---------------------------------------------------------------------
|   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
| ------  | ----------- | ------- | -------- |
|   Jo    |      N      |    N    |     Y    |
|   Jno   |      N      |    N    |     Y    |
|  Jc/Jb  |      Y      |    N    |     Y    |
| Jae/Jnb |      Y      |    N    |     Y    |
|  Je/Jz  |      Y      |    Y    |     Y    |
| Jne/Jnz |      Y      |    Y    |     Y    |
| Jna/Jbe |      Y      |    N    |     Y    |
| Ja/Jnbe |      Y      |    N    |     Y    |
|   Js    |      N      |    N    |     Y    |
|   Jns   |      N      |    N    |     Y    |
|  Jp/Jpe |      N      |    N    |     Y    |
| Jnp/Jpo |      N      |    N    |     Y    |
| Jl/Jnge |      Y      |    Y    |     Y    |
| Jge/Jnl |      Y      |    Y    |     Y    |
| Jle/Jng |      Y      |    Y    |     Y    |
| Jg/Jnle |      Y      |    Y    |     Y    |

Update maybe_fused_with_jcc_p to check if operands of CMP like instructions
can be fused with condition jump.

	* gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
	(TC_FRAG_INIT): Init mf_type.
	* gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
	(enum mf_cmp_kind): Ditto.
	(maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
	mf_type of corresponding instructons, exclude unfusible
	instructions.
	(add_fused_jcc_padding_frag_p): Likewise.
	(add_branch_padding_frag_p): Likewise.
	(output_insn): Record mf_type for corresponding instructions.
	(i386_macro_fusible_p): New function.
	(i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
	add argument cmp_fragP to return next fusible jcc frag only.
	(i386_classify_machine_dependant_frag): Seperate macro-fusible
	instructions from condition jump.
	* gas/testsuite/gas/i386/align-branch-9.s: New file.
	* gas/testsuite/gas/i386/align-branch-9.d: Ditto.
	* gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
	* gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
	* gas/testsuite/gas/i386/i386.exp: Run new tests.
---
 gas/config/tc-i386.c                          | 169 ++++++++++++++----
 gas/config/tc-i386.h                          |   2 +
 gas/testsuite/gas/i386/align-branch-9.d       |  78 ++++++++
 gas/testsuite/gas/i386/align-branch-9.s       |  74 ++++++++
 gas/testsuite/gas/i386/i386.exp               |   2 +
 .../gas/i386/x86-64-align-branch-9.d          |  46 +++++
 .../gas/i386/x86-64-align-branch-9.s          |  43 +++++
 7 files changed, 382 insertions(+), 32 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/align-branch-9.d
 create mode 100644 gas/testsuite/gas/i386/align-branch-9.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 62b7cfbe6c..76ae20d8ad 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -687,6 +687,37 @@ static unsigned int align_branch = (align_branch_jcc_bit
 				    | align_branch_fused_bit
 				    | align_branch_jmp_bit);
 
+/* Types of condition jump used by macro-fusion.  */
+enum mf_jcc_kind
+  {
+    mf_jcc_jo = 0,  /* base opcode 0x70  */
+    mf_jcc_jno,     /* base opcode 0x71  */
+    mf_jcc_jc,      /* base opcode 0x72  */
+    mf_jcc_jae,     /* base opcode 0x73  */
+    mf_jcc_je,      /* base opcode 0x74  */
+    mf_jcc_jne,     /* base opcode 0x75  */
+    mf_jcc_jna,     /* base opcode 0x76  */
+    mf_jcc_ja,      /* base opcode 0x77  */
+    mf_jcc_js,      /* base opcode 0x78  */
+    mf_jcc_jns,     /* base opcode 0x79  */
+    mf_jcc_jp,      /* base opcode 0x7a  */
+    mf_jcc_jnp,     /* base opcode 0x7b  */
+    mf_jcc_jl,      /* base opcode 0x7c  */
+    mf_jcc_jge,     /* base opcode 0x7d  */
+    mf_jcc_jle,     /* base opcode 0x7e  */
+    mf_jcc_jg       /* base opcode 0x7f  */
+  };
+
+/* Types of compare flag-modifying insntructions used by macro-fusion.  */
+enum mf_cmp_kind
+  {
+    mf_cmp_test,  /* test */
+    mf_cmp_cmp,  /* cmp */
+    mf_cmp_and,  /* and */
+    mf_cmp_alu,  /* add/sub */
+    mf_cmp_incdec  /* inc/dec */
+  };
+
 /* The maximum padding size for fused jcc.  CMP like instruction can
    be 9 bytes and jcc can be 6 bytes.  Leave room just in case for
    prefixes.   */
@@ -8374,10 +8405,22 @@ encoding_length (const fragS *start_frag, offsetT start_off,
 }
 
 /* Return 1 for test, and, cmp, add, sub, inc and dec which may
-   be macro-fused with conditional jumps.  */
+   be macro-fused with conditional jumps.
+   NB: If TEST/AND/CMP/ADD/SUB/INC/DEC is of RIP relative address,
+   or is one of the following format:
+
+    cmp m, imm
+    add m, imm
+    sub m, imm
+   test m, imm
+    and m, imm
+    inc m
+    dec m
+
+   it is unfusible.  */
 
 static int
-maybe_fused_with_jcc_p (void)
+maybe_fused_with_jcc_p (enum mf_cmp_kind* mf_cmp_p)
 {
   /* No RIP address.  */
   if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -8387,36 +8430,54 @@ maybe_fused_with_jcc_p (void)
   if (is_any_vex_encoding (&i.tm))
     return 0;
 
-  /* and, add, sub with destination register.  */
-  if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
-      || i.tm.base_opcode <= 5
+  /* add, sub without add/sub m, imm.  */
+  if (i.tm.base_opcode <= 5
       || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
       || ((i.tm.base_opcode | 3) == 0x83
-	  && ((i.tm.extension_opcode | 1) == 0x5
+	  && (i.tm.extension_opcode == 0x5
 	      || i.tm.extension_opcode == 0x0)))
-    return (i.types[1].bitfield.class == Reg
-	    || i.types[1].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_alu;
+      return !(i.mem_operands && i.imm_operands);
+    }
 
-  /* test, cmp with any register.  */
+  /* and without and m, imm.  */
+  if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
+      || ((i.tm.base_opcode | 3) == 0x83
+	  && i.tm.extension_opcode == 0x4))
+    {
+      *mf_cmp_p = mf_cmp_and;
+      return !(i.mem_operands && i.imm_operands);
+    }
+
+  /* test without test m imm.  */
   if ((i.tm.base_opcode | 1) == 0x85
       || (i.tm.base_opcode | 1) == 0xa9
       || ((i.tm.base_opcode | 1) == 0xf7
-	  && i.tm.extension_opcode == 0)
-      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
+	  && i.tm.extension_opcode == 0))
+    {
+      *mf_cmp_p = mf_cmp_test;
+      return !(i.mem_operands && i.imm_operands);
+    }
+
+  /* cmp without cmp m, imm.  */
+  if ((i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
       || ((i.tm.base_opcode | 3) == 0x83
 	  && (i.tm.extension_opcode == 0x7)))
-    return (i.types[0].bitfield.class == Reg
-	    || i.types[0].bitfield.instance == Accum
-	    || i.types[1].bitfield.class == Reg
-	    || i.types[1].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_cmp;
+      return !(i.mem_operands && i.imm_operands);
+    }
 
-  /* inc, dec with any register.   */
+  /* inc, dec without inc/dec m.   */
   if ((i.tm.cpu_flags.bitfield.cpuno64
        && (i.tm.base_opcode | 0xf) == 0x4f)
       || ((i.tm.base_opcode | 1) == 0xff
 	  && i.tm.extension_opcode <= 0x1))
-    return (i.types[0].bitfield.class == Reg
-	    || i.types[0].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_incdec;
+      return !i.mem_operands;
+    }
 
   return 0;
 }
@@ -8424,7 +8485,7 @@ maybe_fused_with_jcc_p (void)
 /* Return 1 if a FUSED_JCC_PADDING frag should be generated.  */
 
 static int
-add_fused_jcc_padding_frag_p (void)
+add_fused_jcc_padding_frag_p (enum mf_cmp_kind* mf_cmp_p)
 {
   /* NB: Don't work with COND_JUMP86 without i386.  */
   if (!align_branch_power
@@ -8433,7 +8494,7 @@ add_fused_jcc_padding_frag_p (void)
       || !(align_branch & align_branch_fused_bit))
     return 0;
 
-  if (maybe_fused_with_jcc_p ())
+  if (maybe_fused_with_jcc_p (mf_cmp_p))
     {
       if (last_insn.kind == last_insn_other
 	  || last_insn.seg != now_seg)
@@ -8481,7 +8542,8 @@ add_branch_prefix_frag_p (void)
 /* Return 1 if a BRANCH_PADDING frag should be generated.  */
 
 static int
-add_branch_padding_frag_p (enum align_branch_kind *branch_p)
+add_branch_padding_frag_p (enum align_branch_kind *branch_p,
+			   enum mf_jcc_kind *mf_jcc_p)
 {
   int add_padding;
 
@@ -8503,6 +8565,7 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
 	}
       else
 	{
+	  *mf_jcc_p = i.tm.base_opcode - 0x70;
 	  *branch_p = align_branch_jcc;
 	  if ((align_branch & align_branch_jcc_bit))
 	    add_padding = 1;
@@ -8573,6 +8636,7 @@ output_insn (void)
   offsetT insn_start_off;
   fragS *fragP = NULL;
   enum align_branch_kind branch = align_branch_none;
+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
 
 #if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
   if (IS_ELF && x86_used_note)
@@ -8665,7 +8729,7 @@ output_insn (void)
   insn_start_frag = frag_now;
   insn_start_off = frag_now_fix ();
 
-  if (add_branch_padding_frag_p (&branch))
+  if (add_branch_padding_frag_p (&branch, &mf_jcc))
     {
       char *p;
       /* Branch can be 8 bytes.  Leave some room for prefixes.  */
@@ -8686,6 +8750,7 @@ output_insn (void)
 		ENCODE_RELAX_STATE (BRANCH_PADDING, 0),
 		NULL, 0, p);
 
+      fragP->tc_frag_data.mf_type = mf_jcc;
       fragP->tc_frag_data.branch_type = branch;
       fragP->tc_frag_data.max_bytes = max_branch_padding_size;
     }
@@ -8705,6 +8770,7 @@ output_insn (void)
       unsigned char *q;
       unsigned int j;
       unsigned int prefix;
+      enum mf_cmp_kind mf_cmp;
 
       if (avoid_fence
 	  && (i.tm.base_opcode == 0xfaee8
@@ -8731,7 +8797,7 @@ output_insn (void)
       if (branch)
 	/* Skip if this is a branch.  */
 	;
-      else if (add_fused_jcc_padding_frag_p ())
+      else if (add_fused_jcc_padding_frag_p (&mf_cmp))
 	{
 	  /* Make room for padding.  */
 	  frag_grow (MAX_FUSED_JCC_PADDING_SIZE);
@@ -8743,6 +8809,7 @@ output_insn (void)
 		    ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0),
 		    NULL, 0, p);
 
+	  fragP->tc_frag_data.mf_type = mf_cmp;
 	  fragP->tc_frag_data.branch_type = align_branch_fused;
 	  fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE;
 	}
@@ -10948,6 +11015,41 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
 }
 #endif
 
+/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
+---------------------------------------------------------------------
+|   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
+| ------  | ----------- | ------- | -------- |
+|   Jo    |      N      |    N    |     Y    |
+|   Jno   |      N      |    N    |     Y    |
+|  Jc/Jb  |      Y      |    N    |     Y    |
+| Jae/Jnb |      Y      |    N    |     Y    |
+|  Je/Jz  |      Y      |    Y    |     Y    |
+| Jne/Jnz |      Y      |    Y    |     Y    |
+| Jna/Jbe |      Y      |    N    |     Y    |
+| Ja/Jnbe |      Y      |    N    |     Y    |
+|   Js    |      N      |    N    |     Y    |
+|   Jns   |      N      |    N    |     Y    |
+|  Jp/Jpe |      N      |    N    |     Y    |
+| Jnp/Jpo |      N      |    N    |     Y    |
+| Jl/Jnge |      Y      |    Y    |     Y    |
+| Jge/Jnl |      Y      |    Y    |     Y    |
+| Jle/Jng |      Y      |    Y    |     Y    |
+| Jg/Jnle |      Y      |    Y    |     Y    |
+---------------------------------------------------------------------  */
+static int
+i386_macro_fusible_p (enum mf_cmp_kind mf_cmp, enum mf_jcc_kind mf_jcc)
+{
+  if (mf_cmp == mf_cmp_alu || mf_cmp == mf_cmp_cmp)
+    return ((mf_jcc >= mf_jcc_jc && mf_jcc <= mf_jcc_ja)
+	    || (mf_jcc >= mf_jcc_jl && mf_jcc <= mf_jcc_jg));
+  if (mf_cmp == mf_cmp_incdec)
+    return ((mf_jcc >= mf_jcc_je && mf_jcc <= mf_jcc_jne)
+	    || (mf_jcc >= mf_jcc_jl && mf_jcc <= mf_jcc_jg));
+  if (mf_cmp == mf_cmp_test || mf_cmp == mf_cmp_and)
+    return 1;
+  return 0;
+}
+
 /* Return the next non-empty frag.  */
 
 static fragS *
@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
 /* Return the next jcc frag after BRANCH_PADDING.  */
 
 static fragS *
-i386_next_jcc_frag (fragS *fragP)
+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
 {
-  if (!fragP)
+  fragS *branch_fragP;
+  if (!pad_fragP)
     return NULL;
 
-  if (fragP->fr_type == rs_machine_dependent
-      && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+  if (pad_fragP->fr_type == rs_machine_dependent
+      && (TYPE_FROM_RELAX_STATE (pad_fragP->fr_subtype)
 	  == BRANCH_PADDING))
     {
-      fragP = i386_next_non_empty_frag (fragP);
-      if (fragP->fr_type != rs_machine_dependent)
+      branch_fragP = i386_next_non_empty_frag (pad_fragP);
+      if (branch_fragP->fr_type != rs_machine_dependent)
 	return NULL;
-      if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP)
-	return fragP;
+      if (TYPE_FROM_RELAX_STATE (branch_fragP->fr_subtype) == COND_JUMP
+	  && i386_macro_fusible_p (cmp_fragP->tc_frag_data.mf_type,
+				   pad_fragP->tc_frag_data.mf_type))
+	return branch_fragP;
     }
 
   return NULL;
@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
 	       */
 	    cmp_fragP = i386_next_non_empty_frag (next_fragP);
 	    pad_fragP = i386_next_non_empty_frag (cmp_fragP);
-	    branch_fragP = i386_next_jcc_frag (pad_fragP);
+	    branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
 	    if (branch_fragP)
 	      {
 		/* The BRANCH_PADDING frag is merged with the
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index 845b3901d4..d3f3819292 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -273,6 +273,7 @@ struct i386_tc_frag_data
   unsigned char prefix_length;
   unsigned char default_prefix;
   unsigned char cmp_size;
+  unsigned int mf_type : 4;
   unsigned int classified : 1;
   unsigned int branch_type : 3;
 };
@@ -299,6 +300,7 @@ struct i386_tc_frag_data
      (FRAGP)->tc_frag_data.cmp_size = 0;			\
      (FRAGP)->tc_frag_data.classified = 0;			\
      (FRAGP)->tc_frag_data.branch_type = 0;			\
+     (FRAGP)->tc_frag_data.mf_type = 0;			\
    }								\
  while (0)
 
diff --git a/gas/testsuite/gas/i386/align-branch-9.d b/gas/testsuite/gas/i386/align-branch-9.d
new file mode 100644
index 0000000000..6340817d04
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.d
@@ -0,0 +1,78 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+   0:	65 a3 01 00 00 00    	mov    %eax,%gs:0x1
+   6:	55                   	push   %ebp
+   7:	55                   	push   %ebp
+   8:	55                   	push   %ebp
+   9:	55                   	push   %ebp
+   a:	89 e5                	mov    %esp,%ebp
+   c:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+   f:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  12:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  15:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  18:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  1b:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  1e:	39 c5                	cmp    %eax,%ebp
+  20:	70 62                	jo     84 <foo\+0x84>
+  22:	89 73 f4             	mov    %esi,-0xc\(%ebx\)
+  25:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  28:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  2b:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  2e:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  31:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  34:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  37:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  3a:	5d                   	pop    %ebp
+  3b:	5d                   	pop    %ebp
+  3c:	5d                   	pop    %ebp
+  3d:	74 45                	je     84 <foo\+0x84>
+  3f:	5d                   	pop    %ebp
+  40:	74 42                	je     84 <foo\+0x84>
+  42:	89 44 24 fc          	mov    %eax,-0x4\(%esp\)
+  46:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  49:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  4c:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  4f:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  52:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  55:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  58:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  5b:	5d                   	pop    %ebp
+  5c:	eb 2c                	jmp    8a <foo\+0x8a>
+  5e:	66 90                	xchg   %ax,%ax
+  60:	eb 28                	jmp    8a <foo\+0x8a>
+  62:	eb 26                	jmp    8a <foo\+0x8a>
+  64:	89 45 fc             	mov    %eax,-0x4\(%ebp\)
+  67:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  6a:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  6d:	5d                   	pop    %ebp
+  6e:	5d                   	pop    %ebp
+  6f:	40                   	inc    %eax
+  70:	72 12                	jb     84 <foo\+0x84>
+  72:	36 36 89 45 fc       	ss mov %eax,%ss:-0x4\(%ebp\)
+  77:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  7a:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  7d:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  80:	21 c3                	and    %eax,%ebx
+  82:	7c 06                	jl     8a <foo\+0x8a>
+  84:	8b 45 f4             	mov    -0xc\(%ebp\),%eax
+  87:	89 45 fc             	mov    %eax,-0x4\(%ebp\)
+  8a:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  90:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  96:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  9c:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  a2:	89 75 0c             	mov    %esi,0xc\(%ebp\)
+  a5:	e9 fc ff ff ff       	jmp    a6 <foo\+0xa6>
+  aa:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  b0:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  b6:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  bc:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  c2:	89 75 00             	mov    %esi,0x0\(%ebp\)
+  c5:	74 c3                	je     8a <foo\+0x8a>
+  c7:	74 c1                	je     8a <foo\+0x8a>
+#pass
diff --git a/gas/testsuite/gas/i386/align-branch-9.s b/gas/testsuite/gas/i386/align-branch-9.s
new file mode 100644
index 0000000000..357abe30f9
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.s
@@ -0,0 +1,74 @@
+  .text
+  .globl  foo
+  .p2align  4
+foo:
+  movl	%eax, %gs:0x1
+  pushl  %ebp
+  pushl  %ebp
+  pushl  %ebp
+  pushl  %ebp
+  movl  %esp, %ebp
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  cmp  %eax, %ebp
+  jo  .L_2
+  movl  %esi, -12(%ebx)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  popl  %ebp
+  popl  %ebp
+  popl  %ebp
+  je  .L_2
+  popl  %ebp
+  je  .L_2
+  movl  %eax, -4(%esp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  popl  %ebp
+  jmp  .L_3
+  jmp  .L_3
+  jmp  .L_3
+  movl  %eax, -4(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  popl  %ebp
+  popl  %ebp
+  inc  %eax
+  jc  .L_2
+  movl  %eax, -4(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  and  %eax, %ebx
+  jl  .L_3
+.L_2:
+  movl  -12(%ebp), %eax
+  movl  %eax, -4(%ebp)
+.L_3:
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, 12(%ebp)
+  jmp  bar
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, (%ebp)
+  je .L_3
+  je .L_3
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 685e62ea72..8fc621f2bb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -525,6 +525,7 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-6"
     run_dump_test "align-branch-7"
     run_dump_test "align-branch-8"
+    run_dump_test "align-branch-9"
 
     # These tests require support for 8 and 16 bit relocs,
     # so we only run them for ELF and COFF targets.
@@ -1100,6 +1101,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-6"
     run_dump_test "x86-64-align-branch-7"
     run_dump_test "x86-64-align-branch-8"
+    run_dump_test "x86-64-align-branch-9"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.d b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
new file mode 100644
index 0000000000..1041fd0483
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
@@ -0,0 +1,46 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+   0:	c1 e9 02             	shr    \$0x2,%ecx
+   3:	c1 e9 02             	shr    \$0x2,%ecx
+   6:	c1 e9 02             	shr    \$0x2,%ecx
+   9:	89 d1                	mov    %edx,%ecx
+   b:	31 c0                	xor    %eax,%eax
+   d:	c1 e9 02             	shr    \$0x2,%ecx
+  10:	c1 e9 02             	shr    \$0x2,%ecx
+  13:	c1 e9 02             	shr    \$0x2,%ecx
+  16:	c1 e9 02             	shr    \$0x2,%ecx
+  19:	c1 e9 02             	shr    \$0x2,%ecx
+  1c:	c1 e9 02             	shr    \$0x2,%ecx
+  1f:	80 fa 02             	cmp    \$0x2,%dl
+  22:	70 df                	jo     3 <foo\+0x3>
+  24:	2e 2e 2e 2e 31 c0    	cs cs cs cs xor %eax,%eax
+  2a:	c1 e9 02             	shr    \$0x2,%ecx
+  2d:	c1 e9 02             	shr    \$0x2,%ecx
+  30:	c1 e9 02             	shr    \$0x2,%ecx
+  33:	89 d1                	mov    %edx,%ecx
+  35:	31 c0                	xor    %eax,%eax
+  37:	c1 e9 02             	shr    \$0x2,%ecx
+  3a:	c1 e9 02             	shr    \$0x2,%ecx
+  3d:	c1 e9 02             	shr    \$0x2,%ecx
+  40:	f6 c2 02             	test   \$0x2,%dl
+  43:	75 e8                	jne    2d <foo\+0x2d>
+  45:	31 c0                	xor    %eax,%eax
+  47:	c1 e9 02             	shr    \$0x2,%ecx
+  4a:	c1 e9 02             	shr    \$0x2,%ecx
+  4d:	89 d1                	mov    %edx,%ecx
+  4f:	c1 e9 02             	shr    \$0x2,%ecx
+  52:	c1 e9 02             	shr    \$0x2,%ecx
+  55:	89 d1                	mov    %edx,%ecx
+  57:	c1 e9 02             	shr    \$0x2,%ecx
+  5a:	89 d1                	mov    %edx,%ecx
+  5c:	31 c0                	xor    %eax,%eax
+  5e:	ff c0                	inc    %eax
+  60:	76 cb                	jbe    2d <foo\+0x2d>
+  62:	31 c0                	xor    %eax,%eax
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.s b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
new file mode 100644
index 0000000000..917579bda4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
@@ -0,0 +1,43 @@
+	.text
+	.p2align 4,,15
+foo:
+	shrl	$2, %ecx
+.L1:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	cmpb	$2, %dl
+	jo	.L1
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+.L2:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	testb	$2, %dl
+	jne	.L2
+	xorl	%eax, %eax
+.L3:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	inc	%eax
+	jbe	.L2
+	xorl	%eax, %eax
-- 
2.18.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-02-28  8:57 ` Hongtao Liu
@ 2020-02-28 11:35   ` Jan Beulich
  2020-03-02  3:14     ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-02-28 11:35 UTC (permalink / raw)
  To: Hongtao Liu, H. J. Lu; +Cc: binutils

On 28.02.2020 09:57, Hongtao Liu wrote:
> On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> Hi:
>>     This is subsequent patch for JCC Code Erratum.
>>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>>
>>     According to intel SDM manual, not all compare flag-modifying
>> instructions are marcro-fusible with subsequent jcc instruction. For
>> those not, -mbranches-within-32B-boundaries need't align them as
>> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
>> restrictions which separate macro-fusible instruction from not
>>
>>     Restriction 1:
>>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>>
>>        cmp m, imm
>>        add m, imm
>>        sub m, imm
>>        test m, imm
>>        and m, imm
>>        inc m
>>        dec m
>>
>>     it is unfusible with any jcc instructions.
>>
>>     Restriction 2:
>>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
>>     ---------------------------------------------------------------------
>>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
>>     | ------  | ----------- | ------- | -------- |
>>     |   Jo    |      N      |    N    |     Y    |
>>     |   Jno   |      N      |    N    |     Y    |
>>     |  Jc/Jb  |      Y      |    N    |     Y    |
>>     | Jae/Jnb |      Y      |    N    |     Y    |
>>     |  Je/Jz  |      Y      |    Y    |     Y    |
>>     | Jne/Jnz |      Y      |    Y    |     Y    |
>>     | Jna/Jbe |      Y      |    N    |     Y    |
>>     | Ja/Jnbe |      Y      |    N    |     Y    |
>>     |   Js    |      N      |    N    |     Y    |
>>     |   Jns   |      N      |    N    |     Y    |
>>     |  Jp/Jpe |      N      |    N    |     Y    |
>>     | Jnp/Jpo |      N      |    N    |     Y    |
>>     | Jl/Jnge |      Y      |    Y    |     Y    |
>>     | Jge/Jnl |      Y      |    Y    |     Y    |
>>     | Jle/Jng |      Y      |    Y    |     Y    |
>>     | Jg/Jnle |      Y      |    Y    |     Y    |

Quoting a Haswell table (also in the code) for code to deal with
a Skylake erratum is, without any further comments, not very
helpful.

>+enum mf_jcc_kind
>+  {
>+    mf_jcc_jo = 0,  /* base opcode 0x70  */
>+    mf_jcc_jno,     /* base opcode 0x71  */
>+    mf_jcc_jc,      /* base opcode 0x72  */
>+    mf_jcc_jae,     /* base opcode 0x73  */
>+    mf_jcc_je,      /* base opcode 0x74  */
>+    mf_jcc_jne,     /* base opcode 0x75  */
>+    mf_jcc_jna,     /* base opcode 0x76  */
>+    mf_jcc_ja,      /* base opcode 0x77  */
>+    mf_jcc_js,      /* base opcode 0x78  */
>+    mf_jcc_jns,     /* base opcode 0x79  */
>+    mf_jcc_jp,      /* base opcode 0x7a  */
>+    mf_jcc_jnp,     /* base opcode 0x7b  */
>+    mf_jcc_jl,      /* base opcode 0x7c  */
>+    mf_jcc_jge,     /* base opcode 0x7d  */
>+    mf_jcc_jle,     /* base opcode 0x7e  */
>+    mf_jcc_jg       /* base opcode 0x7f  */
>+  };

Did you consider making this an insn template attribute instead?

>+enum mf_cmp_kind
>+  {
>+    mf_cmp_test,  /* test */
>+    mf_cmp_cmp,  /* cmp */
>+    mf_cmp_and,  /* and */
>+    mf_cmp_alu,  /* add/sub */
>+    mf_cmp_incdec  /* inc/dec */
>+  };

Why 5 enumerators when only three are needed (CMP going
together with ADD/SUB and AND going together with TEST)?

>@@ -8573,6 +8636,7 @@ output_insn (void)
>   offsetT insn_start_off;
>   fragS *fragP = NULL;
>   enum align_branch_kind branch = align_branch_none;
>+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;

Is this initializer needed? It looks pretty arbitrary to pick
JO here.

>@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> /* Return the next jcc frag after BRANCH_PADDING.  */
> 
> static fragS *
>-i386_next_jcc_frag (fragS *fragP)
>+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)

Besides "cmp" in the name again not really fitting all variants
(afaict), it also looks suspicious with ...

>@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> 	       */
> 	    cmp_fragP = i386_next_non_empty_frag (next_fragP);
> 	    pad_fragP = i386_next_non_empty_frag (cmp_fragP);
>-	    branch_fragP = i386_next_jcc_frag (pad_fragP);
>+	    branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);

... next_fragP, not cmp_fragP being passed here.

>--- a/gas/config/tc-i386.h
>+++ b/gas/config/tc-i386.h
>@@ -273,6 +273,7 @@ struct i386_tc_frag_data
>   unsigned char prefix_length;
>   unsigned char default_prefix;
>   unsigned char cmp_size;
>+  unsigned int mf_type : 4;

Even if not going with a (2-bit) insn attribute, this doesn't
need to be wider than 3 bits, as J<cc> and JN<cc> always fall
into the same group, and hence don't need separate tracking
(i.e. when recording you could ignore the low opcode bit).

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-02-28 11:35   ` Jan Beulich
@ 2020-03-02  3:14     ` Hongtao Liu
  2020-03-02  4:05       ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-02  3:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: H. J. Lu, binutils

On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.02.2020 09:57, Hongtao Liu wrote:
> > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> Hi:
> >>     This is subsequent patch for JCC Code Erratum.
> >>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> >>
> >>     According to intel SDM manual, not all compare flag-modifying
> >> instructions are marcro-fusible with subsequent jcc instruction. For
> >> those not, -mbranches-within-32B-boundaries need't align them as
> >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> >> restrictions which separate macro-fusible instruction from not
> >>
> >>     Restriction 1:
> >>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> >>
> >>        cmp m, imm
> >>        add m, imm
> >>        sub m, imm
> >>        test m, imm
> >>        and m, imm
> >>        inc m
> >>        dec m
> >>
> >>     it is unfusible with any jcc instructions.
> >>
> >>     Restriction 2:
> >>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> >>     ---------------------------------------------------------------------
> >>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
> >>     | ------  | ----------- | ------- | -------- |
> >>     |   Jo    |      N      |    N    |     Y    |
> >>     |   Jno   |      N      |    N    |     Y    |
> >>     |  Jc/Jb  |      Y      |    N    |     Y    |
> >>     | Jae/Jnb |      Y      |    N    |     Y    |
> >>     |  Je/Jz  |      Y      |    Y    |     Y    |
> >>     | Jne/Jnz |      Y      |    Y    |     Y    |
> >>     | Jna/Jbe |      Y      |    N    |     Y    |
> >>     | Ja/Jnbe |      Y      |    N    |     Y    |
> >>     |   Js    |      N      |    N    |     Y    |
> >>     |   Jns   |      N      |    N    |     Y    |
> >>     |  Jp/Jpe |      N      |    N    |     Y    |
> >>     | Jnp/Jpo |      N      |    N    |     Y    |
> >>     | Jl/Jnge |      Y      |    Y    |     Y    |
> >>     | Jge/Jnl |      Y      |    Y    |     Y    |
> >>     | Jle/Jng |      Y      |    Y    |     Y    |
> >>     | Jg/Jnle |      Y      |    Y    |     Y    |
>
> Quoting a Haswell table (also in the code) for code to deal with
> a Skylake erratum is, without any further comments, not very
> helpful.
>
They also works for Skylake and Cascadelake, I'll add comments.
> >+enum mf_jcc_kind
> >+  {
> >+    mf_jcc_jo = 0,  /* base opcode 0x70  */
> >+    mf_jcc_jno,     /* base opcode 0x71  */
> >+    mf_jcc_jc,      /* base opcode 0x72  */
> >+    mf_jcc_jae,     /* base opcode 0x73  */
> >+    mf_jcc_je,      /* base opcode 0x74  */
> >+    mf_jcc_jne,     /* base opcode 0x75  */
> >+    mf_jcc_jna,     /* base opcode 0x76  */
> >+    mf_jcc_ja,      /* base opcode 0x77  */
> >+    mf_jcc_js,      /* base opcode 0x78  */
> >+    mf_jcc_jns,     /* base opcode 0x79  */
> >+    mf_jcc_jp,      /* base opcode 0x7a  */
> >+    mf_jcc_jnp,     /* base opcode 0x7b  */
> >+    mf_jcc_jl,      /* base opcode 0x7c  */
> >+    mf_jcc_jge,     /* base opcode 0x7d  */
> >+    mf_jcc_jle,     /* base opcode 0x7e  */
> >+    mf_jcc_jg       /* base opcode 0x7f  */
> >+  };
>
> Did you consider making this an insn template attribute instead?
>
Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
So maybe it's a bit redudant to add it into insn_template?
> >+enum mf_cmp_kind
> >+  {
> >+    mf_cmp_test,  /* test */
> >+    mf_cmp_cmp,  /* cmp */
> >+    mf_cmp_and,  /* and */
> >+    mf_cmp_alu,  /* add/sub */
> >+    mf_cmp_incdec  /* inc/dec */
> >+  };
>
> Why 5 enumerators when only three are needed (CMP going
> together with ADD/SUB and AND going together with TEST)?
>
Yes, only three are needed here, adding 5 enumerators just in case
there are other combinations use in the future.
> >@@ -8573,6 +8636,7 @@ output_insn (void)
> >   offsetT insn_start_off;
> >   fragS *fragP = NULL;
> >   enum align_branch_kind branch = align_branch_none;
> >+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
>
> Is this initializer needed? It looks pretty arbitrary to pick
> JO here.
>
It's arbitrary, the initializer here is just to avoid uninitialized
warning when building with option Wuninitialized.
I'll add comments to explain the initializer.
> >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > /* Return the next jcc frag after BRANCH_PADDING.  */
> >
> > static fragS *
> >-i386_next_jcc_frag (fragS *fragP)
> >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
>
> Besides "cmp" in the name again not really fitting all variants
> (afaict), it also looks suspicious with ...
>
> >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> >              */
> >           cmp_fragP = i386_next_non_empty_frag (next_fragP);
> >           pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> >-          branch_fragP = i386_next_jcc_frag (pad_fragP);
> >+          branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
>
> ... next_fragP, not cmp_fragP being passed here.
>
Yes, use maybe_cmp_fragP may be better.
> >--- a/gas/config/tc-i386.h
> >+++ b/gas/config/tc-i386.h
> >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> >   unsigned char prefix_length;
> >   unsigned char default_prefix;
> >   unsigned char cmp_size;
> >+  unsigned int mf_type : 4;
>
> Even if not going with a (2-bit) insn attribute, this doesn't
> need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> into the same group, and hence don't need separate tracking
> (i.e. when recording you could ignore the low opcode bit).
>
Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> Jan



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-03-02  3:14     ` Hongtao Liu
@ 2020-03-02  4:05       ` Hongtao Liu
  2020-03-02  8:32         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-02  4:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: H. J. Lu, binutils

[-- Attachment #1: Type: text/plain, Size: 6186 bytes --]

On Mon, Mar 2, 2020 at 11:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 28.02.2020 09:57, Hongtao Liu wrote:
> > > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >>
> > >> Hi:
> > >>     This is subsequent patch for JCC Code Erratum.
> > >>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> > >>
> > >>     According to intel SDM manual, not all compare flag-modifying
> > >> instructions are marcro-fusible with subsequent jcc instruction. For
> > >> those not, -mbranches-within-32B-boundaries need't align them as
> > >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> > >> restrictions which separate macro-fusible instruction from not
> > >>
> > >>     Restriction 1:
> > >>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> > >>
> > >>        cmp m, imm
> > >>        add m, imm
> > >>        sub m, imm
> > >>        test m, imm
> > >>        and m, imm
> > >>        inc m
> > >>        dec m
> > >>
> > >>     it is unfusible with any jcc instructions.
> > >>
> > >>     Restriction 2:
> > >>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> > >>     ---------------------------------------------------------------------
> > >>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
> > >>     | ------  | ----------- | ------- | -------- |
> > >>     |   Jo    |      N      |    N    |     Y    |
> > >>     |   Jno   |      N      |    N    |     Y    |
> > >>     |  Jc/Jb  |      Y      |    N    |     Y    |
> > >>     | Jae/Jnb |      Y      |    N    |     Y    |
> > >>     |  Je/Jz  |      Y      |    Y    |     Y    |
> > >>     | Jne/Jnz |      Y      |    Y    |     Y    |
> > >>     | Jna/Jbe |      Y      |    N    |     Y    |
> > >>     | Ja/Jnbe |      Y      |    N    |     Y    |
> > >>     |   Js    |      N      |    N    |     Y    |
> > >>     |   Jns   |      N      |    N    |     Y    |
> > >>     |  Jp/Jpe |      N      |    N    |     Y    |
> > >>     | Jnp/Jpo |      N      |    N    |     Y    |
> > >>     | Jl/Jnge |      Y      |    Y    |     Y    |
> > >>     | Jge/Jnl |      Y      |    Y    |     Y    |
> > >>     | Jle/Jng |      Y      |    Y    |     Y    |
> > >>     | Jg/Jnle |      Y      |    Y    |     Y    |
> >
> > Quoting a Haswell table (also in the code) for code to deal with
> > a Skylake erratum is, without any further comments, not very
> > helpful.
> >
> They also works for Skylake and Cascadelake, I'll add comments.
> > >+enum mf_jcc_kind
> > >+  {
> > >+    mf_jcc_jo = 0,  /* base opcode 0x70  */
> > >+    mf_jcc_jno,     /* base opcode 0x71  */
> > >+    mf_jcc_jc,      /* base opcode 0x72  */
> > >+    mf_jcc_jae,     /* base opcode 0x73  */
> > >+    mf_jcc_je,      /* base opcode 0x74  */
> > >+    mf_jcc_jne,     /* base opcode 0x75  */
> > >+    mf_jcc_jna,     /* base opcode 0x76  */
> > >+    mf_jcc_ja,      /* base opcode 0x77  */
> > >+    mf_jcc_js,      /* base opcode 0x78  */
> > >+    mf_jcc_jns,     /* base opcode 0x79  */
> > >+    mf_jcc_jp,      /* base opcode 0x7a  */
> > >+    mf_jcc_jnp,     /* base opcode 0x7b  */
> > >+    mf_jcc_jl,      /* base opcode 0x7c  */
> > >+    mf_jcc_jge,     /* base opcode 0x7d  */
> > >+    mf_jcc_jle,     /* base opcode 0x7e  */
> > >+    mf_jcc_jg       /* base opcode 0x7f  */
> > >+  };
> >
> > Did you consider making this an insn template attribute instead?
> >
> Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
> So maybe it's a bit redudant to add it into insn_template?
> > >+enum mf_cmp_kind
> > >+  {
> > >+    mf_cmp_test,  /* test */
> > >+    mf_cmp_cmp,  /* cmp */
> > >+    mf_cmp_and,  /* and */
> > >+    mf_cmp_alu,  /* add/sub */
> > >+    mf_cmp_incdec  /* inc/dec */
> > >+  };
> >
> > Why 5 enumerators when only three are needed (CMP going
> > together with ADD/SUB and AND going together with TEST)?
> >
> Yes, only three are needed here, adding 5 enumerators just in case
> there are other combinations use in the future.
> > >@@ -8573,6 +8636,7 @@ output_insn (void)
> > >   offsetT insn_start_off;
> > >   fragS *fragP = NULL;
> > >   enum align_branch_kind branch = align_branch_none;
> > >+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
> >
> > Is this initializer needed? It looks pretty arbitrary to pick
> > JO here.
> >
> It's arbitrary, the initializer here is just to avoid uninitialized
> warning when building with option Wuninitialized.
> I'll add comments to explain the initializer.
> > >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > > /* Return the next jcc frag after BRANCH_PADDING.  */
> > >
> > > static fragS *
> > >-i386_next_jcc_frag (fragS *fragP)
> > >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
> >
> > Besides "cmp" in the name again not really fitting all variants
> > (afaict), it also looks suspicious with ...
> >
> > >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> > >              */
> > >           cmp_fragP = i386_next_non_empty_frag (next_fragP);
> > >           pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> > >-          branch_fragP = i386_next_jcc_frag (pad_fragP);
> > >+          branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
> >
> > ... next_fragP, not cmp_fragP being passed here.
> >
> Yes, use maybe_cmp_fragP may be better.
> > >--- a/gas/config/tc-i386.h
> > >+++ b/gas/config/tc-i386.h
> > >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> > >   unsigned char prefix_length;
> > >   unsigned char default_prefix;
> > >   unsigned char cmp_size;
> > >+  unsigned int mf_type : 4;
> >
> > Even if not going with a (2-bit) insn attribute, this doesn't
> > need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> > into the same group, and hence don't need separate tracking
> > (i.e. when recording you could ignore the low opcode bit).
> >
> Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> > Jan
>
>
>
> --
> BR,
> Hongtao

Update patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch --]
[-- Type: application/octet-stream, Size: 24194 bytes --]

From e9801ef97177eae61548ed54996d767e7179dff6 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 26 Feb 2020 14:57:04 +0800
Subject: [PATCH] According to intel SDM manual, not all compare flag-modifying
 instructions are marcro-fusible with subsequent jcc instruction. For those
 not, -mbranches-within-32B-boundaries need't align them as FUSED_JCC_PADDING,
 only jcc itself need to be aligned.

Here are 2 restrictions which separate macro-fusible instruction from not

Restriction 1:
If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:

   cmp m, imm
   add m, imm
   sub m, imm
   test m, imm
   and m, imm
   inc m
   dec m

it is unfusible with any jcc instruction.

Restriction 2:
/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
   Note it also works for Skylake and Cascadelake.
---------------------------------------------------------------------
|   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
| ------  | ----------- | ------- | -------- |
|   Jo    |      N      |    N    |     Y    |
|   Jno   |      N      |    N    |     Y    |
|  Jc/Jb  |      Y      |    N    |     Y    |
| Jae/Jnb |      Y      |    N    |     Y    |
|  Je/Jz  |      Y      |    Y    |     Y    |
| Jne/Jnz |      Y      |    Y    |     Y    |
| Jna/Jbe |      Y      |    N    |     Y    |
| Ja/Jnbe |      Y      |    N    |     Y    |
|   Js    |      N      |    N    |     Y    |
|   Jns   |      N      |    N    |     Y    |
|  Jp/Jpe |      N      |    N    |     Y    |
| Jnp/Jpo |      N      |    N    |     Y    |
| Jl/Jnge |      Y      |    Y    |     Y    |
| Jge/Jnl |      Y      |    Y    |     Y    |
| Jle/Jng |      Y      |    Y    |     Y    |
| Jg/Jnle |      Y      |    Y    |     Y    |

Update maybe_fused_with_jcc_p to check if operands of CMP like instructions
can be fused with condition jump.

	* gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
	(TC_FRAG_INIT): Init mf_type.
	* gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
	(enum mf_cmp_kind): Ditto.
	(maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
	mf_type of corresponding instructons, exclude unfusible
	instructions.
	(add_fused_jcc_padding_frag_p): Likewise.
	(add_branch_padding_frag_p): Likewise.
	(output_insn): Record mf_type for corresponding instructions.
	(i386_macro_fusible_p): New function.
	(i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
	add argument cmp_fragP to return next fusible jcc frag only.
	(i386_classify_machine_dependant_frag): Seperate macro-fusible
	instructions from condition jump.
	* gas/testsuite/gas/i386/align-branch-9.s: New file.
	* gas/testsuite/gas/i386/align-branch-9.d: Ditto.
	* gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
	* gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
	* gas/testsuite/gas/i386/i386.exp: Run new tests.
---
 gas/config/tc-i386.c                          | 167 ++++++++++++++----
 gas/config/tc-i386.h                          |   2 +
 gas/testsuite/gas/i386/align-branch-9.d       |  78 ++++++++
 gas/testsuite/gas/i386/align-branch-9.s       |  74 ++++++++
 gas/testsuite/gas/i386/i386.exp               |   2 +
 .../gas/i386/x86-64-align-branch-9.d          |  46 +++++
 .../gas/i386/x86-64-align-branch-9.s          |  43 +++++
 7 files changed, 380 insertions(+), 32 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/align-branch-9.d
 create mode 100644 gas/testsuite/gas/i386/align-branch-9.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 62b7cfbe6c..60e3283636 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -687,6 +687,29 @@ static unsigned int align_branch = (align_branch_jcc_bit
 				    | align_branch_fused_bit
 				    | align_branch_jmp_bit);
 
+/* Types of condition jump used by macro-fusion.  */
+enum mf_jcc_kind
+  {
+    mf_jcc_jo = 0,  /* base opcode 0x70  */
+    mf_jcc_jc,      /* base opcode 0x72  */
+    mf_jcc_je,      /* base opcode 0x74  */
+    mf_jcc_jna,     /* base opcode 0x76  */
+    mf_jcc_js,      /* base opcode 0x78  */
+    mf_jcc_jp,      /* base opcode 0x7a  */
+    mf_jcc_jl,      /* base opcode 0x7c  */
+    mf_jcc_jle,     /* base opcode 0x7e  */
+  };
+
+/* Types of compare flag-modifying insntructions used by macro-fusion.  */
+enum mf_cmp_kind
+  {
+    mf_cmp_test,  /* test */
+    mf_cmp_cmp,  /* cmp */
+    mf_cmp_and,  /* and */
+    mf_cmp_alu,  /* add/sub */
+    mf_cmp_incdec  /* inc/dec */
+  };
+
 /* The maximum padding size for fused jcc.  CMP like instruction can
    be 9 bytes and jcc can be 6 bytes.  Leave room just in case for
    prefixes.   */
@@ -8374,10 +8397,22 @@ encoding_length (const fragS *start_frag, offsetT start_off,
 }
 
 /* Return 1 for test, and, cmp, add, sub, inc and dec which may
-   be macro-fused with conditional jumps.  */
+   be macro-fused with conditional jumps.
+   NB: If TEST/AND/CMP/ADD/SUB/INC/DEC is of RIP relative address,
+   or is one of the following format:
+
+    cmp m, imm
+    add m, imm
+    sub m, imm
+   test m, imm
+    and m, imm
+    inc m
+    dec m
+
+   it is unfusible.  */
 
 static int
-maybe_fused_with_jcc_p (void)
+maybe_fused_with_jcc_p (enum mf_cmp_kind* mf_cmp_p)
 {
   /* No RIP address.  */
   if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -8387,36 +8422,54 @@ maybe_fused_with_jcc_p (void)
   if (is_any_vex_encoding (&i.tm))
     return 0;
 
-  /* and, add, sub with destination register.  */
-  if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
-      || i.tm.base_opcode <= 5
+  /* add, sub without add/sub m, imm.  */
+  if (i.tm.base_opcode <= 5
       || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
       || ((i.tm.base_opcode | 3) == 0x83
-	  && ((i.tm.extension_opcode | 1) == 0x5
+	  && (i.tm.extension_opcode == 0x5
 	      || i.tm.extension_opcode == 0x0)))
-    return (i.types[1].bitfield.class == Reg
-	    || i.types[1].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_alu;
+      return !(i.mem_operands && i.imm_operands);
+    }
 
-  /* test, cmp with any register.  */
+  /* and without and m, imm.  */
+  if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
+      || ((i.tm.base_opcode | 3) == 0x83
+	  && i.tm.extension_opcode == 0x4))
+    {
+      *mf_cmp_p = mf_cmp_and;
+      return !(i.mem_operands && i.imm_operands);
+    }
+
+  /* test without test m imm.  */
   if ((i.tm.base_opcode | 1) == 0x85
       || (i.tm.base_opcode | 1) == 0xa9
       || ((i.tm.base_opcode | 1) == 0xf7
-	  && i.tm.extension_opcode == 0)
-      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
+	  && i.tm.extension_opcode == 0))
+    {
+      *mf_cmp_p = mf_cmp_test;
+      return !(i.mem_operands && i.imm_operands);
+    }
+
+  /* cmp without cmp m, imm.  */
+  if ((i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
       || ((i.tm.base_opcode | 3) == 0x83
 	  && (i.tm.extension_opcode == 0x7)))
-    return (i.types[0].bitfield.class == Reg
-	    || i.types[0].bitfield.instance == Accum
-	    || i.types[1].bitfield.class == Reg
-	    || i.types[1].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_cmp;
+      return !(i.mem_operands && i.imm_operands);
+    }
 
-  /* inc, dec with any register.   */
+  /* inc, dec without inc/dec m.   */
   if ((i.tm.cpu_flags.bitfield.cpuno64
        && (i.tm.base_opcode | 0xf) == 0x4f)
       || ((i.tm.base_opcode | 1) == 0xff
 	  && i.tm.extension_opcode <= 0x1))
-    return (i.types[0].bitfield.class == Reg
-	    || i.types[0].bitfield.instance == Accum);
+    {
+      *mf_cmp_p = mf_cmp_incdec;
+      return !i.mem_operands;
+    }
 
   return 0;
 }
@@ -8424,7 +8477,7 @@ maybe_fused_with_jcc_p (void)
 /* Return 1 if a FUSED_JCC_PADDING frag should be generated.  */
 
 static int
-add_fused_jcc_padding_frag_p (void)
+add_fused_jcc_padding_frag_p (enum mf_cmp_kind* mf_cmp_p)
 {
   /* NB: Don't work with COND_JUMP86 without i386.  */
   if (!align_branch_power
@@ -8433,7 +8486,7 @@ add_fused_jcc_padding_frag_p (void)
       || !(align_branch & align_branch_fused_bit))
     return 0;
 
-  if (maybe_fused_with_jcc_p ())
+  if (maybe_fused_with_jcc_p (mf_cmp_p))
     {
       if (last_insn.kind == last_insn_other
 	  || last_insn.seg != now_seg)
@@ -8481,7 +8534,8 @@ add_branch_prefix_frag_p (void)
 /* Return 1 if a BRANCH_PADDING frag should be generated.  */
 
 static int
-add_branch_padding_frag_p (enum align_branch_kind *branch_p)
+add_branch_padding_frag_p (enum align_branch_kind *branch_p,
+			   enum mf_jcc_kind *mf_jcc_p)
 {
   int add_padding;
 
@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
 	}
       else
 	{
+	  /* Because J<cc> and JN<cc> share same group in macro-fusible table,
+	     igore the lowest bit.  */
+	  *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;
 	  *branch_p = align_branch_jcc;
 	  if ((align_branch & align_branch_jcc_bit))
 	    add_padding = 1;
@@ -8573,6 +8630,10 @@ output_insn (void)
   offsetT insn_start_off;
   fragS *fragP = NULL;
   enum align_branch_kind branch = align_branch_none;
+  /* The initializer is arbitrary just to avoid uninitialized error.
+     it's actually either assigned in add_branch_padding_frag_p
+     or never be used.  */
+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
 
 #if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
   if (IS_ELF && x86_used_note)
@@ -8665,7 +8726,7 @@ output_insn (void)
   insn_start_frag = frag_now;
   insn_start_off = frag_now_fix ();
 
-  if (add_branch_padding_frag_p (&branch))
+  if (add_branch_padding_frag_p (&branch, &mf_jcc))
     {
       char *p;
       /* Branch can be 8 bytes.  Leave some room for prefixes.  */
@@ -8686,6 +8747,7 @@ output_insn (void)
 		ENCODE_RELAX_STATE (BRANCH_PADDING, 0),
 		NULL, 0, p);
 
+      fragP->tc_frag_data.mf_type = mf_jcc;
       fragP->tc_frag_data.branch_type = branch;
       fragP->tc_frag_data.max_bytes = max_branch_padding_size;
     }
@@ -8705,6 +8767,7 @@ output_insn (void)
       unsigned char *q;
       unsigned int j;
       unsigned int prefix;
+      enum mf_cmp_kind mf_cmp;
 
       if (avoid_fence
 	  && (i.tm.base_opcode == 0xfaee8
@@ -8731,7 +8794,7 @@ output_insn (void)
       if (branch)
 	/* Skip if this is a branch.  */
 	;
-      else if (add_fused_jcc_padding_frag_p ())
+      else if (add_fused_jcc_padding_frag_p (&mf_cmp))
 	{
 	  /* Make room for padding.  */
 	  frag_grow (MAX_FUSED_JCC_PADDING_SIZE);
@@ -8743,6 +8806,7 @@ output_insn (void)
 		    ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0),
 		    NULL, 0, p);
 
+	  fragP->tc_frag_data.mf_type = mf_cmp;
 	  fragP->tc_frag_data.branch_type = align_branch_fused;
 	  fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE;
 	}
@@ -10948,6 +11012,42 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
 }
 #endif
 
+/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
+   Note also work for Skylake and Cascadelake.
+---------------------------------------------------------------------
+|   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
+| ------  | ----------- | ------- | -------- |
+|   Jo    |      N      |    N    |     Y    |
+|   Jno   |      N      |    N    |     Y    |
+|  Jc/Jb  |      Y      |    N    |     Y    |
+| Jae/Jnb |      Y      |    N    |     Y    |
+|  Je/Jz  |      Y      |    Y    |     Y    |
+| Jne/Jnz |      Y      |    Y    |     Y    |
+| Jna/Jbe |      Y      |    N    |     Y    |
+| Ja/Jnbe |      Y      |    N    |     Y    |
+|   Js    |      N      |    N    |     Y    |
+|   Jns   |      N      |    N    |     Y    |
+|  Jp/Jpe |      N      |    N    |     Y    |
+| Jnp/Jpo |      N      |    N    |     Y    |
+| Jl/Jnge |      Y      |    Y    |     Y    |
+| Jge/Jnl |      Y      |    Y    |     Y    |
+| Jle/Jng |      Y      |    Y    |     Y    |
+| Jg/Jnle |      Y      |    Y    |     Y    |
+---------------------------------------------------------------------  */
+static int
+i386_macro_fusible_p (enum mf_cmp_kind mf_cmp, enum mf_jcc_kind mf_jcc)
+{
+  if (mf_cmp == mf_cmp_alu || mf_cmp == mf_cmp_cmp)
+    return ((mf_jcc >= mf_jcc_jc && mf_jcc <= mf_jcc_jna)
+	    || mf_jcc == mf_jcc_jl || mf_jcc == mf_jcc_jle);
+  if (mf_cmp == mf_cmp_incdec)
+    return (mf_jcc == mf_jcc_je || mf_jcc == mf_jcc_jl
+	    || mf_jcc == mf_jcc_jle);
+  if (mf_cmp == mf_cmp_test || mf_cmp == mf_cmp_and)
+    return 1;
+  return 0;
+}
+
 /* Return the next non-empty frag.  */
 
 static fragS *
@@ -10967,20 +11067,23 @@ i386_next_non_empty_frag (fragS *fragP)
 /* Return the next jcc frag after BRANCH_PADDING.  */
 
 static fragS *
-i386_next_jcc_frag (fragS *fragP)
+i386_next_fusible_jcc_frag (fragS *maybe_cmp_fragP, fragS *pad_fragP)
 {
-  if (!fragP)
+  fragS *branch_fragP;
+  if (!pad_fragP)
     return NULL;
 
-  if (fragP->fr_type == rs_machine_dependent
-      && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+  if (pad_fragP->fr_type == rs_machine_dependent
+      && (TYPE_FROM_RELAX_STATE (pad_fragP->fr_subtype)
 	  == BRANCH_PADDING))
     {
-      fragP = i386_next_non_empty_frag (fragP);
-      if (fragP->fr_type != rs_machine_dependent)
+      branch_fragP = i386_next_non_empty_frag (pad_fragP);
+      if (branch_fragP->fr_type != rs_machine_dependent)
 	return NULL;
-      if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP)
-	return fragP;
+      if (TYPE_FROM_RELAX_STATE (branch_fragP->fr_subtype) == COND_JUMP
+	  && i386_macro_fusible_p (maybe_cmp_fragP->tc_frag_data.mf_type,
+				   pad_fragP->tc_frag_data.mf_type))
+	return branch_fragP;
     }
 
   return NULL;
@@ -11025,7 +11128,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
 	       */
 	    cmp_fragP = i386_next_non_empty_frag (next_fragP);
 	    pad_fragP = i386_next_non_empty_frag (cmp_fragP);
-	    branch_fragP = i386_next_jcc_frag (pad_fragP);
+	    branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
 	    if (branch_fragP)
 	      {
 		/* The BRANCH_PADDING frag is merged with the
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index 845b3901d4..93678c2282 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -273,6 +273,7 @@ struct i386_tc_frag_data
   unsigned char prefix_length;
   unsigned char default_prefix;
   unsigned char cmp_size;
+  unsigned int mf_type : 3;
   unsigned int classified : 1;
   unsigned int branch_type : 3;
 };
@@ -299,6 +300,7 @@ struct i386_tc_frag_data
      (FRAGP)->tc_frag_data.cmp_size = 0;			\
      (FRAGP)->tc_frag_data.classified = 0;			\
      (FRAGP)->tc_frag_data.branch_type = 0;			\
+     (FRAGP)->tc_frag_data.mf_type = 0;			\
    }								\
  while (0)
 
diff --git a/gas/testsuite/gas/i386/align-branch-9.d b/gas/testsuite/gas/i386/align-branch-9.d
new file mode 100644
index 0000000000..6340817d04
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.d
@@ -0,0 +1,78 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+   0:	65 a3 01 00 00 00    	mov    %eax,%gs:0x1
+   6:	55                   	push   %ebp
+   7:	55                   	push   %ebp
+   8:	55                   	push   %ebp
+   9:	55                   	push   %ebp
+   a:	89 e5                	mov    %esp,%ebp
+   c:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+   f:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  12:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  15:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  18:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  1b:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  1e:	39 c5                	cmp    %eax,%ebp
+  20:	70 62                	jo     84 <foo\+0x84>
+  22:	89 73 f4             	mov    %esi,-0xc\(%ebx\)
+  25:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  28:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  2b:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  2e:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  31:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  34:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  37:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  3a:	5d                   	pop    %ebp
+  3b:	5d                   	pop    %ebp
+  3c:	5d                   	pop    %ebp
+  3d:	74 45                	je     84 <foo\+0x84>
+  3f:	5d                   	pop    %ebp
+  40:	74 42                	je     84 <foo\+0x84>
+  42:	89 44 24 fc          	mov    %eax,-0x4\(%esp\)
+  46:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  49:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  4c:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  4f:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  52:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  55:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  58:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  5b:	5d                   	pop    %ebp
+  5c:	eb 2c                	jmp    8a <foo\+0x8a>
+  5e:	66 90                	xchg   %ax,%ax
+  60:	eb 28                	jmp    8a <foo\+0x8a>
+  62:	eb 26                	jmp    8a <foo\+0x8a>
+  64:	89 45 fc             	mov    %eax,-0x4\(%ebp\)
+  67:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  6a:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  6d:	5d                   	pop    %ebp
+  6e:	5d                   	pop    %ebp
+  6f:	40                   	inc    %eax
+  70:	72 12                	jb     84 <foo\+0x84>
+  72:	36 36 89 45 fc       	ss mov %eax,%ss:-0x4\(%ebp\)
+  77:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  7a:	89 7d f8             	mov    %edi,-0x8\(%ebp\)
+  7d:	89 75 f4             	mov    %esi,-0xc\(%ebp\)
+  80:	21 c3                	and    %eax,%ebx
+  82:	7c 06                	jl     8a <foo\+0x8a>
+  84:	8b 45 f4             	mov    -0xc\(%ebp\),%eax
+  87:	89 45 fc             	mov    %eax,-0x4\(%ebp\)
+  8a:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  90:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  96:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  9c:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  a2:	89 75 0c             	mov    %esi,0xc\(%ebp\)
+  a5:	e9 fc ff ff ff       	jmp    a6 <foo\+0xa6>
+  aa:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  b0:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  b6:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  bc:	89 b5 50 fb ff ff    	mov    %esi,-0x4b0\(%ebp\)
+  c2:	89 75 00             	mov    %esi,0x0\(%ebp\)
+  c5:	74 c3                	je     8a <foo\+0x8a>
+  c7:	74 c1                	je     8a <foo\+0x8a>
+#pass
diff --git a/gas/testsuite/gas/i386/align-branch-9.s b/gas/testsuite/gas/i386/align-branch-9.s
new file mode 100644
index 0000000000..357abe30f9
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.s
@@ -0,0 +1,74 @@
+  .text
+  .globl  foo
+  .p2align  4
+foo:
+  movl	%eax, %gs:0x1
+  pushl  %ebp
+  pushl  %ebp
+  pushl  %ebp
+  pushl  %ebp
+  movl  %esp, %ebp
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  cmp  %eax, %ebp
+  jo  .L_2
+  movl  %esi, -12(%ebx)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  popl  %ebp
+  popl  %ebp
+  popl  %ebp
+  je  .L_2
+  popl  %ebp
+  je  .L_2
+  movl  %eax, -4(%esp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %esi, -12(%ebp)
+  popl  %ebp
+  jmp  .L_3
+  jmp  .L_3
+  jmp  .L_3
+  movl  %eax, -4(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  popl  %ebp
+  popl  %ebp
+  inc  %eax
+  jc  .L_2
+  movl  %eax, -4(%ebp)
+  movl  %esi, -12(%ebp)
+  movl  %edi, -8(%ebp)
+  movl  %esi, -12(%ebp)
+  and  %eax, %ebx
+  jl  .L_3
+.L_2:
+  movl  -12(%ebp), %eax
+  movl  %eax, -4(%ebp)
+.L_3:
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, 12(%ebp)
+  jmp  bar
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, -1200(%ebp)
+  movl  %esi, (%ebp)
+  je .L_3
+  je .L_3
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 685e62ea72..8fc621f2bb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -525,6 +525,7 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-6"
     run_dump_test "align-branch-7"
     run_dump_test "align-branch-8"
+    run_dump_test "align-branch-9"
 
     # These tests require support for 8 and 16 bit relocs,
     # so we only run them for ELF and COFF targets.
@@ -1100,6 +1101,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-6"
     run_dump_test "x86-64-align-branch-7"
     run_dump_test "x86-64-align-branch-8"
+    run_dump_test "x86-64-align-branch-9"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.d b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
new file mode 100644
index 0000000000..1041fd0483
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
@@ -0,0 +1,46 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+   0:	c1 e9 02             	shr    \$0x2,%ecx
+   3:	c1 e9 02             	shr    \$0x2,%ecx
+   6:	c1 e9 02             	shr    \$0x2,%ecx
+   9:	89 d1                	mov    %edx,%ecx
+   b:	31 c0                	xor    %eax,%eax
+   d:	c1 e9 02             	shr    \$0x2,%ecx
+  10:	c1 e9 02             	shr    \$0x2,%ecx
+  13:	c1 e9 02             	shr    \$0x2,%ecx
+  16:	c1 e9 02             	shr    \$0x2,%ecx
+  19:	c1 e9 02             	shr    \$0x2,%ecx
+  1c:	c1 e9 02             	shr    \$0x2,%ecx
+  1f:	80 fa 02             	cmp    \$0x2,%dl
+  22:	70 df                	jo     3 <foo\+0x3>
+  24:	2e 2e 2e 2e 31 c0    	cs cs cs cs xor %eax,%eax
+  2a:	c1 e9 02             	shr    \$0x2,%ecx
+  2d:	c1 e9 02             	shr    \$0x2,%ecx
+  30:	c1 e9 02             	shr    \$0x2,%ecx
+  33:	89 d1                	mov    %edx,%ecx
+  35:	31 c0                	xor    %eax,%eax
+  37:	c1 e9 02             	shr    \$0x2,%ecx
+  3a:	c1 e9 02             	shr    \$0x2,%ecx
+  3d:	c1 e9 02             	shr    \$0x2,%ecx
+  40:	f6 c2 02             	test   \$0x2,%dl
+  43:	75 e8                	jne    2d <foo\+0x2d>
+  45:	31 c0                	xor    %eax,%eax
+  47:	c1 e9 02             	shr    \$0x2,%ecx
+  4a:	c1 e9 02             	shr    \$0x2,%ecx
+  4d:	89 d1                	mov    %edx,%ecx
+  4f:	c1 e9 02             	shr    \$0x2,%ecx
+  52:	c1 e9 02             	shr    \$0x2,%ecx
+  55:	89 d1                	mov    %edx,%ecx
+  57:	c1 e9 02             	shr    \$0x2,%ecx
+  5a:	89 d1                	mov    %edx,%ecx
+  5c:	31 c0                	xor    %eax,%eax
+  5e:	ff c0                	inc    %eax
+  60:	76 cb                	jbe    2d <foo\+0x2d>
+  62:	31 c0                	xor    %eax,%eax
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.s b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
new file mode 100644
index 0000000000..917579bda4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
@@ -0,0 +1,43 @@
+	.text
+	.p2align 4,,15
+foo:
+	shrl	$2, %ecx
+.L1:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	cmpb	$2, %dl
+	jo	.L1
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+.L2:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	testb	$2, %dl
+	jne	.L2
+	xorl	%eax, %eax
+.L3:
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	shrl	$2, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	shrl	$2, %ecx
+	movl	%edx, %ecx
+	xorl	%eax, %eax
+	inc	%eax
+	jbe	.L2
+	xorl	%eax, %eax
-- 
2.18.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-03-02  4:05       ` Hongtao Liu
@ 2020-03-02  8:32         ` Jan Beulich
  2020-03-03  3:40           ` Hongtao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-03-02  8:32 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: H. J. Lu, binutils

On 02.03.2020 05:05, Hongtao Liu wrote:
> Update patch.

>+enum mf_jcc_kind
>+  {
>+    mf_jcc_jo = 0,  /* base opcode 0x70  */

Is there a reason for the " = 0" here? Without is, the standard
still mandates the value to be zero.

>+enum mf_cmp_kind
>+  {
>+    mf_cmp_test,  /* test */
>+    mf_cmp_cmp,  /* cmp */
>+    mf_cmp_and,  /* and */
>+    mf_cmp_alu,  /* add/sub */
>+    mf_cmp_incdec  /* inc/dec */
>+  };

I saw your reply to my question of why this is 5 instead of 3
enumerators. Yet I think this being a workaround for something
that hopefully will be fixed in newer hardware suggests to go
with the minimal necessary set first, and split enumerators
only if indeed needed down the road.

>@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> 	}
>       else
> 	{
>+	  /* Because J<cc> and JN<cc> share same group in macro-fusible table,
>+	     igore the lowest bit.  */
>+	  *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;

May I suggest to use (i.tm.base_opcode & 0x0e) >> 1, to be independent
of the insn variant presently in i.tm (after all we do dynamically
update it after copying from the template)?

Thanks for doing the other adjustments.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-03-02  8:32         ` Jan Beulich
@ 2020-03-03  3:40           ` Hongtao Liu
  2020-03-03  8:38             ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-03  3:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: H. J. Lu, binutils

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

On Mon, Mar 2, 2020 at 4:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.03.2020 05:05, Hongtao Liu wrote:
> > Update patch.
>
> >+enum mf_jcc_kind
> >+  {
> >+    mf_jcc_jo = 0,  /* base opcode 0x70  */
>
> Is there a reason for the " = 0" here? Without is, the standard
> still mandates the value to be zero.
>
Yes.
I saw some other places in tc-i386.c where first member of enum is
assigned as 0, and also some without initializer, so i guess when the
enum variable is used as left value such as calculated by (baseopcode
& 0x0e ) >>1, it should be explicitly assigned, otherwise not.
Just my guess.
> >+enum mf_cmp_kind
> >+  {
> >+    mf_cmp_test,  /* test */
> >+    mf_cmp_cmp,  /* cmp */
> >+    mf_cmp_and,  /* and */
> >+    mf_cmp_alu,  /* add/sub */
> >+    mf_cmp_incdec  /* inc/dec */
> >+  };
>
> I saw your reply to my question of why this is 5 instead of 3
> enumerators. Yet I think this being a workaround for something
> that hopefully will be fixed in newer hardware suggests to go
> with the minimal necessary set first, and split enumerators
> only if indeed needed down the road.
Done.
>
> >@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> >       }
> >       else
> >       {
> >+        /* Because J<cc> and JN<cc> share same group in macro-fusible table,
> >+           igore the lowest bit.  */
> >+        *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;
>
> May I suggest to use (i.tm.base_opcode & 0x0e) >> 1, to be independent
> of the insn variant presently in i.tm (after all we do dynamically
> update it after copying from the template)?
>
Done.

Update my patch.

> Thanks for doing the other adjustments.
>
> Jan



--
BR,
Hongtao

[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m_v2.patch --]
[-- Type: application/x-patch, Size: 24132 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-03-03  3:40           ` Hongtao Liu
@ 2020-03-03  8:38             ` Jan Beulich
  2020-03-03 14:12               ` H.J. Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-03-03  8:38 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: H. J. Lu, binutils

On 03.03.2020 04:40, Hongtao Liu wrote:
> Update my patch.

Thanks again, looks good to me now with just one minor comment:

>+enum mf_cmp_kind
>+  {
>+    mf_cmp_test_and,  /* test/cmp */

If there is a comment here, I think it should also mention AND. But
of course given the enumerator's name I'm not sure a comment is
needed in the first place. But you'll have to wait for H.J.'s
comments / okay anyway.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  2020-03-03  8:38             ` Jan Beulich
@ 2020-03-03 14:12               ` H.J. Lu
  0 siblings, 0 replies; 9+ messages in thread
From: H.J. Lu @ 2020-03-03 14:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Hongtao Liu, Binutils

On Tue, Mar 3, 2020 at 12:38 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.03.2020 04:40, Hongtao Liu wrote:
> > Update my patch.
>
> Thanks again, looks good to me now with just one minor comment:
>
> >+enum mf_cmp_kind
> >+  {
> >+    mf_cmp_test_and,  /* test/cmp */
>
> If there is a comment here, I think it should also mention AND. But
> of course given the enumerator's name I'm not sure a comment is
> needed in the first place. But you'll have to wait for H.J.'s
> comments / okay anyway.
>

LGTM.  I am checking it in for Hongtao.

Thank you both.

-- 
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-03 14:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28  8:56 [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries Hongtao Liu
2020-02-28  8:57 ` Hongtao Liu
2020-02-28 11:35   ` Jan Beulich
2020-03-02  3:14     ` Hongtao Liu
2020-03-02  4:05       ` Hongtao Liu
2020-03-02  8:32         ` Jan Beulich
2020-03-03  3:40           ` Hongtao Liu
2020-03-03  8:38             ` Jan Beulich
2020-03-03 14:12               ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).