* [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
@ 2020-02-28 8:56 Hongtao Liu
2020-02-28 8:57 ` Hongtao Liu
0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-02-28 8:56 UTC (permalink / raw)
To: H. J. Lu; +Cc: binutils
Hi:
This is subsequent patch for JCC Code Erratum.
Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
According to intel SDM manual, not all compare flag-modifying
instructions are marcro-fusible with subsequent jcc instruction. For
those not, -mbranches-within-32B-boundaries need't align them as
FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
restrictions which separate macro-fusible instruction from not
Restriction 1:
If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
cmp m, imm
add m, imm
sub m, imm
test m, imm
and m, imm
inc m
dec m
it is unfusible with any jcc instructions.
Restriction 2:
/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
---------------------------------------------------------------------
| JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
| ------ | ----------- | ------- | -------- |
| Jo | N | N | Y |
| Jno | N | N | Y |
| Jc/Jb | Y | N | Y |
| Jae/Jnb | Y | N | Y |
| Je/Jz | Y | Y | Y |
| Jne/Jnz | Y | Y | Y |
| Jna/Jbe | Y | N | Y |
| Ja/Jnbe | Y | N | Y |
| Js | N | N | Y |
| Jns | N | N | Y |
| Jp/Jpe | N | N | Y |
| Jnp/Jpo | N | N | Y |
| Jl/Jnge | Y | Y | Y |
| Jge/Jnl | Y | Y | Y |
| Jle/Jng | Y | Y | Y |
| Jg/Jnle | Y | Y | Y |
Changelog
* gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
(TC_FRAG_INIT): Init mf_type.
* gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
(enum mf_cmp_kind): Ditto.
(maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
mf_type of corresponding instructons, exclude unfusible
instructions.
(add_fused_jcc_padding_frag_p): Likewise.
(add_branch_padding_frag_p): Likewise.
(output_insn): Record mf_type for corresponding instructions.
(i386_macro_fusible_p): New function.
(i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
add argument cmp_fragP to return next fusible jcc frag only.
(i386_classify_machine_dependant_frag): Seperate macro-fusible
instructions from condition jump.
* gas/testsuite/gas/i386/align-branch-9.s: New file.
* gas/testsuite/gas/i386/align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/i386.exp: Run new tests.
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-02-28 8:56 [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries Hongtao Liu
@ 2020-02-28 8:57 ` Hongtao Liu
2020-02-28 11:35 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-02-28 8:57 UTC (permalink / raw)
To: H. J. Lu; +Cc: binutils
[-- Attachment #1: Type: text/plain, Size: 3330 bytes --]
On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
> This is subsequent patch for JCC Code Erratum.
> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>
> According to intel SDM manual, not all compare flag-modifying
> instructions are marcro-fusible with subsequent jcc instruction. For
> those not, -mbranches-within-32B-boundaries need't align them as
> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> restrictions which separate macro-fusible instruction from not
>
> Restriction 1:
> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>
> cmp m, imm
> add m, imm
> sub m, imm
> test m, imm
> and m, imm
> inc m
> dec m
>
> it is unfusible with any jcc instructions.
>
> Restriction 2:
> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> ---------------------------------------------------------------------
> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
> | ------ | ----------- | ------- | -------- |
> | Jo | N | N | Y |
> | Jno | N | N | Y |
> | Jc/Jb | Y | N | Y |
> | Jae/Jnb | Y | N | Y |
> | Je/Jz | Y | Y | Y |
> | Jne/Jnz | Y | Y | Y |
> | Jna/Jbe | Y | N | Y |
> | Ja/Jnbe | Y | N | Y |
> | Js | N | N | Y |
> | Jns | N | N | Y |
> | Jp/Jpe | N | N | Y |
> | Jnp/Jpo | N | N | Y |
> | Jl/Jnge | Y | Y | Y |
> | Jge/Jnl | Y | Y | Y |
> | Jle/Jng | Y | Y | Y |
> | Jg/Jnle | Y | Y | Y |
>
>
> Changelog
>
> * gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
> (TC_FRAG_INIT): Init mf_type.
> * gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
> (enum mf_cmp_kind): Ditto.
> (maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
> mf_type of corresponding instructons, exclude unfusible
> instructions.
> (add_fused_jcc_padding_frag_p): Likewise.
> (add_branch_padding_frag_p): Likewise.
> (output_insn): Record mf_type for corresponding instructions.
> (i386_macro_fusible_p): New function.
> (i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
> add argument cmp_fragP to return next fusible jcc frag only.
> (i386_classify_machine_dependant_frag): Seperate macro-fusible
> instructions from condition jump.
> * gas/testsuite/gas/i386/align-branch-9.s: New file.
> * gas/testsuite/gas/i386/align-branch-9.d: Ditto.
> * gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
> * gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
> * gas/testsuite/gas/i386/i386.exp: Run new tests.
>
> --
> BR,
> Hongtao
Add patch.
--
BR,
Hongtao
[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch --]
[-- Type: application/octet-stream, Size: 24193 bytes --]
From 656e46d54db13e3ab3bd71946aa85eee30fc9adc Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 26 Feb 2020 14:57:04 +0800
Subject: [PATCH] According to intel SDM manual, not all compare flag-modifying
instructions are marcro-fusible with subsequent jcc instruction. For those
not, -mbranches-within-32B-boundaries need't align them as FUSED_JCC_PADDING,
only jcc itself need to be aligned.
Here are 2 restrictions which separate macro-fusible instruction from not
Restriction 1:
If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
cmp m, imm
add m, imm
sub m, imm
test m, imm
and m, imm
inc m
dec m
it is unfusible with any jcc instruction.
Restriction 2:
/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
---------------------------------------------------------------------
| JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
| ------ | ----------- | ------- | -------- |
| Jo | N | N | Y |
| Jno | N | N | Y |
| Jc/Jb | Y | N | Y |
| Jae/Jnb | Y | N | Y |
| Je/Jz | Y | Y | Y |
| Jne/Jnz | Y | Y | Y |
| Jna/Jbe | Y | N | Y |
| Ja/Jnbe | Y | N | Y |
| Js | N | N | Y |
| Jns | N | N | Y |
| Jp/Jpe | N | N | Y |
| Jnp/Jpo | N | N | Y |
| Jl/Jnge | Y | Y | Y |
| Jge/Jnl | Y | Y | Y |
| Jle/Jng | Y | Y | Y |
| Jg/Jnle | Y | Y | Y |
Update maybe_fused_with_jcc_p to check if operands of CMP like instructions
can be fused with condition jump.
* gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
(TC_FRAG_INIT): Init mf_type.
* gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
(enum mf_cmp_kind): Ditto.
(maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
mf_type of corresponding instructons, exclude unfusible
instructions.
(add_fused_jcc_padding_frag_p): Likewise.
(add_branch_padding_frag_p): Likewise.
(output_insn): Record mf_type for corresponding instructions.
(i386_macro_fusible_p): New function.
(i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
add argument cmp_fragP to return next fusible jcc frag only.
(i386_classify_machine_dependant_frag): Seperate macro-fusible
instructions from condition jump.
* gas/testsuite/gas/i386/align-branch-9.s: New file.
* gas/testsuite/gas/i386/align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/i386.exp: Run new tests.
---
gas/config/tc-i386.c | 169 ++++++++++++++----
gas/config/tc-i386.h | 2 +
gas/testsuite/gas/i386/align-branch-9.d | 78 ++++++++
gas/testsuite/gas/i386/align-branch-9.s | 74 ++++++++
gas/testsuite/gas/i386/i386.exp | 2 +
.../gas/i386/x86-64-align-branch-9.d | 46 +++++
.../gas/i386/x86-64-align-branch-9.s | 43 +++++
7 files changed, 382 insertions(+), 32 deletions(-)
create mode 100644 gas/testsuite/gas/i386/align-branch-9.d
create mode 100644 gas/testsuite/gas/i386/align-branch-9.s
create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.d
create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.s
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 62b7cfbe6c..76ae20d8ad 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -687,6 +687,37 @@ static unsigned int align_branch = (align_branch_jcc_bit
| align_branch_fused_bit
| align_branch_jmp_bit);
+/* Types of condition jump used by macro-fusion. */
+enum mf_jcc_kind
+ {
+ mf_jcc_jo = 0, /* base opcode 0x70 */
+ mf_jcc_jno, /* base opcode 0x71 */
+ mf_jcc_jc, /* base opcode 0x72 */
+ mf_jcc_jae, /* base opcode 0x73 */
+ mf_jcc_je, /* base opcode 0x74 */
+ mf_jcc_jne, /* base opcode 0x75 */
+ mf_jcc_jna, /* base opcode 0x76 */
+ mf_jcc_ja, /* base opcode 0x77 */
+ mf_jcc_js, /* base opcode 0x78 */
+ mf_jcc_jns, /* base opcode 0x79 */
+ mf_jcc_jp, /* base opcode 0x7a */
+ mf_jcc_jnp, /* base opcode 0x7b */
+ mf_jcc_jl, /* base opcode 0x7c */
+ mf_jcc_jge, /* base opcode 0x7d */
+ mf_jcc_jle, /* base opcode 0x7e */
+ mf_jcc_jg /* base opcode 0x7f */
+ };
+
+/* Types of compare flag-modifying insntructions used by macro-fusion. */
+enum mf_cmp_kind
+ {
+ mf_cmp_test, /* test */
+ mf_cmp_cmp, /* cmp */
+ mf_cmp_and, /* and */
+ mf_cmp_alu, /* add/sub */
+ mf_cmp_incdec /* inc/dec */
+ };
+
/* The maximum padding size for fused jcc. CMP like instruction can
be 9 bytes and jcc can be 6 bytes. Leave room just in case for
prefixes. */
@@ -8374,10 +8405,22 @@ encoding_length (const fragS *start_frag, offsetT start_off,
}
/* Return 1 for test, and, cmp, add, sub, inc and dec which may
- be macro-fused with conditional jumps. */
+ be macro-fused with conditional jumps.
+ NB: If TEST/AND/CMP/ADD/SUB/INC/DEC is of RIP relative address,
+ or is one of the following format:
+
+ cmp m, imm
+ add m, imm
+ sub m, imm
+ test m, imm
+ and m, imm
+ inc m
+ dec m
+
+ it is unfusible. */
static int
-maybe_fused_with_jcc_p (void)
+maybe_fused_with_jcc_p (enum mf_cmp_kind* mf_cmp_p)
{
/* No RIP address. */
if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -8387,36 +8430,54 @@ maybe_fused_with_jcc_p (void)
if (is_any_vex_encoding (&i.tm))
return 0;
- /* and, add, sub with destination register. */
- if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
- || i.tm.base_opcode <= 5
+ /* add, sub without add/sub m, imm. */
+ if (i.tm.base_opcode <= 5
|| (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
|| ((i.tm.base_opcode | 3) == 0x83
- && ((i.tm.extension_opcode | 1) == 0x5
+ && (i.tm.extension_opcode == 0x5
|| i.tm.extension_opcode == 0x0)))
- return (i.types[1].bitfield.class == Reg
- || i.types[1].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_alu;
+ return !(i.mem_operands && i.imm_operands);
+ }
- /* test, cmp with any register. */
+ /* and without and m, imm. */
+ if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
+ || ((i.tm.base_opcode | 3) == 0x83
+ && i.tm.extension_opcode == 0x4))
+ {
+ *mf_cmp_p = mf_cmp_and;
+ return !(i.mem_operands && i.imm_operands);
+ }
+
+ /* test without test m imm. */
if ((i.tm.base_opcode | 1) == 0x85
|| (i.tm.base_opcode | 1) == 0xa9
|| ((i.tm.base_opcode | 1) == 0xf7
- && i.tm.extension_opcode == 0)
- || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
+ && i.tm.extension_opcode == 0))
+ {
+ *mf_cmp_p = mf_cmp_test;
+ return !(i.mem_operands && i.imm_operands);
+ }
+
+ /* cmp without cmp m, imm. */
+ if ((i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
|| ((i.tm.base_opcode | 3) == 0x83
&& (i.tm.extension_opcode == 0x7)))
- return (i.types[0].bitfield.class == Reg
- || i.types[0].bitfield.instance == Accum
- || i.types[1].bitfield.class == Reg
- || i.types[1].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_cmp;
+ return !(i.mem_operands && i.imm_operands);
+ }
- /* inc, dec with any register. */
+ /* inc, dec without inc/dec m. */
if ((i.tm.cpu_flags.bitfield.cpuno64
&& (i.tm.base_opcode | 0xf) == 0x4f)
|| ((i.tm.base_opcode | 1) == 0xff
&& i.tm.extension_opcode <= 0x1))
- return (i.types[0].bitfield.class == Reg
- || i.types[0].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_incdec;
+ return !i.mem_operands;
+ }
return 0;
}
@@ -8424,7 +8485,7 @@ maybe_fused_with_jcc_p (void)
/* Return 1 if a FUSED_JCC_PADDING frag should be generated. */
static int
-add_fused_jcc_padding_frag_p (void)
+add_fused_jcc_padding_frag_p (enum mf_cmp_kind* mf_cmp_p)
{
/* NB: Don't work with COND_JUMP86 without i386. */
if (!align_branch_power
@@ -8433,7 +8494,7 @@ add_fused_jcc_padding_frag_p (void)
|| !(align_branch & align_branch_fused_bit))
return 0;
- if (maybe_fused_with_jcc_p ())
+ if (maybe_fused_with_jcc_p (mf_cmp_p))
{
if (last_insn.kind == last_insn_other
|| last_insn.seg != now_seg)
@@ -8481,7 +8542,8 @@ add_branch_prefix_frag_p (void)
/* Return 1 if a BRANCH_PADDING frag should be generated. */
static int
-add_branch_padding_frag_p (enum align_branch_kind *branch_p)
+add_branch_padding_frag_p (enum align_branch_kind *branch_p,
+ enum mf_jcc_kind *mf_jcc_p)
{
int add_padding;
@@ -8503,6 +8565,7 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
}
else
{
+ *mf_jcc_p = i.tm.base_opcode - 0x70;
*branch_p = align_branch_jcc;
if ((align_branch & align_branch_jcc_bit))
add_padding = 1;
@@ -8573,6 +8636,7 @@ output_insn (void)
offsetT insn_start_off;
fragS *fragP = NULL;
enum align_branch_kind branch = align_branch_none;
+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
#if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
if (IS_ELF && x86_used_note)
@@ -8665,7 +8729,7 @@ output_insn (void)
insn_start_frag = frag_now;
insn_start_off = frag_now_fix ();
- if (add_branch_padding_frag_p (&branch))
+ if (add_branch_padding_frag_p (&branch, &mf_jcc))
{
char *p;
/* Branch can be 8 bytes. Leave some room for prefixes. */
@@ -8686,6 +8750,7 @@ output_insn (void)
ENCODE_RELAX_STATE (BRANCH_PADDING, 0),
NULL, 0, p);
+ fragP->tc_frag_data.mf_type = mf_jcc;
fragP->tc_frag_data.branch_type = branch;
fragP->tc_frag_data.max_bytes = max_branch_padding_size;
}
@@ -8705,6 +8770,7 @@ output_insn (void)
unsigned char *q;
unsigned int j;
unsigned int prefix;
+ enum mf_cmp_kind mf_cmp;
if (avoid_fence
&& (i.tm.base_opcode == 0xfaee8
@@ -8731,7 +8797,7 @@ output_insn (void)
if (branch)
/* Skip if this is a branch. */
;
- else if (add_fused_jcc_padding_frag_p ())
+ else if (add_fused_jcc_padding_frag_p (&mf_cmp))
{
/* Make room for padding. */
frag_grow (MAX_FUSED_JCC_PADDING_SIZE);
@@ -8743,6 +8809,7 @@ output_insn (void)
ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0),
NULL, 0, p);
+ fragP->tc_frag_data.mf_type = mf_cmp;
fragP->tc_frag_data.branch_type = align_branch_fused;
fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE;
}
@@ -10948,6 +11015,41 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
}
#endif
+/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
+---------------------------------------------------------------------
+| JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
+| ------ | ----------- | ------- | -------- |
+| Jo | N | N | Y |
+| Jno | N | N | Y |
+| Jc/Jb | Y | N | Y |
+| Jae/Jnb | Y | N | Y |
+| Je/Jz | Y | Y | Y |
+| Jne/Jnz | Y | Y | Y |
+| Jna/Jbe | Y | N | Y |
+| Ja/Jnbe | Y | N | Y |
+| Js | N | N | Y |
+| Jns | N | N | Y |
+| Jp/Jpe | N | N | Y |
+| Jnp/Jpo | N | N | Y |
+| Jl/Jnge | Y | Y | Y |
+| Jge/Jnl | Y | Y | Y |
+| Jle/Jng | Y | Y | Y |
+| Jg/Jnle | Y | Y | Y |
+--------------------------------------------------------------------- */
+static int
+i386_macro_fusible_p (enum mf_cmp_kind mf_cmp, enum mf_jcc_kind mf_jcc)
+{
+ if (mf_cmp == mf_cmp_alu || mf_cmp == mf_cmp_cmp)
+ return ((mf_jcc >= mf_jcc_jc && mf_jcc <= mf_jcc_ja)
+ || (mf_jcc >= mf_jcc_jl && mf_jcc <= mf_jcc_jg));
+ if (mf_cmp == mf_cmp_incdec)
+ return ((mf_jcc >= mf_jcc_je && mf_jcc <= mf_jcc_jne)
+ || (mf_jcc >= mf_jcc_jl && mf_jcc <= mf_jcc_jg));
+ if (mf_cmp == mf_cmp_test || mf_cmp == mf_cmp_and)
+ return 1;
+ return 0;
+}
+
/* Return the next non-empty frag. */
static fragS *
@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
/* Return the next jcc frag after BRANCH_PADDING. */
static fragS *
-i386_next_jcc_frag (fragS *fragP)
+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
{
- if (!fragP)
+ fragS *branch_fragP;
+ if (!pad_fragP)
return NULL;
- if (fragP->fr_type == rs_machine_dependent
- && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ if (pad_fragP->fr_type == rs_machine_dependent
+ && (TYPE_FROM_RELAX_STATE (pad_fragP->fr_subtype)
== BRANCH_PADDING))
{
- fragP = i386_next_non_empty_frag (fragP);
- if (fragP->fr_type != rs_machine_dependent)
+ branch_fragP = i386_next_non_empty_frag (pad_fragP);
+ if (branch_fragP->fr_type != rs_machine_dependent)
return NULL;
- if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP)
- return fragP;
+ if (TYPE_FROM_RELAX_STATE (branch_fragP->fr_subtype) == COND_JUMP
+ && i386_macro_fusible_p (cmp_fragP->tc_frag_data.mf_type,
+ pad_fragP->tc_frag_data.mf_type))
+ return branch_fragP;
}
return NULL;
@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
*/
cmp_fragP = i386_next_non_empty_frag (next_fragP);
pad_fragP = i386_next_non_empty_frag (cmp_fragP);
- branch_fragP = i386_next_jcc_frag (pad_fragP);
+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
if (branch_fragP)
{
/* The BRANCH_PADDING frag is merged with the
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index 845b3901d4..d3f3819292 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -273,6 +273,7 @@ struct i386_tc_frag_data
unsigned char prefix_length;
unsigned char default_prefix;
unsigned char cmp_size;
+ unsigned int mf_type : 4;
unsigned int classified : 1;
unsigned int branch_type : 3;
};
@@ -299,6 +300,7 @@ struct i386_tc_frag_data
(FRAGP)->tc_frag_data.cmp_size = 0; \
(FRAGP)->tc_frag_data.classified = 0; \
(FRAGP)->tc_frag_data.branch_type = 0; \
+ (FRAGP)->tc_frag_data.mf_type = 0; \
} \
while (0)
diff --git a/gas/testsuite/gas/i386/align-branch-9.d b/gas/testsuite/gas/i386/align-branch-9.d
new file mode 100644
index 0000000000..6340817d04
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.d
@@ -0,0 +1,78 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+ 0: 65 a3 01 00 00 00 mov %eax,%gs:0x1
+ 6: 55 push %ebp
+ 7: 55 push %ebp
+ 8: 55 push %ebp
+ 9: 55 push %ebp
+ a: 89 e5 mov %esp,%ebp
+ c: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ f: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 12: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 15: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 18: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 1b: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 1e: 39 c5 cmp %eax,%ebp
+ 20: 70 62 jo 84 <foo\+0x84>
+ 22: 89 73 f4 mov %esi,-0xc\(%ebx\)
+ 25: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 28: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 2b: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 2e: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 31: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 34: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 37: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 3a: 5d pop %ebp
+ 3b: 5d pop %ebp
+ 3c: 5d pop %ebp
+ 3d: 74 45 je 84 <foo\+0x84>
+ 3f: 5d pop %ebp
+ 40: 74 42 je 84 <foo\+0x84>
+ 42: 89 44 24 fc mov %eax,-0x4\(%esp\)
+ 46: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 49: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 4c: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 4f: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 52: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 55: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 58: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 5b: 5d pop %ebp
+ 5c: eb 2c jmp 8a <foo\+0x8a>
+ 5e: 66 90 xchg %ax,%ax
+ 60: eb 28 jmp 8a <foo\+0x8a>
+ 62: eb 26 jmp 8a <foo\+0x8a>
+ 64: 89 45 fc mov %eax,-0x4\(%ebp\)
+ 67: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 6a: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 6d: 5d pop %ebp
+ 6e: 5d pop %ebp
+ 6f: 40 inc %eax
+ 70: 72 12 jb 84 <foo\+0x84>
+ 72: 36 36 89 45 fc ss mov %eax,%ss:-0x4\(%ebp\)
+ 77: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 7a: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 7d: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 80: 21 c3 and %eax,%ebx
+ 82: 7c 06 jl 8a <foo\+0x8a>
+ 84: 8b 45 f4 mov -0xc\(%ebp\),%eax
+ 87: 89 45 fc mov %eax,-0x4\(%ebp\)
+ 8a: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 90: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 96: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 9c: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ a2: 89 75 0c mov %esi,0xc\(%ebp\)
+ a5: e9 fc ff ff ff jmp a6 <foo\+0xa6>
+ aa: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ b0: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ b6: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ bc: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ c2: 89 75 00 mov %esi,0x0\(%ebp\)
+ c5: 74 c3 je 8a <foo\+0x8a>
+ c7: 74 c1 je 8a <foo\+0x8a>
+#pass
diff --git a/gas/testsuite/gas/i386/align-branch-9.s b/gas/testsuite/gas/i386/align-branch-9.s
new file mode 100644
index 0000000000..357abe30f9
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.s
@@ -0,0 +1,74 @@
+ .text
+ .globl foo
+ .p2align 4
+foo:
+ movl %eax, %gs:0x1
+ pushl %ebp
+ pushl %ebp
+ pushl %ebp
+ pushl %ebp
+ movl %esp, %ebp
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ cmp %eax, %ebp
+ jo .L_2
+ movl %esi, -12(%ebx)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ popl %ebp
+ popl %ebp
+ popl %ebp
+ je .L_2
+ popl %ebp
+ je .L_2
+ movl %eax, -4(%esp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ popl %ebp
+ jmp .L_3
+ jmp .L_3
+ jmp .L_3
+ movl %eax, -4(%ebp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ popl %ebp
+ popl %ebp
+ inc %eax
+ jc .L_2
+ movl %eax, -4(%ebp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ and %eax, %ebx
+ jl .L_3
+.L_2:
+ movl -12(%ebp), %eax
+ movl %eax, -4(%ebp)
+.L_3:
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, 12(%ebp)
+ jmp bar
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, (%ebp)
+ je .L_3
+ je .L_3
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 685e62ea72..8fc621f2bb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -525,6 +525,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_32_check]]
run_dump_test "align-branch-6"
run_dump_test "align-branch-7"
run_dump_test "align-branch-8"
+ run_dump_test "align-branch-9"
# These tests require support for 8 and 16 bit relocs,
# so we only run them for ELF and COFF targets.
@@ -1100,6 +1101,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
run_dump_test "x86-64-align-branch-6"
run_dump_test "x86-64-align-branch-7"
run_dump_test "x86-64-align-branch-8"
+ run_dump_test "x86-64-align-branch-9"
if { ![istarget "*-*-aix*"]
&& ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.d b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
new file mode 100644
index 0000000000..1041fd0483
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
@@ -0,0 +1,46 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+ 0: c1 e9 02 shr \$0x2,%ecx
+ 3: c1 e9 02 shr \$0x2,%ecx
+ 6: c1 e9 02 shr \$0x2,%ecx
+ 9: 89 d1 mov %edx,%ecx
+ b: 31 c0 xor %eax,%eax
+ d: c1 e9 02 shr \$0x2,%ecx
+ 10: c1 e9 02 shr \$0x2,%ecx
+ 13: c1 e9 02 shr \$0x2,%ecx
+ 16: c1 e9 02 shr \$0x2,%ecx
+ 19: c1 e9 02 shr \$0x2,%ecx
+ 1c: c1 e9 02 shr \$0x2,%ecx
+ 1f: 80 fa 02 cmp \$0x2,%dl
+ 22: 70 df jo 3 <foo\+0x3>
+ 24: 2e 2e 2e 2e 31 c0 cs cs cs cs xor %eax,%eax
+ 2a: c1 e9 02 shr \$0x2,%ecx
+ 2d: c1 e9 02 shr \$0x2,%ecx
+ 30: c1 e9 02 shr \$0x2,%ecx
+ 33: 89 d1 mov %edx,%ecx
+ 35: 31 c0 xor %eax,%eax
+ 37: c1 e9 02 shr \$0x2,%ecx
+ 3a: c1 e9 02 shr \$0x2,%ecx
+ 3d: c1 e9 02 shr \$0x2,%ecx
+ 40: f6 c2 02 test \$0x2,%dl
+ 43: 75 e8 jne 2d <foo\+0x2d>
+ 45: 31 c0 xor %eax,%eax
+ 47: c1 e9 02 shr \$0x2,%ecx
+ 4a: c1 e9 02 shr \$0x2,%ecx
+ 4d: 89 d1 mov %edx,%ecx
+ 4f: c1 e9 02 shr \$0x2,%ecx
+ 52: c1 e9 02 shr \$0x2,%ecx
+ 55: 89 d1 mov %edx,%ecx
+ 57: c1 e9 02 shr \$0x2,%ecx
+ 5a: 89 d1 mov %edx,%ecx
+ 5c: 31 c0 xor %eax,%eax
+ 5e: ff c0 inc %eax
+ 60: 76 cb jbe 2d <foo\+0x2d>
+ 62: 31 c0 xor %eax,%eax
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.s b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
new file mode 100644
index 0000000000..917579bda4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
@@ -0,0 +1,43 @@
+ .text
+ .p2align 4,,15
+foo:
+ shrl $2, %ecx
+.L1:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ cmpb $2, %dl
+ jo .L1
+ xorl %eax, %eax
+ shrl $2, %ecx
+.L2:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ testb $2, %dl
+ jne .L2
+ xorl %eax, %eax
+.L3:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ inc %eax
+ jbe .L2
+ xorl %eax, %eax
--
2.18.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-02-28 8:57 ` Hongtao Liu
@ 2020-02-28 11:35 ` Jan Beulich
2020-03-02 3:14 ` Hongtao Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-02-28 11:35 UTC (permalink / raw)
To: Hongtao Liu, H. J. Lu; +Cc: binutils
On 28.02.2020 09:57, Hongtao Liu wrote:
> On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> Hi:
>> This is subsequent patch for JCC Code Erratum.
>> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>>
>> According to intel SDM manual, not all compare flag-modifying
>> instructions are marcro-fusible with subsequent jcc instruction. For
>> those not, -mbranches-within-32B-boundaries need't align them as
>> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
>> restrictions which separate macro-fusible instruction from not
>>
>> Restriction 1:
>> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>>
>> cmp m, imm
>> add m, imm
>> sub m, imm
>> test m, imm
>> and m, imm
>> inc m
>> dec m
>>
>> it is unfusible with any jcc instructions.
>>
>> Restriction 2:
>> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
>> ---------------------------------------------------------------------
>> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
>> | ------ | ----------- | ------- | -------- |
>> | Jo | N | N | Y |
>> | Jno | N | N | Y |
>> | Jc/Jb | Y | N | Y |
>> | Jae/Jnb | Y | N | Y |
>> | Je/Jz | Y | Y | Y |
>> | Jne/Jnz | Y | Y | Y |
>> | Jna/Jbe | Y | N | Y |
>> | Ja/Jnbe | Y | N | Y |
>> | Js | N | N | Y |
>> | Jns | N | N | Y |
>> | Jp/Jpe | N | N | Y |
>> | Jnp/Jpo | N | N | Y |
>> | Jl/Jnge | Y | Y | Y |
>> | Jge/Jnl | Y | Y | Y |
>> | Jle/Jng | Y | Y | Y |
>> | Jg/Jnle | Y | Y | Y |
Quoting a Haswell table (also in the code) for code to deal with
a Skylake erratum is, without any further comments, not very
helpful.
>+enum mf_jcc_kind
>+ {
>+ mf_jcc_jo = 0, /* base opcode 0x70 */
>+ mf_jcc_jno, /* base opcode 0x71 */
>+ mf_jcc_jc, /* base opcode 0x72 */
>+ mf_jcc_jae, /* base opcode 0x73 */
>+ mf_jcc_je, /* base opcode 0x74 */
>+ mf_jcc_jne, /* base opcode 0x75 */
>+ mf_jcc_jna, /* base opcode 0x76 */
>+ mf_jcc_ja, /* base opcode 0x77 */
>+ mf_jcc_js, /* base opcode 0x78 */
>+ mf_jcc_jns, /* base opcode 0x79 */
>+ mf_jcc_jp, /* base opcode 0x7a */
>+ mf_jcc_jnp, /* base opcode 0x7b */
>+ mf_jcc_jl, /* base opcode 0x7c */
>+ mf_jcc_jge, /* base opcode 0x7d */
>+ mf_jcc_jle, /* base opcode 0x7e */
>+ mf_jcc_jg /* base opcode 0x7f */
>+ };
Did you consider making this an insn template attribute instead?
>+enum mf_cmp_kind
>+ {
>+ mf_cmp_test, /* test */
>+ mf_cmp_cmp, /* cmp */
>+ mf_cmp_and, /* and */
>+ mf_cmp_alu, /* add/sub */
>+ mf_cmp_incdec /* inc/dec */
>+ };
Why 5 enumerators when only three are needed (CMP going
together with ADD/SUB and AND going together with TEST)?
>@@ -8573,6 +8636,7 @@ output_insn (void)
> offsetT insn_start_off;
> fragS *fragP = NULL;
> enum align_branch_kind branch = align_branch_none;
>+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
Is this initializer needed? It looks pretty arbitrary to pick
JO here.
>@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> /* Return the next jcc frag after BRANCH_PADDING. */
>
> static fragS *
>-i386_next_jcc_frag (fragS *fragP)
>+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
Besides "cmp" in the name again not really fitting all variants
(afaict), it also looks suspicious with ...
>@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> */
> cmp_fragP = i386_next_non_empty_frag (next_fragP);
> pad_fragP = i386_next_non_empty_frag (cmp_fragP);
>- branch_fragP = i386_next_jcc_frag (pad_fragP);
>+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
... next_fragP, not cmp_fragP being passed here.
>--- a/gas/config/tc-i386.h
>+++ b/gas/config/tc-i386.h
>@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> unsigned char prefix_length;
> unsigned char default_prefix;
> unsigned char cmp_size;
>+ unsigned int mf_type : 4;
Even if not going with a (2-bit) insn attribute, this doesn't
need to be wider than 3 bits, as J<cc> and JN<cc> always fall
into the same group, and hence don't need separate tracking
(i.e. when recording you could ignore the low opcode bit).
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-02-28 11:35 ` Jan Beulich
@ 2020-03-02 3:14 ` Hongtao Liu
2020-03-02 4:05 ` Hongtao Liu
0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-02 3:14 UTC (permalink / raw)
To: Jan Beulich; +Cc: H. J. Lu, binutils
On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.02.2020 09:57, Hongtao Liu wrote:
> > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> Hi:
> >> This is subsequent patch for JCC Code Erratum.
> >> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> >>
> >> According to intel SDM manual, not all compare flag-modifying
> >> instructions are marcro-fusible with subsequent jcc instruction. For
> >> those not, -mbranches-within-32B-boundaries need't align them as
> >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> >> restrictions which separate macro-fusible instruction from not
> >>
> >> Restriction 1:
> >> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> >>
> >> cmp m, imm
> >> add m, imm
> >> sub m, imm
> >> test m, imm
> >> and m, imm
> >> inc m
> >> dec m
> >>
> >> it is unfusible with any jcc instructions.
> >>
> >> Restriction 2:
> >> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> >> ---------------------------------------------------------------------
> >> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
> >> | ------ | ----------- | ------- | -------- |
> >> | Jo | N | N | Y |
> >> | Jno | N | N | Y |
> >> | Jc/Jb | Y | N | Y |
> >> | Jae/Jnb | Y | N | Y |
> >> | Je/Jz | Y | Y | Y |
> >> | Jne/Jnz | Y | Y | Y |
> >> | Jna/Jbe | Y | N | Y |
> >> | Ja/Jnbe | Y | N | Y |
> >> | Js | N | N | Y |
> >> | Jns | N | N | Y |
> >> | Jp/Jpe | N | N | Y |
> >> | Jnp/Jpo | N | N | Y |
> >> | Jl/Jnge | Y | Y | Y |
> >> | Jge/Jnl | Y | Y | Y |
> >> | Jle/Jng | Y | Y | Y |
> >> | Jg/Jnle | Y | Y | Y |
>
> Quoting a Haswell table (also in the code) for code to deal with
> a Skylake erratum is, without any further comments, not very
> helpful.
>
They also works for Skylake and Cascadelake, I'll add comments.
> >+enum mf_jcc_kind
> >+ {
> >+ mf_jcc_jo = 0, /* base opcode 0x70 */
> >+ mf_jcc_jno, /* base opcode 0x71 */
> >+ mf_jcc_jc, /* base opcode 0x72 */
> >+ mf_jcc_jae, /* base opcode 0x73 */
> >+ mf_jcc_je, /* base opcode 0x74 */
> >+ mf_jcc_jne, /* base opcode 0x75 */
> >+ mf_jcc_jna, /* base opcode 0x76 */
> >+ mf_jcc_ja, /* base opcode 0x77 */
> >+ mf_jcc_js, /* base opcode 0x78 */
> >+ mf_jcc_jns, /* base opcode 0x79 */
> >+ mf_jcc_jp, /* base opcode 0x7a */
> >+ mf_jcc_jnp, /* base opcode 0x7b */
> >+ mf_jcc_jl, /* base opcode 0x7c */
> >+ mf_jcc_jge, /* base opcode 0x7d */
> >+ mf_jcc_jle, /* base opcode 0x7e */
> >+ mf_jcc_jg /* base opcode 0x7f */
> >+ };
>
> Did you consider making this an insn template attribute instead?
>
Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
So maybe it's a bit redudant to add it into insn_template?
> >+enum mf_cmp_kind
> >+ {
> >+ mf_cmp_test, /* test */
> >+ mf_cmp_cmp, /* cmp */
> >+ mf_cmp_and, /* and */
> >+ mf_cmp_alu, /* add/sub */
> >+ mf_cmp_incdec /* inc/dec */
> >+ };
>
> Why 5 enumerators when only three are needed (CMP going
> together with ADD/SUB and AND going together with TEST)?
>
Yes, only three are needed here, adding 5 enumerators just in case
there are other combinations use in the future.
> >@@ -8573,6 +8636,7 @@ output_insn (void)
> > offsetT insn_start_off;
> > fragS *fragP = NULL;
> > enum align_branch_kind branch = align_branch_none;
> >+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
>
> Is this initializer needed? It looks pretty arbitrary to pick
> JO here.
>
It's arbitrary, the initializer here is just to avoid uninitialized
warning when building with option Wuninitialized.
I'll add comments to explain the initializer.
> >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > /* Return the next jcc frag after BRANCH_PADDING. */
> >
> > static fragS *
> >-i386_next_jcc_frag (fragS *fragP)
> >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
>
> Besides "cmp" in the name again not really fitting all variants
> (afaict), it also looks suspicious with ...
>
> >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> > */
> > cmp_fragP = i386_next_non_empty_frag (next_fragP);
> > pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> >- branch_fragP = i386_next_jcc_frag (pad_fragP);
> >+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
>
> ... next_fragP, not cmp_fragP being passed here.
>
Yes, use maybe_cmp_fragP may be better.
> >--- a/gas/config/tc-i386.h
> >+++ b/gas/config/tc-i386.h
> >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> > unsigned char prefix_length;
> > unsigned char default_prefix;
> > unsigned char cmp_size;
> >+ unsigned int mf_type : 4;
>
> Even if not going with a (2-bit) insn attribute, this doesn't
> need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> into the same group, and hence don't need separate tracking
> (i.e. when recording you could ignore the low opcode bit).
>
Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> Jan
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-03-02 3:14 ` Hongtao Liu
@ 2020-03-02 4:05 ` Hongtao Liu
2020-03-02 8:32 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-02 4:05 UTC (permalink / raw)
To: Jan Beulich; +Cc: H. J. Lu, binutils
[-- Attachment #1: Type: text/plain, Size: 6186 bytes --]
On Mon, Mar 2, 2020 at 11:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 28.02.2020 09:57, Hongtao Liu wrote:
> > > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >>
> > >> Hi:
> > >> This is subsequent patch for JCC Code Erratum.
> > >> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> > >>
> > >> According to intel SDM manual, not all compare flag-modifying
> > >> instructions are marcro-fusible with subsequent jcc instruction. For
> > >> those not, -mbranches-within-32B-boundaries need't align them as
> > >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> > >> restrictions which separate macro-fusible instruction from not
> > >>
> > >> Restriction 1:
> > >> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> > >>
> > >> cmp m, imm
> > >> add m, imm
> > >> sub m, imm
> > >> test m, imm
> > >> and m, imm
> > >> inc m
> > >> dec m
> > >>
> > >> it is unfusible with any jcc instructions.
> > >>
> > >> Restriction 2:
> > >> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> > >> ---------------------------------------------------------------------
> > >> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
> > >> | ------ | ----------- | ------- | -------- |
> > >> | Jo | N | N | Y |
> > >> | Jno | N | N | Y |
> > >> | Jc/Jb | Y | N | Y |
> > >> | Jae/Jnb | Y | N | Y |
> > >> | Je/Jz | Y | Y | Y |
> > >> | Jne/Jnz | Y | Y | Y |
> > >> | Jna/Jbe | Y | N | Y |
> > >> | Ja/Jnbe | Y | N | Y |
> > >> | Js | N | N | Y |
> > >> | Jns | N | N | Y |
> > >> | Jp/Jpe | N | N | Y |
> > >> | Jnp/Jpo | N | N | Y |
> > >> | Jl/Jnge | Y | Y | Y |
> > >> | Jge/Jnl | Y | Y | Y |
> > >> | Jle/Jng | Y | Y | Y |
> > >> | Jg/Jnle | Y | Y | Y |
> >
> > Quoting a Haswell table (also in the code) for code to deal with
> > a Skylake erratum is, without any further comments, not very
> > helpful.
> >
> They also works for Skylake and Cascadelake, I'll add comments.
> > >+enum mf_jcc_kind
> > >+ {
> > >+ mf_jcc_jo = 0, /* base opcode 0x70 */
> > >+ mf_jcc_jno, /* base opcode 0x71 */
> > >+ mf_jcc_jc, /* base opcode 0x72 */
> > >+ mf_jcc_jae, /* base opcode 0x73 */
> > >+ mf_jcc_je, /* base opcode 0x74 */
> > >+ mf_jcc_jne, /* base opcode 0x75 */
> > >+ mf_jcc_jna, /* base opcode 0x76 */
> > >+ mf_jcc_ja, /* base opcode 0x77 */
> > >+ mf_jcc_js, /* base opcode 0x78 */
> > >+ mf_jcc_jns, /* base opcode 0x79 */
> > >+ mf_jcc_jp, /* base opcode 0x7a */
> > >+ mf_jcc_jnp, /* base opcode 0x7b */
> > >+ mf_jcc_jl, /* base opcode 0x7c */
> > >+ mf_jcc_jge, /* base opcode 0x7d */
> > >+ mf_jcc_jle, /* base opcode 0x7e */
> > >+ mf_jcc_jg /* base opcode 0x7f */
> > >+ };
> >
> > Did you consider making this an insn template attribute instead?
> >
> Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
> So maybe it's a bit redudant to add it into insn_template?
> > >+enum mf_cmp_kind
> > >+ {
> > >+ mf_cmp_test, /* test */
> > >+ mf_cmp_cmp, /* cmp */
> > >+ mf_cmp_and, /* and */
> > >+ mf_cmp_alu, /* add/sub */
> > >+ mf_cmp_incdec /* inc/dec */
> > >+ };
> >
> > Why 5 enumerators when only three are needed (CMP going
> > together with ADD/SUB and AND going together with TEST)?
> >
> Yes, only three are needed here, adding 5 enumerators just in case
> there are other combinations use in the future.
> > >@@ -8573,6 +8636,7 @@ output_insn (void)
> > > offsetT insn_start_off;
> > > fragS *fragP = NULL;
> > > enum align_branch_kind branch = align_branch_none;
> > >+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
> >
> > Is this initializer needed? It looks pretty arbitrary to pick
> > JO here.
> >
> It's arbitrary, the initializer here is just to avoid uninitialized
> warning when building with option Wuninitialized.
> I'll add comments to explain the initializer.
> > >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > > /* Return the next jcc frag after BRANCH_PADDING. */
> > >
> > > static fragS *
> > >-i386_next_jcc_frag (fragS *fragP)
> > >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
> >
> > Besides "cmp" in the name again not really fitting all variants
> > (afaict), it also looks suspicious with ...
> >
> > >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> > > */
> > > cmp_fragP = i386_next_non_empty_frag (next_fragP);
> > > pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> > >- branch_fragP = i386_next_jcc_frag (pad_fragP);
> > >+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
> >
> > ... next_fragP, not cmp_fragP being passed here.
> >
> Yes, use maybe_cmp_fragP may be better.
> > >--- a/gas/config/tc-i386.h
> > >+++ b/gas/config/tc-i386.h
> > >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> > > unsigned char prefix_length;
> > > unsigned char default_prefix;
> > > unsigned char cmp_size;
> > >+ unsigned int mf_type : 4;
> >
> > Even if not going with a (2-bit) insn attribute, this doesn't
> > need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> > into the same group, and hence don't need separate tracking
> > (i.e. when recording you could ignore the low opcode bit).
> >
> Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> > Jan
>
>
>
> --
> BR,
> Hongtao
Update patch.
--
BR,
Hongtao
[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch --]
[-- Type: application/octet-stream, Size: 24194 bytes --]
From e9801ef97177eae61548ed54996d767e7179dff6 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 26 Feb 2020 14:57:04 +0800
Subject: [PATCH] According to intel SDM manual, not all compare flag-modifying
instructions are marcro-fusible with subsequent jcc instruction. For those
not, -mbranches-within-32B-boundaries need't align them as FUSED_JCC_PADDING,
only jcc itself need to be aligned.
Here are 2 restrictions which separate macro-fusible instruction from not
Restriction 1:
If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
cmp m, imm
add m, imm
sub m, imm
test m, imm
and m, imm
inc m
dec m
it is unfusible with any jcc instruction.
Restriction 2:
/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
Note it also works for Skylake and Cascadelake.
---------------------------------------------------------------------
| JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
| ------ | ----------- | ------- | -------- |
| Jo | N | N | Y |
| Jno | N | N | Y |
| Jc/Jb | Y | N | Y |
| Jae/Jnb | Y | N | Y |
| Je/Jz | Y | Y | Y |
| Jne/Jnz | Y | Y | Y |
| Jna/Jbe | Y | N | Y |
| Ja/Jnbe | Y | N | Y |
| Js | N | N | Y |
| Jns | N | N | Y |
| Jp/Jpe | N | N | Y |
| Jnp/Jpo | N | N | Y |
| Jl/Jnge | Y | Y | Y |
| Jge/Jnl | Y | Y | Y |
| Jle/Jng | Y | Y | Y |
| Jg/Jnle | Y | Y | Y |
Update maybe_fused_with_jcc_p to check if operands of CMP like instructions
can be fused with condition jump.
* gas/config/tc-i386.h (i386_tc_frag_data): Add member mf_type.
(TC_FRAG_INIT): Init mf_type.
* gas/config/tc-i386.c (enum mf_jcc_kind): New enum.
(enum mf_cmp_kind): Ditto.
(maybe_fused_with_jcc_p): Add argument mf_cmp_p to get
mf_type of corresponding instructons, exclude unfusible
instructions.
(add_fused_jcc_padding_frag_p): Likewise.
(add_branch_padding_frag_p): Likewise.
(output_insn): Record mf_type for corresponding instructions.
(i386_macro_fusible_p): New function.
(i386_next_fusible_jcc_frag): Rename from i386_next_jcc_frag,
add argument cmp_fragP to return next fusible jcc frag only.
(i386_classify_machine_dependant_frag): Seperate macro-fusible
instructions from condition jump.
* gas/testsuite/gas/i386/align-branch-9.s: New file.
* gas/testsuite/gas/i386/align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.s: Ditto.
* gas/testsuite/gas/i386/x86-64-align-branch-9.d: Ditto.
* gas/testsuite/gas/i386/i386.exp: Run new tests.
---
gas/config/tc-i386.c | 167 ++++++++++++++----
gas/config/tc-i386.h | 2 +
gas/testsuite/gas/i386/align-branch-9.d | 78 ++++++++
gas/testsuite/gas/i386/align-branch-9.s | 74 ++++++++
gas/testsuite/gas/i386/i386.exp | 2 +
.../gas/i386/x86-64-align-branch-9.d | 46 +++++
.../gas/i386/x86-64-align-branch-9.s | 43 +++++
7 files changed, 380 insertions(+), 32 deletions(-)
create mode 100644 gas/testsuite/gas/i386/align-branch-9.d
create mode 100644 gas/testsuite/gas/i386/align-branch-9.s
create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.d
create mode 100644 gas/testsuite/gas/i386/x86-64-align-branch-9.s
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 62b7cfbe6c..60e3283636 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -687,6 +687,29 @@ static unsigned int align_branch = (align_branch_jcc_bit
| align_branch_fused_bit
| align_branch_jmp_bit);
+/* Types of condition jump used by macro-fusion. */
+enum mf_jcc_kind
+ {
+ mf_jcc_jo = 0, /* base opcode 0x70 */
+ mf_jcc_jc, /* base opcode 0x72 */
+ mf_jcc_je, /* base opcode 0x74 */
+ mf_jcc_jna, /* base opcode 0x76 */
+ mf_jcc_js, /* base opcode 0x78 */
+ mf_jcc_jp, /* base opcode 0x7a */
+ mf_jcc_jl, /* base opcode 0x7c */
+ mf_jcc_jle, /* base opcode 0x7e */
+ };
+
+/* Types of compare flag-modifying insntructions used by macro-fusion. */
+enum mf_cmp_kind
+ {
+ mf_cmp_test, /* test */
+ mf_cmp_cmp, /* cmp */
+ mf_cmp_and, /* and */
+ mf_cmp_alu, /* add/sub */
+ mf_cmp_incdec /* inc/dec */
+ };
+
/* The maximum padding size for fused jcc. CMP like instruction can
be 9 bytes and jcc can be 6 bytes. Leave room just in case for
prefixes. */
@@ -8374,10 +8397,22 @@ encoding_length (const fragS *start_frag, offsetT start_off,
}
/* Return 1 for test, and, cmp, add, sub, inc and dec which may
- be macro-fused with conditional jumps. */
+ be macro-fused with conditional jumps.
+ NB: If TEST/AND/CMP/ADD/SUB/INC/DEC is of RIP relative address,
+ or is one of the following format:
+
+ cmp m, imm
+ add m, imm
+ sub m, imm
+ test m, imm
+ and m, imm
+ inc m
+ dec m
+
+ it is unfusible. */
static int
-maybe_fused_with_jcc_p (void)
+maybe_fused_with_jcc_p (enum mf_cmp_kind* mf_cmp_p)
{
/* No RIP address. */
if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -8387,36 +8422,54 @@ maybe_fused_with_jcc_p (void)
if (is_any_vex_encoding (&i.tm))
return 0;
- /* and, add, sub with destination register. */
- if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
- || i.tm.base_opcode <= 5
+ /* add, sub without add/sub m, imm. */
+ if (i.tm.base_opcode <= 5
|| (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
|| ((i.tm.base_opcode | 3) == 0x83
- && ((i.tm.extension_opcode | 1) == 0x5
+ && (i.tm.extension_opcode == 0x5
|| i.tm.extension_opcode == 0x0)))
- return (i.types[1].bitfield.class == Reg
- || i.types[1].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_alu;
+ return !(i.mem_operands && i.imm_operands);
+ }
- /* test, cmp with any register. */
+ /* and without and m, imm. */
+ if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
+ || ((i.tm.base_opcode | 3) == 0x83
+ && i.tm.extension_opcode == 0x4))
+ {
+ *mf_cmp_p = mf_cmp_and;
+ return !(i.mem_operands && i.imm_operands);
+ }
+
+ /* test without test m imm. */
if ((i.tm.base_opcode | 1) == 0x85
|| (i.tm.base_opcode | 1) == 0xa9
|| ((i.tm.base_opcode | 1) == 0xf7
- && i.tm.extension_opcode == 0)
- || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
+ && i.tm.extension_opcode == 0))
+ {
+ *mf_cmp_p = mf_cmp_test;
+ return !(i.mem_operands && i.imm_operands);
+ }
+
+ /* cmp without cmp m, imm. */
+ if ((i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
|| ((i.tm.base_opcode | 3) == 0x83
&& (i.tm.extension_opcode == 0x7)))
- return (i.types[0].bitfield.class == Reg
- || i.types[0].bitfield.instance == Accum
- || i.types[1].bitfield.class == Reg
- || i.types[1].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_cmp;
+ return !(i.mem_operands && i.imm_operands);
+ }
- /* inc, dec with any register. */
+ /* inc, dec without inc/dec m. */
if ((i.tm.cpu_flags.bitfield.cpuno64
&& (i.tm.base_opcode | 0xf) == 0x4f)
|| ((i.tm.base_opcode | 1) == 0xff
&& i.tm.extension_opcode <= 0x1))
- return (i.types[0].bitfield.class == Reg
- || i.types[0].bitfield.instance == Accum);
+ {
+ *mf_cmp_p = mf_cmp_incdec;
+ return !i.mem_operands;
+ }
return 0;
}
@@ -8424,7 +8477,7 @@ maybe_fused_with_jcc_p (void)
/* Return 1 if a FUSED_JCC_PADDING frag should be generated. */
static int
-add_fused_jcc_padding_frag_p (void)
+add_fused_jcc_padding_frag_p (enum mf_cmp_kind* mf_cmp_p)
{
/* NB: Don't work with COND_JUMP86 without i386. */
if (!align_branch_power
@@ -8433,7 +8486,7 @@ add_fused_jcc_padding_frag_p (void)
|| !(align_branch & align_branch_fused_bit))
return 0;
- if (maybe_fused_with_jcc_p ())
+ if (maybe_fused_with_jcc_p (mf_cmp_p))
{
if (last_insn.kind == last_insn_other
|| last_insn.seg != now_seg)
@@ -8481,7 +8534,8 @@ add_branch_prefix_frag_p (void)
/* Return 1 if a BRANCH_PADDING frag should be generated. */
static int
-add_branch_padding_frag_p (enum align_branch_kind *branch_p)
+add_branch_padding_frag_p (enum align_branch_kind *branch_p,
+ enum mf_jcc_kind *mf_jcc_p)
{
int add_padding;
@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
}
else
{
+ /* Because J<cc> and JN<cc> share same group in macro-fusible table,
+ igore the lowest bit. */
+ *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;
*branch_p = align_branch_jcc;
if ((align_branch & align_branch_jcc_bit))
add_padding = 1;
@@ -8573,6 +8630,10 @@ output_insn (void)
offsetT insn_start_off;
fragS *fragP = NULL;
enum align_branch_kind branch = align_branch_none;
+ /* The initializer is arbitrary just to avoid uninitialized error.
+ it's actually either assigned in add_branch_padding_frag_p
+ or never be used. */
+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
#if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
if (IS_ELF && x86_used_note)
@@ -8665,7 +8726,7 @@ output_insn (void)
insn_start_frag = frag_now;
insn_start_off = frag_now_fix ();
- if (add_branch_padding_frag_p (&branch))
+ if (add_branch_padding_frag_p (&branch, &mf_jcc))
{
char *p;
/* Branch can be 8 bytes. Leave some room for prefixes. */
@@ -8686,6 +8747,7 @@ output_insn (void)
ENCODE_RELAX_STATE (BRANCH_PADDING, 0),
NULL, 0, p);
+ fragP->tc_frag_data.mf_type = mf_jcc;
fragP->tc_frag_data.branch_type = branch;
fragP->tc_frag_data.max_bytes = max_branch_padding_size;
}
@@ -8705,6 +8767,7 @@ output_insn (void)
unsigned char *q;
unsigned int j;
unsigned int prefix;
+ enum mf_cmp_kind mf_cmp;
if (avoid_fence
&& (i.tm.base_opcode == 0xfaee8
@@ -8731,7 +8794,7 @@ output_insn (void)
if (branch)
/* Skip if this is a branch. */
;
- else if (add_fused_jcc_padding_frag_p ())
+ else if (add_fused_jcc_padding_frag_p (&mf_cmp))
{
/* Make room for padding. */
frag_grow (MAX_FUSED_JCC_PADDING_SIZE);
@@ -8743,6 +8806,7 @@ output_insn (void)
ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0),
NULL, 0, p);
+ fragP->tc_frag_data.mf_type = mf_cmp;
fragP->tc_frag_data.branch_type = align_branch_fused;
fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE;
}
@@ -10948,6 +11012,42 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
}
#endif
+/* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
+ Note also work for Skylake and Cascadelake.
+---------------------------------------------------------------------
+| JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
+| ------ | ----------- | ------- | -------- |
+| Jo | N | N | Y |
+| Jno | N | N | Y |
+| Jc/Jb | Y | N | Y |
+| Jae/Jnb | Y | N | Y |
+| Je/Jz | Y | Y | Y |
+| Jne/Jnz | Y | Y | Y |
+| Jna/Jbe | Y | N | Y |
+| Ja/Jnbe | Y | N | Y |
+| Js | N | N | Y |
+| Jns | N | N | Y |
+| Jp/Jpe | N | N | Y |
+| Jnp/Jpo | N | N | Y |
+| Jl/Jnge | Y | Y | Y |
+| Jge/Jnl | Y | Y | Y |
+| Jle/Jng | Y | Y | Y |
+| Jg/Jnle | Y | Y | Y |
+--------------------------------------------------------------------- */
+static int
+i386_macro_fusible_p (enum mf_cmp_kind mf_cmp, enum mf_jcc_kind mf_jcc)
+{
+ if (mf_cmp == mf_cmp_alu || mf_cmp == mf_cmp_cmp)
+ return ((mf_jcc >= mf_jcc_jc && mf_jcc <= mf_jcc_jna)
+ || mf_jcc == mf_jcc_jl || mf_jcc == mf_jcc_jle);
+ if (mf_cmp == mf_cmp_incdec)
+ return (mf_jcc == mf_jcc_je || mf_jcc == mf_jcc_jl
+ || mf_jcc == mf_jcc_jle);
+ if (mf_cmp == mf_cmp_test || mf_cmp == mf_cmp_and)
+ return 1;
+ return 0;
+}
+
/* Return the next non-empty frag. */
static fragS *
@@ -10967,20 +11067,23 @@ i386_next_non_empty_frag (fragS *fragP)
/* Return the next jcc frag after BRANCH_PADDING. */
static fragS *
-i386_next_jcc_frag (fragS *fragP)
+i386_next_fusible_jcc_frag (fragS *maybe_cmp_fragP, fragS *pad_fragP)
{
- if (!fragP)
+ fragS *branch_fragP;
+ if (!pad_fragP)
return NULL;
- if (fragP->fr_type == rs_machine_dependent
- && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ if (pad_fragP->fr_type == rs_machine_dependent
+ && (TYPE_FROM_RELAX_STATE (pad_fragP->fr_subtype)
== BRANCH_PADDING))
{
- fragP = i386_next_non_empty_frag (fragP);
- if (fragP->fr_type != rs_machine_dependent)
+ branch_fragP = i386_next_non_empty_frag (pad_fragP);
+ if (branch_fragP->fr_type != rs_machine_dependent)
return NULL;
- if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP)
- return fragP;
+ if (TYPE_FROM_RELAX_STATE (branch_fragP->fr_subtype) == COND_JUMP
+ && i386_macro_fusible_p (maybe_cmp_fragP->tc_frag_data.mf_type,
+ pad_fragP->tc_frag_data.mf_type))
+ return branch_fragP;
}
return NULL;
@@ -11025,7 +11128,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
*/
cmp_fragP = i386_next_non_empty_frag (next_fragP);
pad_fragP = i386_next_non_empty_frag (cmp_fragP);
- branch_fragP = i386_next_jcc_frag (pad_fragP);
+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
if (branch_fragP)
{
/* The BRANCH_PADDING frag is merged with the
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index 845b3901d4..93678c2282 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -273,6 +273,7 @@ struct i386_tc_frag_data
unsigned char prefix_length;
unsigned char default_prefix;
unsigned char cmp_size;
+ unsigned int mf_type : 3;
unsigned int classified : 1;
unsigned int branch_type : 3;
};
@@ -299,6 +300,7 @@ struct i386_tc_frag_data
(FRAGP)->tc_frag_data.cmp_size = 0; \
(FRAGP)->tc_frag_data.classified = 0; \
(FRAGP)->tc_frag_data.branch_type = 0; \
+ (FRAGP)->tc_frag_data.mf_type = 0; \
} \
while (0)
diff --git a/gas/testsuite/gas/i386/align-branch-9.d b/gas/testsuite/gas/i386/align-branch-9.d
new file mode 100644
index 0000000000..6340817d04
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.d
@@ -0,0 +1,78 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+ 0: 65 a3 01 00 00 00 mov %eax,%gs:0x1
+ 6: 55 push %ebp
+ 7: 55 push %ebp
+ 8: 55 push %ebp
+ 9: 55 push %ebp
+ a: 89 e5 mov %esp,%ebp
+ c: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ f: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 12: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 15: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 18: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 1b: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 1e: 39 c5 cmp %eax,%ebp
+ 20: 70 62 jo 84 <foo\+0x84>
+ 22: 89 73 f4 mov %esi,-0xc\(%ebx\)
+ 25: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 28: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 2b: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 2e: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 31: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 34: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 37: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 3a: 5d pop %ebp
+ 3b: 5d pop %ebp
+ 3c: 5d pop %ebp
+ 3d: 74 45 je 84 <foo\+0x84>
+ 3f: 5d pop %ebp
+ 40: 74 42 je 84 <foo\+0x84>
+ 42: 89 44 24 fc mov %eax,-0x4\(%esp\)
+ 46: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 49: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 4c: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 4f: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 52: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 55: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 58: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 5b: 5d pop %ebp
+ 5c: eb 2c jmp 8a <foo\+0x8a>
+ 5e: 66 90 xchg %ax,%ax
+ 60: eb 28 jmp 8a <foo\+0x8a>
+ 62: eb 26 jmp 8a <foo\+0x8a>
+ 64: 89 45 fc mov %eax,-0x4\(%ebp\)
+ 67: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 6a: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 6d: 5d pop %ebp
+ 6e: 5d pop %ebp
+ 6f: 40 inc %eax
+ 70: 72 12 jb 84 <foo\+0x84>
+ 72: 36 36 89 45 fc ss mov %eax,%ss:-0x4\(%ebp\)
+ 77: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 7a: 89 7d f8 mov %edi,-0x8\(%ebp\)
+ 7d: 89 75 f4 mov %esi,-0xc\(%ebp\)
+ 80: 21 c3 and %eax,%ebx
+ 82: 7c 06 jl 8a <foo\+0x8a>
+ 84: 8b 45 f4 mov -0xc\(%ebp\),%eax
+ 87: 89 45 fc mov %eax,-0x4\(%ebp\)
+ 8a: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 90: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 96: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ 9c: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ a2: 89 75 0c mov %esi,0xc\(%ebp\)
+ a5: e9 fc ff ff ff jmp a6 <foo\+0xa6>
+ aa: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ b0: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ b6: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ bc: 89 b5 50 fb ff ff mov %esi,-0x4b0\(%ebp\)
+ c2: 89 75 00 mov %esi,0x0\(%ebp\)
+ c5: 74 c3 je 8a <foo\+0x8a>
+ c7: 74 c1 je 8a <foo\+0x8a>
+#pass
diff --git a/gas/testsuite/gas/i386/align-branch-9.s b/gas/testsuite/gas/i386/align-branch-9.s
new file mode 100644
index 0000000000..357abe30f9
--- /dev/null
+++ b/gas/testsuite/gas/i386/align-branch-9.s
@@ -0,0 +1,74 @@
+ .text
+ .globl foo
+ .p2align 4
+foo:
+ movl %eax, %gs:0x1
+ pushl %ebp
+ pushl %ebp
+ pushl %ebp
+ pushl %ebp
+ movl %esp, %ebp
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ cmp %eax, %ebp
+ jo .L_2
+ movl %esi, -12(%ebx)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ popl %ebp
+ popl %ebp
+ popl %ebp
+ je .L_2
+ popl %ebp
+ je .L_2
+ movl %eax, -4(%esp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ movl %esi, -12(%ebp)
+ popl %ebp
+ jmp .L_3
+ jmp .L_3
+ jmp .L_3
+ movl %eax, -4(%ebp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ popl %ebp
+ popl %ebp
+ inc %eax
+ jc .L_2
+ movl %eax, -4(%ebp)
+ movl %esi, -12(%ebp)
+ movl %edi, -8(%ebp)
+ movl %esi, -12(%ebp)
+ and %eax, %ebx
+ jl .L_3
+.L_2:
+ movl -12(%ebp), %eax
+ movl %eax, -4(%ebp)
+.L_3:
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, 12(%ebp)
+ jmp bar
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, -1200(%ebp)
+ movl %esi, (%ebp)
+ je .L_3
+ je .L_3
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 685e62ea72..8fc621f2bb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -525,6 +525,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_32_check]]
run_dump_test "align-branch-6"
run_dump_test "align-branch-7"
run_dump_test "align-branch-8"
+ run_dump_test "align-branch-9"
# These tests require support for 8 and 16 bit relocs,
# so we only run them for ELF and COFF targets.
@@ -1100,6 +1101,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
run_dump_test "x86-64-align-branch-6"
run_dump_test "x86-64-align-branch-7"
run_dump_test "x86-64-align-branch-8"
+ run_dump_test "x86-64-align-branch-9"
if { ![istarget "*-*-aix*"]
&& ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.d b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
new file mode 100644
index 0000000000..1041fd0483
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.d
@@ -0,0 +1,46 @@
+#as: -mbranches-within-32B-boundaries
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+ 0: c1 e9 02 shr \$0x2,%ecx
+ 3: c1 e9 02 shr \$0x2,%ecx
+ 6: c1 e9 02 shr \$0x2,%ecx
+ 9: 89 d1 mov %edx,%ecx
+ b: 31 c0 xor %eax,%eax
+ d: c1 e9 02 shr \$0x2,%ecx
+ 10: c1 e9 02 shr \$0x2,%ecx
+ 13: c1 e9 02 shr \$0x2,%ecx
+ 16: c1 e9 02 shr \$0x2,%ecx
+ 19: c1 e9 02 shr \$0x2,%ecx
+ 1c: c1 e9 02 shr \$0x2,%ecx
+ 1f: 80 fa 02 cmp \$0x2,%dl
+ 22: 70 df jo 3 <foo\+0x3>
+ 24: 2e 2e 2e 2e 31 c0 cs cs cs cs xor %eax,%eax
+ 2a: c1 e9 02 shr \$0x2,%ecx
+ 2d: c1 e9 02 shr \$0x2,%ecx
+ 30: c1 e9 02 shr \$0x2,%ecx
+ 33: 89 d1 mov %edx,%ecx
+ 35: 31 c0 xor %eax,%eax
+ 37: c1 e9 02 shr \$0x2,%ecx
+ 3a: c1 e9 02 shr \$0x2,%ecx
+ 3d: c1 e9 02 shr \$0x2,%ecx
+ 40: f6 c2 02 test \$0x2,%dl
+ 43: 75 e8 jne 2d <foo\+0x2d>
+ 45: 31 c0 xor %eax,%eax
+ 47: c1 e9 02 shr \$0x2,%ecx
+ 4a: c1 e9 02 shr \$0x2,%ecx
+ 4d: 89 d1 mov %edx,%ecx
+ 4f: c1 e9 02 shr \$0x2,%ecx
+ 52: c1 e9 02 shr \$0x2,%ecx
+ 55: 89 d1 mov %edx,%ecx
+ 57: c1 e9 02 shr \$0x2,%ecx
+ 5a: 89 d1 mov %edx,%ecx
+ 5c: 31 c0 xor %eax,%eax
+ 5e: ff c0 inc %eax
+ 60: 76 cb jbe 2d <foo\+0x2d>
+ 62: 31 c0 xor %eax,%eax
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-align-branch-9.s b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
new file mode 100644
index 0000000000..917579bda4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-align-branch-9.s
@@ -0,0 +1,43 @@
+ .text
+ .p2align 4,,15
+foo:
+ shrl $2, %ecx
+.L1:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ cmpb $2, %dl
+ jo .L1
+ xorl %eax, %eax
+ shrl $2, %ecx
+.L2:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ shrl $2, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ testb $2, %dl
+ jne .L2
+ xorl %eax, %eax
+.L3:
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ shrl $2, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ shrl $2, %ecx
+ movl %edx, %ecx
+ xorl %eax, %eax
+ inc %eax
+ jbe .L2
+ xorl %eax, %eax
--
2.18.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-03-02 4:05 ` Hongtao Liu
@ 2020-03-02 8:32 ` Jan Beulich
2020-03-03 3:40 ` Hongtao Liu
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-03-02 8:32 UTC (permalink / raw)
To: Hongtao Liu; +Cc: H. J. Lu, binutils
On 02.03.2020 05:05, Hongtao Liu wrote:
> Update patch.
>+enum mf_jcc_kind
>+ {
>+ mf_jcc_jo = 0, /* base opcode 0x70 */
Is there a reason for the " = 0" here? Without is, the standard
still mandates the value to be zero.
>+enum mf_cmp_kind
>+ {
>+ mf_cmp_test, /* test */
>+ mf_cmp_cmp, /* cmp */
>+ mf_cmp_and, /* and */
>+ mf_cmp_alu, /* add/sub */
>+ mf_cmp_incdec /* inc/dec */
>+ };
I saw your reply to my question of why this is 5 instead of 3
enumerators. Yet I think this being a workaround for something
that hopefully will be fixed in newer hardware suggests to go
with the minimal necessary set first, and split enumerators
only if indeed needed down the road.
>@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> }
> else
> {
>+ /* Because J<cc> and JN<cc> share same group in macro-fusible table,
>+ igore the lowest bit. */
>+ *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;
May I suggest to use (i.tm.base_opcode & 0x0e) >> 1, to be independent
of the insn variant presently in i.tm (after all we do dynamically
update it after copying from the template)?
Thanks for doing the other adjustments.
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-03-02 8:32 ` Jan Beulich
@ 2020-03-03 3:40 ` Hongtao Liu
2020-03-03 8:38 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Hongtao Liu @ 2020-03-03 3:40 UTC (permalink / raw)
To: Jan Beulich; +Cc: H. J. Lu, binutils
[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]
On Mon, Mar 2, 2020 at 4:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 02.03.2020 05:05, Hongtao Liu wrote:
> > Update patch.
>
> >+enum mf_jcc_kind
> >+ {
> >+ mf_jcc_jo = 0, /* base opcode 0x70 */
>
> Is there a reason for the " = 0" here? Without is, the standard
> still mandates the value to be zero.
>
Yes.
I saw some other places in tc-i386.c where first member of enum is
assigned as 0, and also some without initializer, so i guess when the
enum variable is used as left value such as calculated by (baseopcode
& 0x0e ) >>1, it should be explicitly assigned, otherwise not.
Just my guess.
> >+enum mf_cmp_kind
> >+ {
> >+ mf_cmp_test, /* test */
> >+ mf_cmp_cmp, /* cmp */
> >+ mf_cmp_and, /* and */
> >+ mf_cmp_alu, /* add/sub */
> >+ mf_cmp_incdec /* inc/dec */
> >+ };
>
> I saw your reply to my question of why this is 5 instead of 3
> enumerators. Yet I think this being a workaround for something
> that hopefully will be fixed in newer hardware suggests to go
> with the minimal necessary set first, and split enumerators
> only if indeed needed down the road.
Done.
>
> >@@ -8503,6 +8557,9 @@ add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> > }
> > else
> > {
> >+ /* Because J<cc> and JN<cc> share same group in macro-fusible table,
> >+ igore the lowest bit. */
> >+ *mf_jcc_p = (i.tm.base_opcode - 0x70) >> 1;
>
> May I suggest to use (i.tm.base_opcode & 0x0e) >> 1, to be independent
> of the insn variant presently in i.tm (after all we do dynamically
> update it after copying from the template)?
>
Done.
Update my patch.
> Thanks for doing the other adjustments.
>
> Jan
--
BR,
Hongtao
[-- Attachment #2: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m_v2.patch --]
[-- Type: application/x-patch, Size: 24132 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-03-03 3:40 ` Hongtao Liu
@ 2020-03-03 8:38 ` Jan Beulich
2020-03-03 14:12 ` H.J. Lu
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2020-03-03 8:38 UTC (permalink / raw)
To: Hongtao Liu; +Cc: H. J. Lu, binutils
On 03.03.2020 04:40, Hongtao Liu wrote:
> Update my patch.
Thanks again, looks good to me now with just one minor comment:
>+enum mf_cmp_kind
>+ {
>+ mf_cmp_test_and, /* test/cmp */
If there is a comment here, I think it should also mention AND. But
of course given the enumerator's name I'm not sure a comment is
needed in the first place. But you'll have to wait for H.J.'s
comments / okay anyway.
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
2020-03-03 8:38 ` Jan Beulich
@ 2020-03-03 14:12 ` H.J. Lu
0 siblings, 0 replies; 9+ messages in thread
From: H.J. Lu @ 2020-03-03 14:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: Hongtao Liu, Binutils
On Tue, Mar 3, 2020 at 12:38 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.03.2020 04:40, Hongtao Liu wrote:
> > Update my patch.
>
> Thanks again, looks good to me now with just one minor comment:
>
> >+enum mf_cmp_kind
> >+ {
> >+ mf_cmp_test_and, /* test/cmp */
>
> If there is a comment here, I think it should also mention AND. But
> of course given the enumerator's name I'm not sure a comment is
> needed in the first place. But you'll have to wait for H.J.'s
> comments / okay anyway.
>
LGTM. I am checking it in for Hongtao.
Thank you both.
--
H.J.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-03-03 14:12 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 8:56 [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries Hongtao Liu
2020-02-28 8:57 ` Hongtao Liu
2020-02-28 11:35 ` Jan Beulich
2020-03-02 3:14 ` Hongtao Liu
2020-03-02 4:05 ` Hongtao Liu
2020-03-02 8:32 ` Jan Beulich
2020-03-03 3:40 ` Hongtao Liu
2020-03-03 8:38 ` Jan Beulich
2020-03-03 14:12 ` H.J. Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).